Open paeder92 opened 6 years ago
You can use an open-ended date range:
DateRange(dt(2015, 1, 1), None)
Also, the issue could be that the data you have written is not datetime indexed. The dataframe would need to have an index (or index level) called "date" that is a DatetimeIndex.
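A quick way to check that requirement before writing is sketched below. It uses only pandas; the helper name `has_date_index` is hypothetical and not part of arctic.

```python
import pandas as pd


def has_date_index(df: pd.DataFrame, name: str = "date") -> bool:
    """Return True if df has an index level called `name` that holds datetimes."""
    if name not in df.index.names:
        return False
    level = df.index.get_level_values(name)
    return isinstance(level, pd.DatetimeIndex)


good = pd.DataFrame(
    {"data": [1, 2]},
    index=pd.DatetimeIndex(["2016-01-01", "2016-01-02"], name="date"),
)
bad = good.reset_index()  # "date" is now an ordinary column, not the index

print(has_date_index(good))
print(has_date_index(bad))
```

If the check fails, `df.set_index("date")` on a datetime column is the usual way to get the expected shape.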
I checked, and the data is datetime indexed and the name of the index is "date". Is there any way to avoid using open ends? I would like to keep my approach as generic as possible, which is why having an over-long DateRange is more convenient.
I'd have to have a sample of the data, or some other example that reproduces the issue to know what is going on. It works for me:
lib.read('demo', date_range=DateRange(dt(2015,1,1), dt(2020,1,1))).data
            data
date
2016-01-01     1
2016-01-02     2
The only other thing that comes to mind is an issue with pandas. Do you know what version of pandas you have installed?
I am using pandas 0.23.0
Updated to 0.23.4 now, still the same issue
I was able to narrow it down: the problem comes up when inserting the data using append, and only if it is used twice. I tried the following data:
                         value
date
2015-11-17 13:32:38.636      1
2017-12-07 01:34:54.500      2
and then ran the following:
arc.append("test", df, upsert=True)
t = DateRange(datetime(2010,1,1), datetime(2020,1,1))
arc.read("test", date_range=t).data
This works. However, if I again run:
arc.append("test", df, upsert=True)
Then it is no longer possible to retrieve the data using
arc.read("test", date_range=t).data
and an error as described above appears.
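The behaviour can be imitated without arctic or MongoDB. The sketch below assumes that the second append simply stores the same two rows again after the first two, so `pd.concat` is used as a stand-in for it; that is an assumption about the stored order, not something verified against arctic. It then applies the same label-slice-on-a-mask pattern that `_daterange` uses.

```python
from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.DatetimeIndex(
        ["2015-11-17 13:32:38.636", "2017-12-07 01:34:54.500"], name="date"
    ),
)

once = df
# Stand-in for two appends (assumption): index order is 2015, 2017, 2015, 2017.
twice = pd.concat([df, df])

start, end = datetime(2010, 1, 1), datetime(2020, 1, 1)


def can_slice(index: pd.DatetimeIndex) -> bool:
    """Try the mask-by-label-slice pattern and report whether pandas accepts it."""
    mask = pd.Series(np.zeros(len(index)), index=index)
    try:
        mask.loc[start:end] = 1.0
    except KeyError:
        return False
    return True


print(once.index.is_monotonic_increasing, can_slice(once.index))
print(twice.index.is_monotonic_increasing, can_slice(twice.index))
print(can_slice(twice.sort_index().index))
```

When the bounds are not present in the index, pandas has to locate them by binary search, which only works on a monotonic index. That would explain why the same DateRange is accepted after one append and rejected after two.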
The issue seems to originate in _daterange of _pandas_ndarray_store.py. I disabled timezones in to_pandas_closed_closed of date._util (although that probably does not have anything to do with the issue) and changed _daterange to the following:
def _daterange(self, recarr, date_range):
    """ Given a recarr, slice out the given arctic.date.DateRange if a
    datetime64 index exists """
    idx = self._datetime64_index(recarr)
    if idx and len(recarr):
        dts = recarr[idx]
        mask = Series(np.zeros(len(dts)), index=dts)
        start, end = _start_end(date_range, dts)
        # Clamp the requested range to the bounds of the stored data
        if start < np.datetime64(min(dts)):
            start = np.datetime64(min(dts))
        if end > np.datetime64(max(dts)):
            end = np.datetime64(max(dts))
        mask[start:end] = 1.0
        return recarr[mask.values.astype(bool)]
    return recarr
Now it is working. It seems that sometimes mask[start:end] struggles with values that are outside of the range of dts and sometimes it does not?
I'm guessing it's because the DatetimeIndex is no longer sorted when you append twice.
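If the unsorted index is indeed the cause, one way to make the filter independent of row order is to compare timestamps directly instead of slicing by label. This is only a sketch of that alternative, not what arctic does: `filter_by_range` is a hypothetical helper, it assumes timezone-naive datetime64 values, and it treats both ends of the range as inclusive.

```python
import numpy as np


def filter_by_range(dts: np.ndarray, start, end) -> np.ndarray:
    """Boolean mask of timestamps within [start, end]; None means open-ended."""
    mask = np.ones(len(dts), dtype=bool)
    if start is not None:
        mask &= dts >= np.datetime64(start)
    if end is not None:
        mask &= dts <= np.datetime64(end)
    return mask


# Unsorted timestamps, as they would look after storing the same two rows twice.
dts = np.array(
    ["2015-11-17T13:32:38.636", "2017-12-07T01:34:54.500"] * 2,
    dtype="datetime64[ns]",
)
mask = filter_by_range(dts, "2010-01-01", "2020-01-01")
print(mask.sum())
```

An elementwise comparison scans every row, so it gives up the binary search a sorted index allows, but it accepts bounds outside the data and rows in any order.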
Arctic Version
Arctic Store
Platform and version
Windows 10, Anaconda, Python 3.5 environment
Description of problem and/or code sample that reproduces the issue
Hello,
I stored several time series in MongoDB and use date_range to retrieve parts of them, or, as in the case here, to retrieve them in full. For example I use:
t = DateRange( datetime(2015, 1, 1), datetime(2020, 1, 1) )
df = mongo.read("timeseries", date_range=t).data
This should give me the entire time series, since it only holds data from 2016 to 2018 and therefore falls into t entirely, correct? Instead, it results in a long series of errors involving pandas and arctic: https://pastebin.com/KVzK0RYX
The error must be related to the data of my time series not actually extending to 2020, but only until 2018. If I set the end of the DateRange to 2018, I do not get this error.
I know that I could use
t = DateRange()
to cover the entire data, but for compatibility with my remaining code (it is difficult to predict when the entire range is needed and when only a piece of it is needed), I would prefer not to solve it this way.
Thanks and best regards