Open topper-123 opened 6 years ago
Other common searches (e.g. CPI) also fail for the same reason. Seems the problem is that FRED now carries some data from the Bank of England that show up in the searches and go back further than the pandas Timestamp limits.
Moreover, there is a problem if you try to access one of these series with older data. For example:
In [83]: fred.get_series('HPGDPUKA')
ValueError: time data '1600-01-01' doesn't match format specified
Perhaps a more stable solution (at the expense of some compatibility) would be to make observation_start and observation_end everywhere period-like instead of datetimelike. I believe these fields should always be whole days, which can be represented as Period(freq='D').
I think using PeriodIndex with everything at freq='D' could get confusing. Does the JSON contain frequency data? If so, PeriodIndex could be used with the correct freq, which would actually be very cool.
PeriodIndex does not have the limitations that DatetimeIndex has, so that would also solve this problem with very old dates.
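To illustrate the point about Period not sharing the nanosecond Timestamp bounds, here is a minimal sketch. The variable names are made up for illustration, and the two dates are simply the ones that appear in the errors in this thread; this is not fredapi code.

```python
import pandas as pd

# Lower bound of nanosecond-resolution Timestamps (falls in the year 1677).
ns_lower_bound = pd.Timestamp.min

# A day-frequency Period can represent a date well before that bound.
old_start = pd.Period("1270-01-01", freq="D")

# The same holds for a whole index of such dates.
old_index = pd.PeriodIndex(["1270-01-01", "1600-01-01"], freq="D")

print(ns_lower_bound.year, old_start.year, list(old_index.year))
```

The trade-off mentioned above still applies: every value carries freq='D', whatever the real frequency of the series is.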
Yeah, I was trying to think of something where the column would always have the same type; I chose freq='D' just because observation_start and observation_end appear to always represent dates and so day accuracy would work. I agree using Periods might imply a regularity that's confusing.
NumPy datetime64 values with day precision would probably make the most sense for the use case, but they appear to confuse pandas when put in a pandas data structure:
In [49]: x = np.datetime64('1200-01-01')
In [50]: x
Out[50]: numpy.datetime64('1200-01-01')
In [51]: x.dtype
Out[51]: dtype('<M8[D]')
In [52]: np.array([x])
Out[52]: array(['1200-01-01'], dtype='datetime64[D]')
In [53]: pd.Series(np.array([x]))
Out[53]:
0 1784-07-20 23:34:33.709551616
dtype: datetime64[ns]
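One possible way around the conversion shown above, assuming it is acceptable to lose the datetime64 dtype, is to turn the day-precision values into Python date objects before handing them to pandas. This is only a sketch with made-up variable names, not something fredapi does.

```python
import datetime

import numpy as np
import pandas as pd

# Day-precision NumPy array holding a date far outside the nanosecond range.
days = np.array(["1200-01-01"], dtype="datetime64[D]")

# astype(object) turns datetime64[D] values into datetime.date objects,
# which pandas keeps in an object-dtype Series instead of datetime64[ns].
as_dates = pd.Series(days.astype(object))

print(as_dates.dtype, as_dates.iloc[0])
```

The cost is the same as with any object column: the datetime accessors and fast vectorised date arithmetic are not available.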
Hey,
Great library! We are using it at https://github.com/GamestonkTerminal/GamestonkTerminal
I was trying to implement the search method, and it crashes like this.
>> fred.search('gdp')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2058 try:
-> 2059 values, tz_parsed = conversion.datetime_to_datetime64(data)
2060 # If tzaware, these values represent unix timestamps, so we
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
OutOfBoundsDatetime Traceback (most recent call last)
<ipython-input-41-3ca6696fefcb> in <module>
1 num = 5
2
----> 3 df_fred = fred.search('gdp')
~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in search(self, text, limit, order_by, sort_order, filter)
376 url = "%s/series/search?search_text=%s&" % (self.root_url,
377 quote_plus(text))
--> 378 info = self.__get_search_results(url, limit, order_by, sort_order, filter)
379 return info
380
~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in __get_search_results(self, url, limit, order_by, sort_order, filter)
333 raise ValueError('%s is not in the valid list of sort_order options: %s' % (sort_order, str(sort_order_options)))
334
--> 335 data, num_results_total = self.__do_series_search(url)
336 if data is None:
337 return data
~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in __do_series_search(self, url)
298 # parse datetime columns
299 for field in ["realtime_start", "realtime_end", "observation_start", "observation_end", "last_updated"]:
--> 300 data[field] = data[field].apply(self._parse, format=None)
301 # set index name
302 data.index.name = 'series id'
~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
4211 else:
4212 values = self.astype(object)._values
-> 4213 mapped = lib.map_infer(values, f, convert=convert_dtype)
4214
4215 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in f(x)
4196
4197 def f(x):
-> 4198 return func(x, *args, **kwds)
4199
4200 else:
~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in _parse(self, date_str, format)
73 helper function for parsing FRED date string into datetime
74 """
---> 75 rv = pd.to_datetime(date_str, format=format)
76 if hasattr(rv, 'to_pydatetime'):
77 rv = rv.to_pydatetime()
~/anaconda3/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
828 result = convert_listlike(arg, format)
829 else:
--> 830 result = convert_listlike(np.array([arg]), format)[0]
831
832 return result
~/anaconda3/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
464 errors=errors,
465 require_iso8601=require_iso8601,
--> 466 allow_object=True,
467 )
468
~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2062 return values.view("i8"), tz_parsed
2063 except (ValueError, TypeError):
-> 2064 raise e
2065
2066 if tz_parsed is not None:
~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2053 dayfirst=dayfirst,
2054 yearfirst=yearfirst,
-> 2055 require_iso8601=require_iso8601,
2056 )
2057 except ValueError as e:
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1270-01-01 00:00:00
Dates outside the range of pd.Timestamp cause the problem. The above search returns a time series (HPGDPUKA) that goes back to 1270, so the search fails.
A solution proposal
In fredapi\fred.py there is a line, rv = pd.to_datetime(date_str, format=format), where the bug happens. If this line is replaced with rv = pd.to_datetime(date_str, errors='ignore', format=format), this would solve the issue (at the cost of having an object index in such cases, instead of a DatetimeIndex).
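As a variation on this proposal, here is a rough sketch of a parser that falls back to the standard library when pandas rejects a date. The function name parse_fred_date and the fixed '%Y-%m-%d' format are assumptions made for illustration; this is not the actual fredapi implementation. It uses an explicit try/except rather than errors='ignore', which newer pandas releases deprecate.

```python
from datetime import datetime

import pandas as pd


def parse_fred_date(date_str, fmt="%Y-%m-%d"):
    """Hypothetical helper: parse a FRED date string, tolerating very old dates."""
    try:
        rv = pd.to_datetime(date_str, format=fmt)
    except (pd.errors.OutOfBoundsDatetime, ValueError):
        # pandas could not represent the date (e.g. 1270-01-01 at nanosecond
        # resolution), so fall back to a plain Python datetime.
        return datetime.strptime(date_str, fmt)
    # Mirror the existing _parse behaviour of returning a Python datetime.
    return rv.to_pydatetime()


print(parse_fred_date("1270-01-01"))
print(parse_fred_date("2020-03-01"))
```

Because both branches return a Python datetime, the caller sees one type per value, though a column containing such old dates would still end up with object dtype.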