mortada / fredapi

Python API for FRED (Federal Reserve Economic Data) and ALFRED (Archival FRED)
Apache License 2.0

BUG: fred.search('Real GDP') raises an OutOfBoundsDatetime #21

Open topper-123 opened 6 years ago

topper-123 commented 6 years ago
>>> from fredapi import Fred
>>> fred = Fred(api_key_file='api_key_file.txt')
>>> fred.search('Real GDP')
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1270-01-01 00:00:00

Dates outside the range supported by pd.Timestamp cause the problem. The search above returns a time series (HPGDPUKA) that goes back to 1270, so the whole search fails.

A solution proposal

In fredapi\fred.py, the bug happens on the line rv = pd.to_datetime(date_str, format=format). Replacing it with rv = pd.to_datetime(date_str, errors='ignore', format=format) would solve the issue (at the cost of getting an object index in such cases, instead of a DatetimeIndex).
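As a rough sketch of the same idea without relying on errors='ignore' (which, as far as I know, is deprecated in recent pandas releases), a helper could fall back to the standard library when pandas refuses the date. The name parse_fred_date and the default format are assumptions for illustration, not part of fredapi.

```python
from datetime import datetime

import pandas as pd


def parse_fred_date(date_str, fmt="%Y-%m-%d"):
    """Hypothetical helper: parse a FRED date string into a Python datetime."""
    try:
        # Normal path: let pandas parse, then convert to a plain datetime.
        return pd.to_datetime(date_str, format=fmt).to_pydatetime()
    except pd.errors.OutOfBoundsDatetime:
        # Fallback for dates pandas cannot hold as a nanosecond timestamp.
        # datetime.datetime supports years 1 through 9999.
        return datetime.strptime(date_str, fmt)


old = parse_fred_date("1270-01-01")
recent = parse_fred_date("2020-03-01")
```

Both calls return datetime.datetime objects, so a column built from them would have object dtype whenever it contains out-of-range dates, which is the trade-off mentioned above.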

Liam3851 commented 6 years ago

Other common searches (e.g. CPI) also fail for the same reason. Seems the problem is that FRED now carries some data from the Bank of England that show up in the searches and go back further than the pandas Timestamp limits.

Moreover, there is a problem if you try to access one of these series with older data. For example:

In [83]: fred.get_series('HPGDPUKA')
ValueError: time data '1600-01-01' doesn't match format specified

Perhaps a more stable solution (at the expense of some compat) would be to make observation_start and observation_end everywhere period-like instead of datetimelike. I believe these fields should always be whole days, which can be represented as Period(freq='D').
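A small sketch of what the period-based representation could look like, assuming only pandas. The two dates are made-up examples, not values returned by the API.

```python
import pandas as pd

# Daily periods are not limited to the nanosecond Timestamp range.
start = pd.Period("1270-01-01", freq="D")
end = pd.Period("2016-01-01", freq="D")

# A column of such values gets a single, consistent dtype.
idx = pd.PeriodIndex([start, end])
print(idx.dtype, start.year, start < end)
```

With this approach, observation_start and observation_end would keep one dtype regardless of how old the series is, at the cost of returning Period objects where callers currently get datetimes.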

topper-123 commented 6 years ago

I think using PeriodIndex with every entry at freq='D' could get confusing. Does the JSON contain frequency data? If so, PeriodIndex could be used with the correct freq, which would actually be very cool.

PeriodIndex does not have the limitations that DatetimeIndex has, so that would also solve this problem with very old dates.

Liam3851 commented 6 years ago

Yeah, I was trying to think of something where the column would always have the same type; I chose freq='D' just because observation_start and observation_end appear to always represent dates and so day accuracy would work. I agree using Periods might imply a regularity that's confusing.

NumPy datetime64s with day precision would probably make the most sense for the use case, but they appear to confuse pandas when put in a pandas data structure:

In [49]: x = np.datetime64('1200-01-01')

In [50]: x
Out[50]: numpy.datetime64('1200-01-01')

In [51]: x.dtype
Out[51]: dtype('<M8[D]')

In [52]: np.array([x])
Out[52]: array(['1200-01-01'], dtype='datetime64[D]')

In [53]: pd.Series(np.array([x]))
Out[53]:
0   1784-07-20 23:34:33.709551616
dtype: datetime64[ns]

DidierRLopes commented 3 years ago

Hey,

Great library! We are using it at https://github.com/GamestonkTerminal/GamestonkTerminal

I was trying to implement the search method, and it crashes like this:

>>> fred.search('gdp')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2058         try:
-> 2059             values, tz_parsed = conversion.datetime_to_datetime64(data)
   2060             # If tzaware, these values represent unix timestamps, so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-41-3ca6696fefcb> in <module>
      1 num = 5
      2 
----> 3 df_fred = fred.search('gdp')

~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in search(self, text, limit, order_by, sort_order, filter)
    376         url = "%s/series/search?search_text=%s&" % (self.root_url,
    377                                                     quote_plus(text))
--> 378         info = self.__get_search_results(url, limit, order_by, sort_order, filter)
    379         return info
    380 

~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in __get_search_results(self, url, limit, order_by, sort_order, filter)
    333                 raise ValueError('%s is not in the valid list of sort_order options: %s' % (sort_order, str(sort_order_options)))
    334 
--> 335         data, num_results_total = self.__do_series_search(url)
    336         if data is None:
    337             return data

~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in __do_series_search(self, url)
    298             # parse datetime columns
    299             for field in ["realtime_start", "realtime_end", "observation_start", "observation_end", "last_updated"]:
--> 300                 data[field] = data[field].apply(self._parse, format=None)
    301             # set index name
    302             data.index.name = 'series id'

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4211             else:
   4212                 values = self.astype(object)._values
-> 4213                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4214 
   4215         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in f(x)
   4196 
   4197             def f(x):
-> 4198                 return func(x, *args, **kwds)
   4199 
   4200         else:

~/anaconda3/lib/python3.6/site-packages/fredapi/fred.py in _parse(self, date_str, format)
     73         helper function for parsing FRED date string into datetime
     74         """
---> 75         rv = pd.to_datetime(date_str, format=format)
     76         if hasattr(rv, 'to_pydatetime'):
     77             rv = rv.to_pydatetime()

~/anaconda3/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    828             result = convert_listlike(arg, format)
    829     else:
--> 830         result = convert_listlike(np.array([arg]), format)[0]
    831 
    832     return result

~/anaconda3/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    464             errors=errors,
    465             require_iso8601=require_iso8601,
--> 466             allow_object=True,
    467         )
    468 

~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2062             return values.view("i8"), tz_parsed
   2063         except (ValueError, TypeError):
-> 2064             raise e
   2065 
   2066     if tz_parsed is not None:

~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2053             dayfirst=dayfirst,
   2054             yearfirst=yearfirst,
-> 2055             require_iso8601=require_iso8601,
   2056         )
   2057     except ValueError as e:

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1270-01-01 00:00:00