mortada / fredapi

Python API for FRED (Federal Reserve Economic Data) and ALFRED (Archival FRED)
Apache License 2.0

Update #52

Open dsgerbc opened 2 years ago

dsgerbc commented 2 years ago

The FRED API seems to have implemented a hard cap of 2000 vintages to be retrieved at once. It comes into play if one tries to run get_series_all_releases on any daily data series present in ALFRED (for example 'T1YFF'). Requests for data come back with a "Bad Request. This exceeds the maximum number of vintage dates allowed (2000)" error. The fix splits data retrieval into batches of <= 2000 vintages.
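The batching idea can be sketched roughly as follows. This is not the code from the PR, which is not shown in this thread; `chunk_vintage_dates` is a hypothetical helper, and it assumes a sorted list of vintage dates is already available (for instance from `Fred.get_series_vintage_dates`). Only the name `max_chunk_length` comes from the discussion below.

```python
from datetime import date, timedelta


def chunk_vintage_dates(vintage_dates, max_chunk_length=2000):
    """Split sorted vintage dates into (realtime_start, realtime_end) windows.

    Each window covers at most `max_chunk_length` vintages, so one request
    per window should stay under the cap reported in the error message.
    """
    if max_chunk_length < 1:
        raise ValueError("max_chunk_length must be at least 1")
    windows = []
    for i in range(0, len(vintage_dates), max_chunk_length):
        chunk = vintage_dates[i:i + max_chunk_length]
        # First and last vintage of the chunk bound the real-time window.
        windows.append((chunk[0], chunk[-1]))
    return windows


# Hypothetical input: 4500 consecutive daily vintages.
dates = [date(2000, 1, 1) + timedelta(days=n) for n in range(4500)]
windows = chunk_vintage_dates(dates)
```

Each resulting window would then be passed as `realtime_start` and `realtime_end` to a separate request, and the per-window results concatenated.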

tommyjcarpenter commented 1 year ago

Hi, I have opened a PR with an additional commit against dsgerbc's branch. Could @dsgerbc merge that, so that we can then merge this PR? For now I have forked this and pushed it to our private PyPI, but I would prefer upstream to have it.

TomasKoutek commented 1 year ago

Hi guys, I have a similar issue, reported by @almostintuitive, in my pystlouisfed library. I have run various tests and I think this problem probably has no solution. Unfortunately, even this implementation is not correct.

The resulting "realtime_start" and "realtime_end" fields change depending on the "realtime_start" and "realtime_end" parameters.

For example, for testing we change "max_chunk_length = 2000" to "max_chunk_length = 90" and run:

from fredapi import Fred

fred = Fred(api_key='xyz')
df = fred.get_series_all_releases(series_id='GNPCA', realtime_start='1776-09-24', realtime_end='2001-03-27')

df[df['date'] == '1929-01-01']
idx realtime_start date value
0 1958-12-21 1929-01-01 181.8
1 1965-08-19 1929-01-01 203.6
2 1976-01-16 1929-01-01 314.7
3 1980-12-23 1929-01-01 315.7
4 1985-12-20 1929-01-01 709.6
0 1987-02-19 1929-01-01 709.6
1 1991-12-04 1929-01-01 NaT
2 1992-12-22 1929-01-01 827.4
3 1996-01-19 1929-01-01 NaT
4 1997-04-30 1929-01-01 796.8
5 1999-10-29 1929-01-01 NaT
6 2000-04-27 1929-01-01 828.9

The value "709.6" is duplicated in the resulting DataFrame because its real-time period (1985-12-20 to 1991-12-03) overlaps two of the requested ranges:

1958-12-21 -> 1987-01-22
1987-02-19 -> 2000-07-28

<observation realtime_start="1958-12-21" realtime_end="1965-08-18" date="1929-01-01" value="181.8"/>
<observation realtime_start="1965-08-19" realtime_end="1976-01-15" date="1929-01-01" value="203.6"/>
<observation realtime_start="1976-01-16" realtime_end="1980-12-22" date="1929-01-01" value="314.7"/>
<observation realtime_start="1980-12-23" realtime_end="1985-12-19" date="1929-01-01" value="315.7"/>
<observation realtime_start="1985-12-20" realtime_end="1991-12-03" date="1929-01-01" value="709.6"/>
<observation realtime_start="1991-12-04" realtime_end="1992-12-21" date="1929-01-01" value="."/>
<observation realtime_start="1992-12-22" realtime_end="1996-01-18" date="1929-01-01" value="827.4"/>
<observation realtime_start="1996-01-19" realtime_end="1997-04-29" date="1929-01-01" value="."/>
<observation realtime_start="1997-04-30" realtime_end="1999-10-28" date="1929-01-01" value="796.8"/>
<observation realtime_start="1999-10-29" realtime_end="2000-04-26" date="1929-01-01" value="."/>
<observation realtime_start="2000-04-27" realtime_end="2001-03-27" date="1929-01-01" value="828.9"/>

Even if realtime_end could be calculated from realtime_start (by shifting the Series), the duplicated realtime_start rows cannot be removed with certainty on the client side. That is why I think pagination cannot be applied to vintage data.
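To make the two steps concrete, here is a rough pure-Python sketch using the 1929-01-01 rows quoted above. `merge_chunks` and `add_realtime_end` are hypothetical helpers, not part of fredapi or pystlouisfed. The boundary deduplication is only a heuristic that happens to work on this sample; it does not get around the objection that such duplicates cannot be removed with certainty.

```python
from datetime import date, timedelta

# (realtime_start, value) rows for observation date 1929-01-01, copied from
# the chunked output above. None stands for a missing value (".").
chunk_a = [
    (date(1958, 12, 21), 181.8),
    (date(1965, 8, 19), 203.6),
    (date(1976, 1, 16), 314.7),
    (date(1980, 12, 23), 315.7),
    (date(1985, 12, 20), 709.6),
]
chunk_b = [
    (date(1987, 2, 19), 709.6),  # same value again, start clipped to the chunk
    (date(1991, 12, 4), None),
    (date(1992, 12, 22), 827.4),
    (date(1996, 1, 19), None),
    (date(1997, 4, 30), 796.8),
    (date(1999, 10, 29), None),
    (date(2000, 4, 27), 828.9),
]


def merge_chunks(chunks):
    """Concatenate chunks, dropping a chunk's first row when it repeats the
    last value already kept. Heuristic only: it assumes such a row is a
    carry-over, which cannot be verified from the data itself."""
    merged = []
    for chunk in chunks:
        rows = list(chunk)
        if merged and rows and rows[0][1] == merged[-1][1]:
            rows = rows[1:]
        merged.extend(rows)
    return merged


def add_realtime_end(rows, last_end):
    """Derive realtime_end as the day before the next row's realtime_start
    (the "shift" idea); the final row ends at `last_end`."""
    out = []
    for i, (start, value) in enumerate(rows):
        if i + 1 < len(rows):
            end = rows[i + 1][0] - timedelta(days=1)
        else:
            end = last_end
        out.append((start, end, value))
    return out


rows = add_realtime_end(merge_chunks([chunk_a, chunk_b]), last_end=date(2001, 3, 27))
```

On this sample the result lines up with the eleven observations in the XML above, but that agreement is specific to this data and should not be read as a general fix.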

FRED's approach is wrong: when the vintage limit is exceeded, pagination should be enforced directly by FRED (limit/offset). In addition, this limit is undocumented. But FRED has more problems like this...


almostintuitive commented 1 year ago

@TomasKoutek: thanks for the investigation! I'll also look into it soon. We, specifically, are only interested in the original release date, which, if I understand correctly, can still be extracted for these series. Do you know if this statement is correct? Thanks! Mark