Update fred.py - Githubissues

dsgerbc commented 2 years ago

The FRED API seems to have implemented a hard cap of 2000 vintages to be retrieved at once. It comes into play if one tries to run get_series_all_releases on any daily data series present in ALFRED (for example 'T1YFF'). Requests for data come back with a "Bad Request. This exceeds the maximum number of vintage dates allowed (2000)" error. The fix splits data retrieval into batches of <= 2000 vintages.
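The batching idea can be sketched roughly as follows. This is only an illustration, not the code in the PR: the helper name `chunk_vintage_dates` is hypothetical, and the list of vintage dates is assumed to be already available and is replaced here by made-up dates.

```python
from datetime import date


def chunk_vintage_dates(vintage_dates, max_chunk_length=2000):
    """Split vintage dates into (realtime_start, realtime_end) windows.

    Hypothetical helper: each window covers at most max_chunk_length
    vintage dates, so one request per window stays under the cap.
    """
    if max_chunk_length < 1:
        raise ValueError("max_chunk_length must be at least 1")
    ordered = sorted(set(vintage_dates))
    windows = []
    for i in range(0, len(ordered), max_chunk_length):
        chunk = ordered[i:i + max_chunk_length]
        # First and last vintage of the chunk bound the request window.
        windows.append((chunk[0], chunk[-1]))
    return windows


# Made-up vintage dates for illustration; real ones would come from the API.
vintages = [date(2020, 1, d) for d in range(1, 11)]
windows = chunk_vintage_dates(vintages, max_chunk_length=4)
for start, end in windows:
    print(start.isoformat(), "->", end.isoformat())
```

Each window would then be passed as the realtime_start and realtime_end of a separate request, and the partial results concatenated.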

tommyjcarpenter commented 1 year ago

Hi, I have opened a PR with an additional commit against @dsgerbc's branch. Could @dsgerbc merge that, so that we can then merge this PR? For now I have forked this and pushed it to our private PyPI, but I would prefer upstream to have it.

TomasKoutek commented 1 year ago

Hi guys, I have a similar issue, reported by @almostintuitive, in my pystlouisfed library. I have run various tests and I think this problem probably has no solution. Unfortunately, even this implementation is not correct.

The "realtime_start" and "realtime_end" fields in the response change depending on the "realtime_start" and "realtime_end" request parameters.

For example, to make this easy to test, we change "max_chunk_length = 2000" to "max_chunk_length = 90" and run:

from fredapi import Fred

fred = Fred(api_key='xyz')
df = fred.get_series_all_releases(series_id='GNPCA', realtime_start='1776-09-24', realtime_end='2001-03-27')

df[df['date'] == '1929-01-01']

idx	realtime_start	date	value
0	1958-12-21	1929-01-01	181.8
1	1965-08-19	1929-01-01	203.6
2	1976-01-16	1929-01-01	314.7
3	1980-12-23	1929-01-01	315.7
4	1985-12-20	1929-01-01	709.6
0	1987-02-19	1929-01-01	709.6
1	1991-12-04	1929-01-01	NaT
2	1992-12-22	1929-01-01	827.4
3	1996-01-19	1929-01-01	NaT
4	1997-04-30	1929-01-01	796.8
5	1999-10-29	1929-01-01	NaT
6	2000-04-27	1929-01-01	828.9

The value "709.6" is duplicated in the resulting DataFrame, because it falls into two ranges:

1958-12-21 -> 1987-01-22
1987-02-19 -> 2000-07-28

https://api.stlouisfed.org/fred/series/observations?series_id=GNPCA&api_key=xyz&realtime_start=1776-09-24&realtime_end=2001-03-27

<observation realtime_start="1958-12-21" realtime_end="1965-08-18" date="1929-01-01" value="181.8"/>
<observation realtime_start="1965-08-19" realtime_end="1976-01-15" date="1929-01-01" value="203.6"/>
<observation realtime_start="1976-01-16" realtime_end="1980-12-22" date="1929-01-01" value="314.7"/>
<observation realtime_start="1980-12-23" realtime_end="1985-12-19" date="1929-01-01" value="315.7"/>
<observation realtime_start="1985-12-20" realtime_end="1991-12-03" date="1929-01-01" value="709.6"/>
<observation realtime_start="1991-12-04" realtime_end="1992-12-21" date="1929-01-01" value="."/>
<observation realtime_start="1992-12-22" realtime_end="1996-01-18" date="1929-01-01" value="827.4"/>
<observation realtime_start="1996-01-19" realtime_end="1997-04-29" date="1929-01-01" value="."/>
<observation realtime_start="1997-04-30" realtime_end="1999-10-28" date="1929-01-01" value="796.8"/>
<observation realtime_start="1999-10-29" realtime_end="2000-04-26" date="1929-01-01" value="."/>
<observation realtime_start="2000-04-27" realtime_end="2001-03-27" date="1929-01-01" value="828.9"/>

Even if it were possible to calculate realtime_end from realtime_start (by shifting the Series), it is not possible to remove realtime_start duplicates with certainty on the client side. That is why I think pagination cannot be applied to vintage data.
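To make the ambiguity concrete, here is a minimal sketch in plain Python (no pandas) using three of the rows for 1929-01-01 from the chunked result. The helpers `collapse_repeats` and `derive_realtime_end` are hypothetical names, and the "drop repeated values" rule is only a heuristic.

```python
from datetime import date, timedelta

# (realtime_start, value) for date 1929-01-01, taken from the chunked result.
# None stands for the missing value reported as "." by the API.
rows = [
    (date(1985, 12, 20), 709.6),
    (date(1987, 2, 19), 709.6),   # second chunk repeats the same value
    (date(1991, 12, 4), None),
]


def collapse_repeats(rows):
    """Heuristic: drop a row whose value equals the previous kept row's value."""
    kept = []
    for start, value in rows:
        if kept and kept[-1][1] == value:
            continue
        kept.append((start, value))
    return kept


def derive_realtime_end(rows, last_end):
    """Shift: realtime_end is the day before the next row's realtime_start."""
    out = []
    for i, (start, value) in enumerate(rows):
        if i + 1 < len(rows):
            end = rows[i + 1][0] - timedelta(days=1)
        else:
            end = last_end
        out.append((start, end, value))
    return out


collapsed = collapse_repeats(rows)
with_end = derive_realtime_end(collapsed, last_end=date(1992, 12, 21))
for start, end, value in with_end:
    print(start, end, value)
```

On these rows the heuristic reproduces the unchunked range 1985-12-20 to 1991-12-03 for 709.6. The catch is that the same rule would also merge a genuine new release that happened to publish an unchanged value, and the client has no way to tell the two cases apart.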

FRED's proposal is wrong: when the vintage limit is exceeded, pagination should be enforced directly by FRED (limit/offset). In addition, this behavior is undocumented. But FRED has more problems like this...

Tomas

almostintuitive commented 1 year ago

@TomasKoutek: thanks for the investigation! I'll also look into it soon. We, specifically, are only interested in the original release date, which, if I understand it correctly, can still be extracted for these series. Do you know if this statement is correct? Thanks! Mark
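One way to probe this is to keep only the earliest realtime_start per observation date. The sketch below does not settle the question: it assumes that the earliest realtime_start returned for a date can be treated as its first release, which nothing in this thread confirms, and `first_release` is a hypothetical helper.

```python
from datetime import date


def first_release(rows):
    """Map each observation date to its earliest (realtime_start, value)."""
    earliest = {}
    for realtime_start, obs_date, value in rows:
        current = earliest.get(obs_date)
        if current is None or realtime_start < current[0]:
            earliest[obs_date] = (realtime_start, value)
    return earliest


# (realtime_start, date, value) rows taken from the chunked result above.
rows = [
    (date(1985, 12, 20), date(1929, 1, 1), 709.6),
    (date(1987, 2, 19), date(1929, 1, 1), 709.6),   # chunk-boundary repeat
    (date(1958, 12, 21), date(1929, 1, 1), 181.8),
]

releases = first_release(rows)
print(releases[date(1929, 1, 1)])
```

Taking the minimum is not affected by rows repeated at later chunk boundaries, provided the chunk that contains the earliest vintage is part of the result.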

mortada / fredapi

Update fred.py #52