Open dsgerbc opened 2 years ago
Hi, I have opened a PR with an additional commit against the dgerbc branch. Could @dsgerbc merge that, so that we can then merge this PR? For now I have forked this and pushed it to our private PyPI, but I would prefer upstream to have it.
Hi guys, I have a similar issue, reported by @almostintuitive, in my pystlouisfed library. I have run various tests and I think this problem probably has no solution. Unfortunately, even this implementation is not correct.
The resulting "realtime_start" and "realtime_end" fields change depending on the "realtime_start" and "realtime_end" parameters.
For example, for testing, change "max_chunk_length = 2000" to "max_chunk_length = 90" and run:

```python
from fredapi import Fred

fred = Fred(api_key='xyz')
df = fred.get_series_all_releases(series_id='GNPCA', realtime_start='1776-09-24', realtime_end='2001-03-27')
df[df['date'] == '1929-01-01']
```

idx | realtime_start | date | value |
---|---|---|---|
0 | 1958-12-21 | 1929-01-01 | 181.8 |
1 | 1965-08-19 | 1929-01-01 | 203.6 |
2 | 1976-01-16 | 1929-01-01 | 314.7 |
3 | 1980-12-23 | 1929-01-01 | 315.7 |
4 | 1985-12-20 | 1929-01-01 | 709.6 |
0 | 1987-02-19 | 1929-01-01 | 709.6 |
1 | 1991-12-04 | 1929-01-01 | NaT |
2 | 1992-12-22 | 1929-01-01 | 827.4 |
3 | 1996-01-19 | 1929-01-01 | NaT |
4 | 1997-04-30 | 1929-01-01 | 796.8 |
5 | 1999-10-29 | 1929-01-01 | NaT |
6 | 2000-04-27 | 1929-01-01 | 828.9 |
The value "709.6" is duplicated in the resulting DataFrame because its real-time period overlaps two of the requested ranges: 1958-12-21 -> 1987-01-22 and 1987-02-19 -> 2000-07-28.
```xml
<observation realtime_start="1958-12-21" realtime_end="1965-08-18" date="1929-01-01" value="181.8"/>
<observation realtime_start="1965-08-19" realtime_end="1976-01-15" date="1929-01-01" value="203.6"/>
<observation realtime_start="1976-01-16" realtime_end="1980-12-22" date="1929-01-01" value="314.7"/>
<observation realtime_start="1980-12-23" realtime_end="1985-12-19" date="1929-01-01" value="315.7"/>
<observation realtime_start="1985-12-20" realtime_end="1991-12-03" date="1929-01-01" value="709.6"/>
<observation realtime_start="1991-12-04" realtime_end="1992-12-21" date="1929-01-01" value="."/>
<observation realtime_start="1992-12-22" realtime_end="1996-01-18" date="1929-01-01" value="827.4"/>
<observation realtime_start="1996-01-19" realtime_end="1997-04-29" date="1929-01-01" value="."/>
<observation realtime_start="1997-04-30" realtime_end="1999-10-28" date="1929-01-01" value="796.8"/>
<observation realtime_start="1999-10-29" realtime_end="2000-04-26" date="1929-01-01" value="."/>
<observation realtime_start="2000-04-27" realtime_end="2001-03-27" date="1929-01-01" value="828.9"/>
```
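
The duplication can be reproduced offline with a small model. This is only a sketch: it assumes, based on the behaviour described above, that a windowed request returns every observation whose real-time period overlaps the window and clips the period to that window. `query_window` is a hypothetical helper, not part of fredapi or of the FRED API.

```python
from datetime import date


def query_window(observations, window_start, window_end):
    """Hypothetical model of one windowed request.

    Keeps every observation whose real-time period overlaps the window and
    clips realtime_start / realtime_end to the window (assumed behaviour).
    """
    rows = []
    for obs_start, obs_end, value in observations:
        if obs_start <= window_end and obs_end >= window_start:
            rows.append((max(obs_start, window_start), min(obs_end, window_end), value))
    return rows


# Three of the real-time periods listed above for date="1929-01-01".
observations = [
    (date(1980, 12, 23), date(1985, 12, 19), "315.7"),
    (date(1985, 12, 20), date(1991, 12, 3), "709.6"),
    (date(1991, 12, 4), date(1992, 12, 21), "."),
]

# The two request ranges mentioned above.
windows = [
    (date(1958, 12, 21), date(1987, 1, 22)),
    (date(1987, 2, 19), date(2000, 7, 28)),
]

# Concatenate the per-window results, as a chunked client would.
combined = [row for start, end in windows for row in query_window(observations, start, end)]

# "709.6" shows up once per window, the second time with a clipped realtime_start.
starts_709 = [row[0] for row in combined if row[2] == "709.6"]
print(starts_709)
```

Under this model the second "709.6" row starts at the window boundary (1987-02-19) rather than at a real vintage date, which matches the duplicated row in the table.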
Even if it were possible to calculate realtime_end from realtime_start (by shifting the Series), it is not possible to remove the duplicated realtime_start rows with certainty on the client side. That is why I think pagination cannot be applied to vintage data.
FRED's proposal is wrong: when the vintage limit is exceeded, pagination should be enforced directly by FRED (limit/offset). In addition, this is undocumented functionality. But FRED has more such problems...
Tomas
@TomasKoutek: thanks for the investigation! I'll also look into it soon. We, specifically, are only interested in the original release date, which, if I understand it correctly, can still be extracted for these series. Do you know if this statement is correct? Thanks! Mark
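
If that reading holds, the original release could be picked out as the row with the earliest realtime_start per observation date. The sketch below is not a confirmed answer to the question: it assumes the first request window starts at or before the series' earliest vintage, so that the earliest row is not a clipped one. `first_releases` is a hypothetical helper operating on plain tuples rather than on the DataFrame returned by fredapi.

```python
def first_releases(rows):
    """Return {observation date: (earliest realtime_start, value)}.

    rows are (realtime_start, date, value) tuples with ISO-formatted date
    strings, which sort correctly as plain strings.
    """
    first = {}
    for realtime_start, obs_date, value in rows:
        if obs_date not in first or realtime_start < first[obs_date][0]:
            first[obs_date] = (realtime_start, value)
    return first


# A few rows from the table above, including the duplicated 709.6 value.
rows = [
    ("1958-12-21", "1929-01-01", 181.8),
    ("1965-08-19", "1929-01-01", 203.6),
    ("1985-12-20", "1929-01-01", 709.6),
    ("1987-02-19", "1929-01-01", 709.6),
]

original = first_releases(rows)
print(original["1929-01-01"])
```

The duplicated rows from later windows do not affect the result here, because they always carry a later realtime_start than the first release.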
The FRED API seems to have implemented a hard cap of 2000 vintages to be retrieved at once. It comes into play if one tries to run get_series_all_releases on any daily data series present in ALFRED (for example 'T1YFF'). Requests for data come back with a "Bad Request. This exceeds the maximum number of vintage dates allowed (2000)" error. The fix splits data retrieval into batches of <= 2000 vintages.
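
A minimal sketch of the batching idea, not the code from this PR: given a sorted list of vintage dates, it produces one (realtime_start, realtime_end) window per batch of at most `max_chunk_length` vintages. `vintage_windows` is a hypothetical helper; in practice the dates could come from fredapi's `get_series_vintage_dates`.

```python
from typing import List, Sequence, Tuple


def vintage_windows(vintage_dates: Sequence[str],
                    max_chunk_length: int = 2000) -> List[Tuple[str, str]]:
    """Split sorted vintage dates into (realtime_start, realtime_end) windows.

    Each window covers at most max_chunk_length vintage dates, so a request
    limited to that window stays within the cap.
    """
    if max_chunk_length < 1:
        raise ValueError("max_chunk_length must be positive")
    windows = []
    for i in range(0, len(vintage_dates), max_chunk_length):
        chunk = vintage_dates[i:i + max_chunk_length]
        # The first and last vintage of the chunk bound the request window.
        windows.append((chunk[0], chunk[-1]))
    return windows


# Seven made-up vintage dates, split into batches of three.
dates = ["2020-01-0%d" % day for day in range(1, 8)]
windows = vintage_windows(dates, max_chunk_length=3)
print(windows)
```

Each window would then be passed as realtime_start and realtime_end to a separate request, and the results concatenated. Note that the windows leave gaps between consecutive vintage dates, as in the two ranges quoted earlier in the thread.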