pydata / pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.
https://pydata.github.io/pandas-datareader/stable/index.html

BLS (Bureau of Labor Statistics) #460

Closed westurner closed 5 years ago

westurner commented 6 years ago

https://www.bls.gov/developers/

https://www.bls.gov/developers/api_python.htm#python2

https://github.com/zewilson/bls/blob/master/data_loader.py

bashtage commented 6 years ago

Is there data here that isn't in FRED?

westurner commented 6 years ago

That's a good question. I don't know whether FRED carries every BLS series.

https://fred.stlouisfed.org/sources

esvhd commented 6 years ago

On another note, does the BLS API provide first-release data, i.e. the values as originally published, without subsequent revisions?

dtemkin commented 6 years ago

I began working on this and hit a design snag. Since all of the series IDs are encoded, what is the best way to simulate the 'multi-screen search' feature? Or should this be avoided altogether, requiring the user to supply pre-compiled series IDs?

addisonlynch commented 6 years ago

@dtemkin it's pretty much impossible to parameterize the series IDs, since there are hundreds of series (many with different parameters) and thousands of possible permutations.
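To illustrate why one parameter scheme does not fit every survey, here is a minimal sketch that splits a series ID into fields using a per-prefix layout. The two layouts are assumptions based on my reading of the BLS series ID format pages (CU for CPI-All Urban Consumers, LN for the Current Population Survey), and `LAYOUTS` and `split_series_id` are hypothetical names, not part of any existing library.

```python
# Hypothetical per-survey field layouts: (field name, start, stop) slices.
# These positions are assumptions taken from the BLS series ID format pages;
# every survey prefix defines its own layout, so each one needs its own entry.
LAYOUTS = {
    "CU": [
        ("prefix", 0, 2),
        ("seasonal", 2, 3),
        ("periodicity", 3, 4),
        ("area", 4, 8),
        ("item", 8, 16),
    ],
    "LN": [
        ("prefix", 0, 2),
        ("seasonal", 2, 3),
        ("series_code", 3, 11),
    ],
}


def split_series_id(series_id):
    """Split a BLS series ID into named fields using its prefix's layout."""
    prefix = series_id[:2]
    layout = LAYOUTS.get(prefix)
    if layout is None:
        raise KeyError(f"no layout known for prefix {prefix!r}")
    return {name: series_id[start:stop] for name, start, stop in layout}


cpi_fields = split_series_id("CUUR0000SA0")
cps_fields = split_series_id("LNS14000000")
```

The same eleven characters decode into five fields for one survey and three for the other, which is the parameterization problem in miniature.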

Another issue is going to be how to handle the formatting of the DataFrame, as many series return different formats, including differences in JSON structure. I too started working on this about a month ago (as a standalone library, written pretty much in the PDR style).

v1 of the API requires no authentication; v2 does.

I've included both. You just pass the series ID as a string, but I never got around to fixing the formatting.
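For anyone following along, a rough sketch of that approach might look like the code below. It is not the library mentioned above. The endpoint URLs and the JSON field names (`seriesid`, `startyear`, `endyear`, `registrationkey`, `Results`, `series`, `data`) are my understanding of the public BLS API and should be checked against the BLS developer pages; `build_request` and `parse_response` are hypothetical helpers. No network call is made here, so the request body can be sent with whatever HTTP client you prefer.

```python
import json

# Assumed endpoint URLs for the two API versions; verify against the BLS docs.
BLS_URLS = {
    1: "https://api.bls.gov/publicAPI/v1/timeseries/data/",
    2: "https://api.bls.gov/publicAPI/v2/timeseries/data/",
}


def build_request(series_id, start_year, end_year, api_key=None):
    """Return (url, JSON body) for a POST request; v2 is used only with a key."""
    payload = {
        "seriesid": [series_id],
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    version = 1
    if api_key is not None:
        payload["registrationkey"] = api_key
        version = 2
    return BLS_URLS[version], json.dumps(payload)


def parse_response(payload):
    """Flatten a decoded BLS response into a sorted list of row dicts."""
    if payload.get("status") != "REQUEST_SUCCEEDED":
        raise ValueError(f"BLS request failed: {payload.get('message')}")
    rows = []
    for series in payload["Results"]["series"]:
        for obs in series["data"]:
            try:
                value = float(obs["value"])
            except ValueError:
                value = None  # non-numeric placeholder for a missing value
            rows.append(
                {
                    "series_id": series["seriesID"],
                    "year": int(obs["year"]),
                    "period": obs["period"],
                    "value": value,
                }
            )
    # Sort oldest first so the output order does not depend on the API's order.
    rows.sort(key=lambda r: (r["series_id"], r["year"], r["period"]))
    return rows
```

The rows can then be handed to `pandas.DataFrame.from_records(rows)`. The `period` code is left as a string on purpose, because turning it into a date depends on whether the series is monthly, quarterly, or annual, which is exactly the formatting problem described above.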

dtemkin commented 6 years ago

@addisonlynch The way I dealt with it in my initial attempt was a lookup function that used a local list of BLS databases and their acronyms to find the most likely candidates; the user would then enter a selection in the console. But I am now thinking that explicitly providing endpoints for popular series, like the CPI or PPI, might be viable.
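A minimal sketch of both ideas, not the code from that initial attempt: a tiny local catalog searched by acronym, substring, or fuzzy word match, plus a shortcut table for popular series. The catalog entries and the series IDs in `POPULAR_SERIES` are illustrative assumptions and should be verified against BLS before use; all names here are hypothetical.

```python
import difflib

# Hypothetical local catalog: BLS database acronym -> description.
BLS_DATABASES = {
    "CU": "Consumer Price Index - All Urban Consumers",
    "WP": "Producer Price Index - Commodities",
    "LN": "Labor Force Statistics from the Current Population Survey",
    "CE": "Current Employment Statistics - National",
}

# Hypothetical shortcuts for popular series; the IDs are illustrative.
POPULAR_SERIES = {
    "CPI": "CUUR0000SA0",
    "PPI": "WPUFD4",
}


def find_databases(query, catalog=BLS_DATABASES):
    """Return acronyms of the databases that best match a free-text query."""
    q = query.strip().lower()
    if q.upper() in catalog:
        return [q.upper()]  # exact acronym match
    hits = [code for code, desc in catalog.items() if q in desc.lower()]
    if hits:
        return hits  # substring match on the description
    # Fuzzy fallback: tolerate a typo in a single-word query.
    return [
        code
        for code, desc in catalog.items()
        if difflib.get_close_matches(q, desc.lower().split(), n=1, cutoff=0.8)
    ]


def resolve_series(name):
    """Map a popular name to its series ID; pass raw IDs through unchanged."""
    return POPULAR_SERIES.get(name.upper(), name)
```

With a shortcut table, the common cases need no interactive prompt, and anything not in the table still works as a plain series ID string.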

bashtage commented 5 years ago

Sounds like this is not a good fit for pdr. If someone wants to submit a reader, please open a new issue.