pydata / pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.
https://pydata.github.io/pandas-datareader/stable/index.html

TSP download no longer working? #824

Open droodman opened 4 years ago

droodman commented 4 years ago

It looks like TSP's recent overhaul of its web page for publishing price data has broken pandas_datareader.tsp. Here's a log:

```
>>> import pandas_datareader.tsp as tsp
>>> tspreader = tsp.TSPReader(start='2015-10-1', end='2015-12-31')
>>> tspreader.read()
Empty DataFrame
Columns: []
Index: [, Click here if you are not redirected., Redirecting…, , , , , Redirecting…, , ]
```

RodneyWashburn commented 3 years ago

I just encountered the same issue. Looking to see if anyone has found a workaround.

TW652 commented 3 years ago

Reviving this issue - anyone find a workaround?

RodneyWashburn commented 3 years ago

I had to go in and download the daily data manually. The only challenge here is that it only goes back to the early 2000s.

AnonJohn commented 1 year ago

I went through the new website and debugger to get the magic string: https://www.tsp.gov/data/fund-price-history.csv?startdate=2023-10-05&enddate=2023-11-04&Lfunds=1&InvFunds=1&download=1

This should work easily with pandas read_csv and be a basis for updating the TSP download.

Update: Not as easy as it should be, as TSP has some cookie-based checking. You get a 403 Forbidden if you access it from Requests (or wget) w/o cookies. It should be possible to set up the cookies for the requests library.

Not sure pandas-datareader is still in development though.
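For reference, the query string above can be built for any date range with the standard library. This is only a sketch: the endpoint and parameter names are copied from the URL quoted in this comment, their meaning is inferred from the names rather than from any TSP documentation, and `build_tsp_url` is a hypothetical helper, not part of pandas-datareader.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the URL quoted above.
# What each flag does is an assumption based on its name.
TSP_CSV_ENDPOINT = "https://www.tsp.gov/data/fund-price-history.csv"


def build_tsp_url(start: date, end: date) -> str:
    """Return the fund-price-history CSV URL for a date range (hypothetical helper)."""
    params = {
        "startdate": start.isoformat(),  # formatted as YYYY-MM-DD
        "enddate": end.isoformat(),
        "Lfunds": 1,
        "InvFunds": 1,
        "download": 1,
    }
    return f"{TSP_CSV_ENDPOINT}?{urlencode(params)}"


# Reproduces the URL from the comment above.
print(build_tsp_url(date(2023, 10, 5), date(2023, 11, 4)))
```

Building the URL this way only covers the request side; it does not address the 403 responses mentioned in the update.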

AnonJohn commented 1 year ago

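This approach works:

```python
import pandas as pd
from urllib.request import Request, urlopen  # Python 3

# The CSV URL from my earlier comment
url = "https://www.tsp.gov/data/fund-price-history.csv?startdate=2023-10-05&enddate=2023-11-04&Lfunds=1&InvFunds=1&download=1"

# Send a browser-like User-Agent header with the request
req = Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0')
content = urlopen(req)

df = pd.read_csv(content)
print(df)
```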

bashtage commented 1 year ago

It is, but needs fixes like this to be made into PRs. Would like to get the readers back in shape then have a release.

AnonJohn commented 1 year ago

> It is, but needs fixes like this to be made into PRs. Would like to get the readers back in shape then have a release.

Cool, thanks for the reply. Let me see what I can do. Some oddities remain: using Requests as above downloads the entire TSP history (since 2003), but in a browser it only downloads the dates in the request URL (as expected). Hmmm.

jat255 commented 6 months ago

> Cool, thanks for the reply. Let me see what I can do. Some oddities remain: using Requests as above downloads the entire TSP history (since 2003), but in a browser it only downloads the dates in the request URL (as expected). Hmmm.

Using the download button in the browser also downloads the whole dataset, but then uses some client-side JavaScript to pare it down to the dates provided (a little silly/wasteful, but I suppose it works). It happens in the doSharePriceDownload() function of https://www.tsp.gov/assets/js/ajaxFetch.js
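If the server really does return the full history regardless of the dates in the URL, the same trimming can be done in pandas after download. A minimal sketch, assuming the CSV has a date column named `Date`; that column name, the `G Fund` column, and the sample values below are all made up for illustration, and `filter_price_history` is a hypothetical helper.

```python
import io

import pandas as pd


def filter_price_history(df, start, end, date_col="Date"):
    """Keep only rows whose date falls within [start, end], inclusive."""
    dates = pd.to_datetime(df[date_col])
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return df.loc[mask].reset_index(drop=True)


# Made-up sample standing in for the downloaded CSV (not real TSP prices).
sample_csv = io.StringIO(
    "Date,G Fund\n"
    "2023-10-04,1.00\n"
    "2023-10-05,1.01\n"
    "2023-11-03,1.02\n"
    "2023-11-06,1.03\n"
)
history = pd.read_csv(sample_csv)

# Trim to the date range that was requested in the URL.
subset = filter_price_history(history, "2023-10-05", "2023-11-04")
print(subset)
```

The actual column name should be checked against the downloaded file before relying on this.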