Closed justinlent closed 9 years ago
@justinlent Seems like we can also get that via pandas data reader: http://pandas.pydata.org/pandas-docs/stable/remote_data.html#remote-data-ff Do you think that's sufficient?
It looks like the fama-french data from pandas is weekly/monthly frequency. @justinlent where did you download the daily factors currently in quantrisk from?
EDIT: It looks like we can get the daily data from the Dartmouth site using the pandas DataReader class. The data available there has not been updated since May 29, so the outdated-data issues are probably because the database is not updated very often; monthly updates is my guess.
Downloading the factors directly sounds like a good idea, the files download pretty quickly. Do we want to expose the webreader functions or wrap them ourselves and keep local versions cached? Using pandas makes it easy to have access to all of the F-F data sets available on the edu site.
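One way to picture the "wrap them ourselves and keep local versions cached" option is the small sketch below. It is only an illustration: `cached_fetch`, its parameters, and the one-day default age are hypothetical names and values, not anything that exists in quantrisk or pandas, and `fetch` stands in for whatever function actually downloads a file.

```python
import os
import time


def cached_fetch(cache_path, fetch, max_age_seconds=24 * 60 * 60):
    """Return bytes from cache_path if it is fresh, otherwise call fetch().

    `fetch` is any zero-argument callable returning bytes (hypothetical:
    for example, a function that downloads one of the F-F zip files).
    """
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            with open(cache_path, "rb") as f:
                return f.read()

    data = fetch()

    # Write to a temporary file first so a failed write never leaves a
    # half-written cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, cache_path)
    return data
```

Keeping the downloader as a plain callable means the caching logic can be tested without touching the network.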
@twiecki @humdings the daily FF are at this URL: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip
which is linked to from this data page on Ken French's academic research page - scroll down to section "U.S. Research Returns Data (Downloadable Files)", then a couple links down, the zip is at the link named: Fama/French 3 Factors [Daily]
There's actually TONS of FF data here that may become useful down the road
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
Pandas is awesome once again! These few lines load the daily Fama-French data and convert the index to actual timestamps.
import pandas as pd
import pandas.io.data as web
factors = web.DataReader("F-F_Research_Data_Factors_daily", "famafrench")[0]
factors.index = pd.to_datetime(factors.index, format="%Y%m%d", utc=True)
Great @humdings! We can hold off on implementing this for the initial public release, I think. I'd want to do some testing first to make sure everything works correctly, since we weren't pulling directly in the past. E.g., we should wrap the data pull with error checking and fall back to the pre-existing CSV file, etc.
Another option is Quandl https://www.quandl.com/data/KFRENCH/FACTORS_D-Fama-French-Factors-Daily
Dartmouth page is down right now. Quandl seems more reliable, but we could have a double fallback i.e. dartmouth -> quandl -> local. I'll try to do this today.
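A rough sketch of the dartmouth -> quandl -> local chain, assuming each source is wrapped in a zero-argument loader function. `load_with_fallback` and the source names passed to it are hypothetical; the actual download and parsing code for each source is left out.

```python
import logging

logger = logging.getLogger(__name__)


def load_with_fallback(sources):
    """Try each (name, loader) pair in order and return the first success.

    Each loader is a zero-argument callable. Any exception it raises is
    logged and the next source is tried.
    """
    errors = []
    for name, loader in sources:
        try:
            return name, loader()
        except Exception as exc:  # deliberately broad: any failure means "try next"
            logger.warning("source %r failed: %s", name, exc)
            errors.append((name, exc))
    raise RuntimeError("all sources failed: %r" % (errors,))
```

Returning the source name along with the data makes it possible to warn the user when the result came from the local CSV and may be out of date.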
Dartmouth seems to be up at your second link here: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
I think the pandas function and old URL are broken, but we can unzip files with Python, so that shouldn't be a problem.
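For the unzip step, something like the following could work with only the standard library. The file layout is an assumption that has not been checked against the live download: some free-text lines, a header row whose first cell is empty, then rows keyed by a YYYYMMDD date. `read_daily_factors` is a hypothetical name, and the download itself is not shown.

```python
import csv
import io
import zipfile
from datetime import datetime


def read_daily_factors(zip_bytes):
    """Parse the first file in a zip archive into a list of row dicts.

    Assumed layout (not verified against the live file): free-text lines,
    a header row with an empty first cell, then rows keyed by YYYYMMDD.
    Lines that do not fit this pattern are skipped.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first_member = archive.namelist()[0]
        text = archive.read(first_member).decode("latin-1")

    header = None
    rows = []
    for cells in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in cells]
        if len(cells) < 2:
            continue  # blank lines and single-cell notes
        key = cells[0]
        if key == "":
            header = cells[1:]  # column names taken from the file itself
        elif header and len(key) == 8 and key.isdigit():
            row = {"date": datetime.strptime(key, "%Y%m%d").date()}
            row.update(zip(header, (float(c) for c in cells[1:])))
            rows.append(row)
    return rows
```

The resulting list of dicts can be passed to `pd.DataFrame(rows).set_index("date")` if a DataFrame is wanted.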
I can't seem to find UMD anywhere. @justin @humdings anyone know where that's from? Also maybe consider replacing UMD with Rf, but I'm assuming we have a good reason for wanting UMD.
From justin, UMD = momentum factors -> http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip
We should look into directly pulling this data since the csv can get out of date really quickly. Maybe we supply a default csv (if we can) as well as expose a function to pull the data from the university page
We also might want to confirm if there are any licenses or issues with redistributing it. @twiecki maybe you have some experience understanding redistribution of data in the OSS world?