quantopian / pyfolio

Portfolio and risk analytics in Python
https://quantopian.github.io/pyfolio
Apache License 2.0

pull Fama-French risk factors directly from the academic website #44

Closed: justinlent closed this issue 9 years ago

justinlent commented 9 years ago

We should look into pulling this data directly, since the CSV can get out of date really quickly. Maybe we supply a default CSV (if we can) and also expose a function to pull the data from the university page.

We might also want to confirm whether there are any licenses or other issues involved in redistributing it. @twiecki, maybe you have some experience with redistributing data in the OSS world?

twiecki commented 9 years ago

@justinlent It seems we can also get that via the pandas data reader (http://pandas.pydata.org/pandas-docs/stable/remote_data.html#remote-data-ff). Do you think that's sufficient?

humdings commented 9 years ago

It looks like the Fama-French data from pandas is at weekly/monthly frequency. @justinlent, where did you download the daily factors currently in quantrisk from?

EDIT: It looks like we can get the daily data from the Dartmouth site using the pandas DataReader class. The data there has not been updated since May 29, so the out-of-date problem is probably because the database is not updated very often; my guess is monthly updates.

Downloading the factors directly sounds like a good idea; the files download pretty quickly. Do we want to expose the web reader functions, or wrap them ourselves and keep cached local versions? Using pandas makes it easy to access all of the F-F datasets available on the edu site.
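If we go the "wrap them ourselves and keep a cached local version" route, a minimal sketch could look like the following. `load_factors_cached`, the cache path and the one-day staleness window are hypothetical, not existing pyfolio code, and `fetch` stands in for whatever download function we settle on (for example a thin wrapper around the DataReader call).

```python
import os
import time

import pandas as pd


def load_factors_cached(fetch, cache_path, max_age_seconds=24 * 60 * 60):
    """Return factor data, re-downloading only when the local cache is stale.

    `fetch` is a hypothetical zero-argument callable that returns a
    DataFrame indexed by date (e.g. a wrapper around the pandas reader).
    """
    # Serve from the cached CSV if it exists and is recent enough
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            return pd.read_csv(cache_path, index_col=0, parse_dates=True)

    # Otherwise download fresh data and overwrite the cache file
    factors = fetch()
    factors.to_csv(cache_path)
    return factors
```

Keying staleness off the file's modification time keeps the sketch dependency-free; since the source data only seems to be updated about monthly, a generous window should avoid hitting the site on every call.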

justinlent commented 9 years ago

@twiecki @humdings The daily FF factors are at this URL: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip

which is linked from this data page on Ken French's academic research page. Scroll down to the section "U.S. Research Returns Data (Downloadable Files)"; a couple of links further down, the zip is the link named "Fama/French 3 Factors [Daily]".

There's actually TONS of FF data here that may become useful down the road:

http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

humdings commented 9 years ago

Pandas is awesome once again! These few lines load the daily Fama-French data and convert the index to actual timestamps.

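```python
import pandas as pd
import pandas_datareader.data as web  # pandas.io.data moved to pandas-datareader

# Table 0 holds the daily Mkt-RF, SMB, HML and RF columns
factors = web.DataReader("F-F_Research_Data_Factors_daily", "famafrench")[0]
# YYYYMMDD dates -> UTC timestamps (only sets the timezone if the reader
# already returned a DatetimeIndex)
factors.index = pd.to_datetime(factors.index, format="%Y%m%d", utc=True)
```
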
justinlent commented 9 years ago

Great, @humdings! I think we can hold off on implementing this for the initial public release. I'd want to do some testing first to make sure everything works correctly, since we weren't pulling the data directly in the past. For example, we should wrap the data pull with error checking and fail over to the pre-existing CSV file, etc.
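The error checking and failover to the bundled CSV could look roughly like this minimal sketch. `load_factors_with_failover` is a hypothetical name, `fetch` stands in for the live download (for example the DataReader call), and the expected column list assumes the daily 3-factor file.

```python
import logging

import pandas as pd

# Columns we expect in the daily Fama-French 3-factor data (assumption)
EXPECTED_COLUMNS = ["Mkt-RF", "SMB", "HML", "RF"]


def load_factors_with_failover(fetch, fallback_csv):
    """Try a live download; on any error or suspicious data, use the local CSV."""
    try:
        factors = fetch()
        # Basic sanity checks before trusting the downloaded frame
        missing = set(EXPECTED_COLUMNS) - set(factors.columns)
        if factors.empty or missing:
            raise ValueError("downloaded factors look wrong (missing: %s)"
                             % sorted(missing))
        return factors
    except Exception as exc:
        # Network failures, parse errors and failed checks all land here
        logging.warning("Falling back to %s: %s", fallback_csv, exc)
        return pd.read_csv(fallback_csv, index_col=0, parse_dates=True)
```

Catching broadly is deliberate in this sketch: for a convenience download, any failure should degrade to the pre-existing CSV rather than break the tear sheet.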

humdings commented 9 years ago

Another option is Quandl: https://www.quandl.com/data/KFRENCH/FACTORS_D-Fama-French-Factors-Daily

gusgordon commented 9 years ago

The Dartmouth page is down right now. Quandl seems more reliable, but we could have a double fallback, i.e. Dartmouth -> Quandl -> local. I'll try to do this today.
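A double fallback generalizes to trying an ordered list of sources until one works. This is a hypothetical helper, not existing pyfolio code; each source is a zero-argument callable, so the Dartmouth, Quandl and local-CSV loaders can be plugged in however they end up being written.

```python
import logging


def first_successful(sources):
    """Return the result of the first source that succeeds.

    `sources` is an ordered list of (name, zero-argument callable) pairs,
    e.g. Dartmouth, then Quandl, then the local CSV.
    """
    errors = []
    for name, fetch in sources:
        try:
            return fetch()
        except Exception as exc:  # network errors, parse errors, etc.
            logging.warning("Source %s failed: %s", name, exc)
            errors.append("%s: %s" % (name, exc))
    # Every source failed; report all the reasons together
    raise RuntimeError("all factor sources failed: " + "; ".join(errors))
```

Usage would be something like `first_successful([("dartmouth", fetch_dartmouth), ("quandl", fetch_quandl), ("local", load_local_csv)])`, where those three loaders are hypothetical placeholders.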

gusgordon commented 9 years ago

Dartmouth seems to be up at your second link here: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

I think the pandas function and the old URL are broken, but we can unzip the files with Python, so that shouldn't be a problem.
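Fetching and unzipping the file ourselves could be sketched as below, using only `urllib`, `zipfile` and pandas. `parse_ff_zip` and `download_ff_zip` are hypothetical names, and the parser assumes the layout of the Ken French CSVs: a few description lines, a header row beginning with a comma, `YYYYMMDD`-dated rows with returns in percent, and a trailing copyright line.

```python
import io
import urllib.request
import zipfile

import pandas as pd

# URL quoted in this thread; the site may have moved since
FF3_DAILY_URL = ("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/"
                 "ftp/F-F_Research_Data_Factors_daily.zip")


def download_ff_zip(url=FF3_DAILY_URL):
    """Fetch the raw zip bytes (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def parse_ff_zip(zip_bytes):
    """Parse the first CSV inside a Ken French zip into decimal returns."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        text = zf.read(zf.namelist()[0]).decode("latin-1")

    lines = text.splitlines()
    # The column header row starts with a comma (the date column is unnamed)
    start = next((i for i, line in enumerate(lines) if line.startswith(",")), None)
    if start is None:
        raise ValueError("no header row found in factor file")

    rows = []
    for line in lines[start + 1:]:
        date_field = line.split(",")[0].strip()
        # Keep only data rows: first field is an 8-digit YYYYMMDD date
        if len(date_field) == 8 and date_field.isdigit():
            rows.append(line)

    csv_text = "\n".join([lines[start]] + rows)
    factors = pd.read_csv(io.StringIO(csv_text), index_col=0, skipinitialspace=True)
    factors.index = pd.to_datetime(factors.index.astype(str), format="%Y%m%d", utc=True)
    factors.index.name = "date"
    factors.columns = [c.strip() for c in factors.columns]
    # The files report returns in percent; convert to decimals
    return factors / 100.0
```

Keeping download and parsing separate means the parser can be tested offline and reused on a locally bundled copy of the zip.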

I can't seem to find UMD anywhere. @justin @humdings, does anyone know where that's from? We could also consider replacing UMD with Rf, but I'm assuming we have a good reason for wanting UMD.

gusgordon commented 9 years ago

From Justin: UMD is the momentum factor -> http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip
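Once both daily files are loaded into DataFrames indexed by date, the momentum series can be joined onto the 3-factor frame. `combine_factors` is a hypothetical helper, and the assumption that the momentum file's column is named `Mom` (possibly padded with spaces) should be checked against the actual file.

```python
import pandas as pd


def combine_factors(ff3, momentum):
    """Join daily FF3 factors with the momentum factor, labelled UMD.

    Both inputs are DataFrames indexed by date. The momentum column is
    assumed to be named "Mom" (whitespace tolerated) and is renamed to "UMD".
    """
    mom = momentum.rename(columns=lambda c: "UMD" if c.strip() == "Mom" else c)
    # Inner join keeps only the dates present in both files
    return ff3.join(mom[["UMD"]], how="inner")
```

Joining rather than replacing keeps RF alongside UMD, so we wouldn't have to choose between them.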