pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ME breakpoints in fama french factors library #8842

Closed MichaelWS closed 9 years ago

MichaelWS commented 9 years ago

ME_Breakpoints.zip is not read because it is in a different format. The zip contains only one file, "ME_Breakpoints.txt". How should this be handled?

MichaelWS commented 9 years ago

I am happy to put up a PR that just reads the file using an if statement.
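A minimal sketch of what such a special case could look like, assuming the archive has already been downloaded into memory. The helper names `read_only_member` and `build_demo_zip` are hypothetical, the demo content is a placeholder rather than the real layout of ME_Breakpoints.txt, and the `latin-1` decoding is an assumption:

```python
import io
import zipfile


def read_only_member(zip_bytes):
    """Return the text lines of the single file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        # The special case from the thread: the archive holds exactly one file.
        if len(names) != 1:
            raise ValueError("expected exactly one file, found %d" % len(names))
        with archive.open(names[0]) as member:
            # Assumption: a single-byte encoding is good enough for the text.
            return member.read().decode("latin-1").splitlines()


def build_demo_zip():
    """Build an in-memory stand-in archive (placeholder content only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("ME_Breakpoints.txt", "placeholder header\nplaceholder row\n")
    return buffer.getvalue()


lines = read_only_member(build_demo_zip())
```

Parsing the returned lines into a table is left out on purpose, since the thread does not describe the file's column layout.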

jreback commented 9 years ago

@MichaelWS where/what is this breakpoints file?

MichaelWS commented 9 years ago

It contains the breakpoints between the deciles. For example, what the 5%, 10%, etc. values are.

It's in the same location as the rest (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Breakpoints). For example: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/ME_Breakpoints.zip

MichaelWS commented 9 years ago

I am not sure how much this should be in pandas in general, but I feel like it should just work if famafrench is there.

Someone brought this particular file up to me today.

jreback commented 9 years ago

To the extent that other FamaFrench data exists, I'm not averse to adding this. That said, it might make sense to spin off some of these data utils into a new repo, maybe pandas-data. Interested?

MichaelWS commented 9 years ago

I agree that it makes sense.

MichaelWS commented 9 years ago

I definitely think some of the data makes more sense in a separate repo. I feel like data needs much more urgent releases than pandas itself does.

MichaelWS commented 9 years ago

@jreback, how would you envision pandas-data working?

jorisvandenbossche commented 9 years ago

@MichaelWS let's discuss this in a separate issue (I am creating one)

mmeanwe commented 9 years ago

Not sure this is the right place for this, but I'm new to pandas and trying FamaFrench in pandas. This works:

# read data from Ken French's website
ff = web.DataReader('F-F_Research_Data_Factors', 'famafrench')[0]
ff.columns = ['xsm', 'smb', 'hml', 'rf']

But this does not:

# read data from Ken French's website
ff = web.DataReader('F-F_Momentum_Factor', 'famafrench')[0]
ff.columns = ['mom']

It throws KeyError: 0. Any suggestions? Thanks

jorisvandenbossche commented 9 years ago

@gitmmeanwell web.DataReader('F-F_Momentum_Factor', 'famafrench') returns a dict with keys [1, 2], so that is why you get a KeyError.
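A minimal sketch of how to avoid hard-coding the key: look at which keys are present before indexing. A plain dict stands in for the reader's return value so the snippet runs without network access. The helper name `first_table` and the placeholder values are hypothetical, not part of pandas:

```python
def first_table(tables):
    """Return (key, table) for the smallest key in a dict of tables."""
    if not tables:
        raise KeyError("no tables returned")
    first_key = sorted(tables)[0]
    return first_key, tables[first_key]


# Stand-in for the result described above: a dict keyed by 1 and 2, not 0.
momentum_like = {1: "table under key 1", 2: "table under key 2"}

print(sorted(momentum_like))  # inspect the available keys first
key, table = first_table(momentum_like)
```

With a real result, the same inspection (`sorted(result)` or `result.keys()`) shows which indexes can be used in place of `[0]`.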

mmeanwe commented 9 years ago

Thanks for the reply - any suggestions on how to resolve? New to pandas and data frames.

jorisvandenbossche commented 9 years ago

Not using 0 in web.DataReader('F-F_Momentum_Factor', 'famafrench')[0]?

I don't know what you are trying to achieve, so it is difficult to say. I also don't know the exact structure of that dataset, so you will have to make your question more specific (but maybe http://stackoverflow.com/ is a better place to ask).

mmeanwe commented 9 years ago

Here is some more info on Use case - I'll head to StackOverflow to research further unless you have any final comments? Thanks again!

Goal: use Pandas to read file from FamaFrench website and perform some analysis.

# load packages (if it's redundant it'll be ignored)
import pandas.io.data as web

# anything after the hashtag is a comment
%reset

import datetime as dt
import matplotlib.pyplot as plt  # plotting tools

# the next one is an IPython command: it says put plots here in the
# notebook, rather than in a separate window.
%matplotlib inline

# read data from Ken French's website
ff = web.DataReader('F-F_Momentum_Factor', 'famafrench')

# NB: ff.xs is a conflict, rename to xsm
ff.columns = ['mom']

txt file structure example (file downloaded to local c:) - this is the file F-F_Momentum_Factor that is read from site and loaded to local c:\

This file was created by CMPT_ME_PRIOR_RETS using the 201412 CRSP database. It contains a momentum factor, constructed from six value-weight portfolios formed using independent sorts on size and prior return of NYSE, AMEX, and NASDAQ stocks. Mom is the average of the returns on two (big and small) high prior return portfolios minus the average of the returns on two low prior return portfolios. The portfolios are constructed monthly. Big means a firm is above the median market cap on the NYSE at the end of the previous month; small firms are below the median NYSE market cap. Prior return is measured from month -12 to - 2. Firms in the low prior return portfolio are below the 30th NYSE percentile. Those in the high portfolio are above the 70th NYSE percentile.

Missing data are indicated by -99.99 or -999.

          Mom
192701   0.44
etc.
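The file excerpt above can be read row by row with a small parser. This is only a sketch, assuming each data row is a YYYYMM stamp followed by one value separated by whitespace, as in the `192701 0.44` example, and that the -99.99 and -999 markers from the file header should become NaN. The name `parse_monthly_row` is hypothetical:

```python
import math

# Missing-data markers named in the file header quoted above.
MISSING_SENTINELS = (-99.99, -999.0)


def parse_monthly_row(line):
    """Split a 'YYYYMM value' row into (year, month, value)."""
    stamp, raw_value = line.split()
    year, month = int(stamp[:4]), int(stamp[4:6])
    value = float(raw_value)
    # Replace the documented sentinels with NaN so they do not skew statistics.
    if value in MISSING_SENTINELS:
        value = math.nan
    return year, month, value


row = parse_monthly_row("192701 0.44")
```

Header and description lines would need to be skipped before calling the parser; how many there are is not stated in the thread.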


mmeanwe commented 9 years ago

Moving comment/issue to StackOverflow.

MichaelWS commented 9 years ago

web.DataReader('F-F_Momentum_Factor', 'famafrench') creates a dict with keys 1 and 2; 1 works for daily returns.
When we break this into a separate project, I would eventually like to make the keys "daily", "monthly", and "annual" so it's easier to understand.
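A minimal sketch of that relabeling, using a plain dict as a stand-in for the reader's output. The helper `relabel_tables` is hypothetical, and the mapping from integer keys to frequency names below is an illustrative assumption, since which key holds which frequency depends on the dataset:

```python
def relabel_tables(tables, names):
    """Return a new dict whose integer keys are replaced by descriptive names."""
    # A KeyError here means a table has no name assigned in `names`.
    return {names[key]: table for key, table in tables.items()}


# Stand-in tables and an assumed key-to-name mapping (illustration only).
tables = {1: "table under key 1", 2: "table under key 2"}
labeled = relabel_tables(tables, {1: "monthly", 2: "annual"})
```

Passing the mapping in explicitly keeps the per-dataset knowledge out of the helper, so each file can declare its own frequencies.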

jorisvandenbossche commented 9 years ago

@MichaelWS be sure to open an issue for that at https://github.com/pydata/pandas-datareader/issues to not forget that! (as the separate project is already there)

jorisvandenbossche commented 9 years ago

and maybe also move this issue if you want

jorisvandenbossche commented 9 years ago

Closing this in favor of https://github.com/pydata/pandas-datareader/issues/21