Nan values in df

yevski commented 2 years ago

Hi, thanks for an excellent project. What should I replace the NaN values in my dataframe with, given that some stocks have much more historical data than others and some are extremely new?
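To make the problem concrete, here is a small sketch with made-up prices for two hypothetical tickers, `OLD` and `NEW`, where `NEW` only starts trading partway through the window. The leading NaNs in `NEW` are the values the question is about. `first_valid_index()` returns the first date on which each asset actually has a price.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: "OLD" has full history, "NEW" lists on 2021-01-06
dates = pd.date_range("2021-01-04", periods=5, freq="B")
prices = pd.DataFrame(
    {
        "OLD": [10.0, 10.5, 10.2, 10.8, 11.0],
        "NEW": [np.nan, np.nan, 20.0, 21.0, 20.5],
    },
    index=dates,
)

# Number of missing prices per asset
print(prices.isna().sum())

# First date on which each asset has a price
first_dates = prices.apply(lambda col: col.first_valid_index())
print(first_dates)
```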

Thanks

dev590t commented 2 years ago

Maybe you can try:

    import pandas as pd
    df = pd.read_csv(file, parse_dates=True, index_col="date")
    # drop every column (asset) that contains at least one NaN
    df = df.dropna(how="any", axis=1)
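The effect of `dropna` depends on the axis. A minimal sketch on made-up data with hypothetical tickers: `axis=1` removes every asset that has any missing price, so the newly listed stock disappears entirely, while `axis=0` keeps all assets but discards every date before the newest one started trading.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: "NEW" has no data for the first two dates
prices = pd.DataFrame(
    {
        "OLD": [10.0, 10.5, 10.2, 10.8, 11.0],
        "NEW": [np.nan, np.nan, 20.0, 21.0, 20.5],
    },
    index=pd.date_range("2021-01-04", periods=5, freq="B"),
)

# axis=1: drop columns (assets) that contain any NaN, so "NEW" is removed
by_asset = prices.dropna(how="any", axis=1)

# axis=0: drop rows (dates) that contain any NaN, so the first two dates are removed
by_date = prices.dropna(how="any", axis=0)

print(list(by_asset.columns))  # ['OLD']
print(len(by_date))            # 3
```

With many assets, `axis=0` can shrink the sample to the lifetime of the youngest asset, while `axis=1` silently removes assets from the universe; which loss is acceptable depends on the use case.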

alvarocperez commented 2 years ago

Hi yevski,

If you do not want to lose trading days and also want a solution that does not introduce artificial price jumps into the calculation of returns, you can fill those gaps with the last known price of the asset.

For assets that had not yet started trading, you could use the price from their first day of trading.
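One way to check what this filling does to returns is to compute daily returns with `pct_change()` on a filled series. In this sketch (made-up prices for one hypothetical asset), an interior gap is filled with the last known price and the pre-listing days with the first traded price. Every filled day then shows a return of exactly 0 rather than NaN, so no artificial jumps appear, although those zero-return days do enter any statistic computed over the full window.

```python
import numpy as np
import pandas as pd

# Hypothetical asset: not listed for 2 days, then a missing quote on day 4
prices = pd.Series(
    [np.nan, np.nan, 20.0, np.nan, 22.0],
    index=pd.date_range("2021-01-04", periods=5, freq="B"),
)

# Last known price for interior gaps, then first traded price before listing
filled = prices.ffill().bfill()

# Daily simple returns; the first row has no previous price, so it stays NaN
returns = filled.pct_change()

print(filled.tolist())  # [20.0, 20.0, 20.0, 20.0, 22.0]
print(returns.tolist())
```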

For example, in a DataFrame where columns are assets, rows are dates, and values are closing prices:

    # assets: one column per asset, one row per date, values are closing prices
    assets = assets.sort_index()  # make sure dates run in ascending order
    assets = assets.ffill()       # fill gaps with each asset's last known price
    assets = assets.bfill()       # fill days before an asset listed with its first traded price
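The order of the two fills matters whenever an asset has a gap in the middle of its history. In this sketch (made-up prices for one hypothetical asset), filling forward first gives the gap the last known price, as described above, while filling backward first gives it the next price instead and leaves nothing for the forward fill to do.

```python
import numpy as np
import pandas as pd

# Hypothetical asset: not yet listed on the first date, missing quote on the third
prices = pd.DataFrame(
    {"NEW": [np.nan, 20.0, np.nan, 22.0]},
    index=pd.date_range("2021-01-04", periods=4, freq="B"),
)

forward_first = prices.ffill().bfill()   # gap gets the last known price (20.0)
backward_first = prices.bfill().ffill()  # gap gets the next price (22.0)

print(forward_first["NEW"].tolist())   # [20.0, 20.0, 20.0, 22.0]
print(backward_first["NEW"].tolist())  # [20.0, 20.0, 22.0, 22.0]
```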

robertmartin8 / PyPortfolioOpt

Nan values in df #417