robertmartin8 / PyPortfolioOpt

Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity
https://pyportfolioopt.readthedocs.io/
MIT License

Max number of assets #369

Closed ghost closed 2 years ago

ghost commented 3 years ago

Hi, I have downloaded the full NASDAQ data (3300 stocks) into a pandas DataFrame, correctly indexed etc. I would like to create an optimal portfolio by passing the entire dataset to PyPortfolioOpt. My question is: is there a maximum number of stocks I can pass in, so that I'm limited to, for example, 100 stocks at a time, or can I pass as many stocks as I like? Thanks for the answer.

Another question: what is the minimum length of the stock time series? For example, can I use only 100 days of history, or is there a minimum number of days that I should pass?

phschiele commented 3 years ago

Hi @NightFox5,

this question has been asked a few times before (e.g. 243, 113, 45, 13). TL;DR: You might try one of the following:

@robertmartin8 Seeing that this question arises repeatedly, do you think it makes sense to add a short paragraph to the docs?

For your second question: from a technical point of view, having fewer observations than assets will make the sample covariance matrix singular. Shrinkage estimators can help in this case. In general, there is a trade-off between the benefits of having more data and the drawbacks of having older data, which might be less relevant.
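To make the singularity point concrete, here is a minimal sketch using synthetic returns and plain NumPy. The sizes, the fixed shrinkage intensity `delta`, and the scaled-identity target are illustrative assumptions, not the library's method. In PyPortfolioOpt itself, `risk_models.CovarianceShrinkage(prices).ledoit_wolf()` estimates the intensity from the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_assets = 100, 300  # fewer observations than assets (illustrative sizes)
returns = rng.normal(0.0, 0.01, size=(n_obs, n_assets))  # synthetic daily returns

# Sample covariance: its rank is at most n_obs - 1, so it cannot be inverted
# when n_obs <= n_assets.
sample_cov = np.cov(returns, rowvar=False)
rank = np.linalg.matrix_rank(sample_cov)
print(f"rank {rank} of {n_assets}")

# Shrink towards a scaled identity matrix. delta is a hand-picked,
# hypothetical intensity; real estimators choose it from the data.
delta = 0.2
target = np.eye(n_assets) * np.trace(sample_cov) / n_assets
shrunk_cov = (1 - delta) * sample_cov + delta * target

# All eigenvalues of the shrunk matrix are strictly positive, so it is invertible.
min_eig = np.linalg.eigvalsh(shrunk_cov).min()
print(f"smallest eigenvalue after shrinkage: {min_eig:.2e}")
```

Any positive `delta` lifts the zero eigenvalues, which is why shrinkage keeps the optimiser usable even with a short history.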

ghost commented 3 years ago

@phschiele Thanks for the answer, it was very enlightening. I had the wrong idea of what this library can do: I wanted to use it for asset selection by giving it every stock. So now I'm back at the start. Any suggestions for selecting the assets?

phschiele commented 3 years ago

@NightFox5 I'm afraid I can't give a simple answer to that, as asset selection in itself is quite a vast topic. Feel free to reach out if you are facing a specific issue with the library again!

NatanBagrov commented 2 years ago

Hi, you can use Orthogonal Matching Pursuit to find the $n$ stocks most correlated with the index itself, then apply the efficient frontier to them.
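As a rough sketch of that idea (not code from this thread or from PyPortfolioOpt), the greedy selection can be written in a few lines of NumPy. The function name `omp_select` and the synthetic data are hypothetical, and the sketch assumes no asset has constant returns. scikit-learn's `OrthogonalMatchingPursuit` is an off-the-shelf alternative.

```python
import numpy as np

def omp_select(asset_returns, index_returns, n_select):
    """Greedy OMP: pick n_select columns whose linear combination best fits the index."""
    # Centre and normalise columns so the scores behave like correlations.
    X = asset_returns - asset_returns.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)  # assumes no constant (zero-variance) column
    y = index_returns - index_returns.mean()

    residual = y.copy()
    selected = []
    for _ in range(n_select):
        # Score every asset by its correlation with the unexplained part of the index.
        scores = np.abs(X.T @ residual)
        if selected:
            scores[selected] = -np.inf  # never pick the same asset twice
        selected.append(int(np.argmax(scores)))
        # Refit on all selected assets, then update the residual.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return selected

# Synthetic check: the "index" is the average of three known assets plus small noise.
rng = np.random.default_rng(1)
asset_returns = rng.normal(0.0, 0.01, size=(250, 50))
true_members = [3, 17, 42]
index_returns = asset_returns[:, true_members].mean(axis=1) + rng.normal(0.0, 1e-4, size=250)

chosen = omp_select(asset_returns, index_returns, n_select=3)
print(sorted(chosen))
```

The returned column positions could then be used to subset the price DataFrame before running an efficient frontier optimisation on the reduced universe.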

robertmartin8 commented 2 years ago

@phschiele it's in the FAQs, but I'll add a bit more detail as per your response. Alternatively, do you think there's somewhere else in the docs it should be mentioned?

robertmartin8 commented 2 years ago

@NatanBagrov that's an interesting suggestion – something new for me.

Conceptually though, I wonder whether you want to find the most or least correlated stocks to the index?

phschiele commented 2 years ago

@robertmartin8 Thanks for the heads-up, I think that's a good place.

NatanBagrov commented 2 years ago

> @NatanBagrov that's an interesting suggestion – something new for me.
>
> Conceptually though, I wonder whether you want to find the most or least correlated stocks to the index?

That depends on your needs. You can also alter the data artificially to avoid drawdowns, and then use OMP, followed by a Markowitz optimization procedure (which might be redundant). In my experience, in most cases it is harder to achieve a good Sharpe ratio for a market-neutral (least-correlation) algorithm, given daily data.

vskritsk commented 2 years ago

@phschiele it's in the FAQ, but why the ambiguous abbreviations? It looks like `cp` comes from an import statement, but the import is missing. After a bit of searching, I found a likely candidate: `import cvxpy as cp`.

Another note: you provide `df = pd.read_csv("tests/resources/stock_prices.csv", parse_dates=True, index_col="date")`, but why not print the first 2 rows with the schema and include that in the docs?
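For illustration, such a preview might look like the sketch below. The tickers and prices are made up (they are not the contents of `tests/resources/stock_prices.csv`), and the layout shown, dates as the index with one column of prices per ticker, is an assumption based on the `read_csv` call above.

```python
import io
import pandas as pd

# Made-up data in the assumed layout: a "date" column plus one price column per ticker.
csv_text = """date,AAA,BBB,CCC
2020-01-02,10.0,20.0,30.0
2020-01-03,10.1,19.8,30.3
2020-01-06,10.2,20.1,30.1
"""

# Same arguments as the read_csv call quoted above, reading from memory instead of a file.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="date")

print(df.head(2))  # first two rows
print(df.dtypes)   # schema: one float column per ticker
```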

robertmartin8 commented 2 years ago

Hi @vskritsk,

Cheers for the feedback! The reality is that maintaining an open source project (particularly documentation) is quite a challenge: all of the contributors (myself included) have full-time work/education, so it's natural for some of the details to slip through.

Regarding your points, I agree that it would be better to have import cvxpy as cp – I will push a fix for that when I get a moment.

For the dataframe comment, the only reason it's not added there is to avoid duplication. The dataframe format is described in the User Guide, the pages for expected returns and risk models, the cookbook, and a sample file has been provided.

Best, Robert