tcloaa / Deep-Portfolio-Theory

Autoencoder framework for portfolio selection (paper published by J. B. Heaton, N. G. Polson, J. H. Witte.)

Communal information #6

Closed valchik closed 6 years ago

valchik commented 6 years ago

Hey tcloaa, thank you very much for your replication of the Heaton/Polson paper. Very interesting to read. I have one question regarding the communal information and the construction of the S25, S45, and S65 portfolios. Heaton/Polson say that

S25, S45, etc. denote the number of stocks used. After ranking the stocks in auto-encoding, we are increasing the number of stocks by using the 10 most communal stocks plus x-number of most non-communal stocks (as we do not want to add unnecessary communal information); e.g., 25 stocks means 10 plus 15 (where x=15).

You use here the following:

```python
for non_communal in [15, 35, 55]:
    # some numerical values
    encoding_dim = 5
    s = 10 + non_communal
    stock_index = np.concatenate((ranking[0:10], ranking[-non_communal:]))  # portfolio index
```

which matches what Heaton/Polson claim in their paper. However, when we run the code, we see that the portfolios grow by the number of communal stocks, not non-communal ones. That is, ranking[0:10] appears to be the fixed set of non-communal stocks, while ranking[-non_communal:] adds a varying number of communal stocks. Would you be so kind as to comment on this, or correct me if I am mistaken?
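To make the question concrete: whether ranking[0:10] picks the communal or the non-communal stocks depends entirely on the sort order used to build ranking. A minimal sketch, assuming (hypothetically) that ranking = np.argsort(errors), where errors holds each stock's 2-norm reconstruction error and argsort is ascending, so the most communal stocks come first:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-stock autoencoder reconstruction errors
# (2-norm between a stock's return series and its auto-encoded version).
errors = rng.random(100)

# Ascending argsort: index 0 has the LOWEST error, i.e. the MOST communal stock.
ranking = np.argsort(errors)

non_communal = 15
stock_index = np.concatenate((ranking[0:10], ranking[-non_communal:]))

print(len(stock_index))  # 25 stocks: 10 communal + 15 non-communal
```

Under this convention the head of ranking is communal and the tail is non-communal; if ranking were built with a descending sort, the roles would flip, which may explain the discrepancy.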

tcloaa commented 6 years ago

Hello,

Regarding "we see that the portfolios increase by the number of communal stocks, not non-communal":

Can you attach your screenshot here for reference?

Thanks,

skyhoudou commented 6 years ago

Hello tcloaa,

I have been focused on this paper for months. Actually, I am not very clear on the concept of communal information. The IBB index is calculated from the prices of its component stocks and is capitalization-weighted. Why don't we just select the component stocks with the largest market capitalization? I am also confused about the meaning of the ranking based on the auto-encoding results, and about why we choose the 10 lowest-difference stocks plus some of the highest-difference stocks. My results show that the more stocks we include in the calculation, the better the results; the overfitting is severe, though. If you are interested in the index-tracking problem, we could discuss these issues.

Anyway, many thanks for your work.

tcloaa commented 6 years ago

@skyhoudou

Hello, I am not the author of the paper, so I can only give my own understanding. I hope that's okay.

  1. For communal information: the proximity of a stock to its auto-encoded version (e.g., the 2-norm of the difference) provides a measure of how similar the stock is to the stock universe. See p. 11 of the paper.

  2. For choosing stocks, I think it means: you first choose the 10 most communal stocks (lowest difference, i.e., most similar to their auto-encoded versions), then choose the x most non-communal stocks (highest difference), as the author states: "as we do not want to add unnecessary communal information". You can treat it as adding non-learned features. See p. 12 of the paper.

  3. For the index weight: from my perspective, yes, the index weight is based on component market cap; call it W1 over all components S. However, it is unlikely that you would invest in all of S. That is why you want to create a portfolio s ⊆ S, and therefore a corresponding weight vector W2 for s. Here the paper uses neural-network fitting to find this weight mapping.
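To illustrate point 3 with a much simpler stand-in than the paper's deep network: finding W2 amounts to fitting the index returns from the selected stocks' returns. A sketch using ordinary least squares on hypothetical data (the paper itself fits a neural network, so this only shows the idea of the weight mapping, not the authors' method):

```python
import numpy as np

rng = np.random.default_rng(1)

n_days, n_selected = 250, 25
# Hypothetical daily returns of the 25 selected stocks s ⊆ S.
stock_returns = rng.normal(0.0, 0.01, size=(n_days, n_selected))
# Hypothetical index returns: a cap-weight-like combination plus noise.
true_w = rng.dirichlet(np.ones(n_selected))
index_returns = stock_returns @ true_w + rng.normal(0.0, 0.001, n_days)

# Least squares: find W2 minimizing ||stock_returns @ w - index_returns||_2.
w2, *_ = np.linalg.lstsq(stock_returns, index_returns, rcond=None)

tracking_error = np.linalg.norm(stock_returns @ w2 - index_returns)
print(f"tracking error: {tracking_error:.4f}")
```

The neural network in the paper plays the same role as the lstsq call here, but can capture a non-linear mapping from s to the index.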