I'm questioning the paper's validity and/or generalizability

devilhtc commented 6 years ago

I'm currently doing a course project and the paper has very similar ideas to ours. However, our network is not learning much at all. Of course, this might be because we trained our network on 110 stocks, with 10 in each batch, and tested it on another 20 stocks. Training and test data are all from the S&P 500 over the past 3 years. We have built a complete evaluation pipeline and we are getting crappy results.

Although our results are quite preliminary, is it likely that, since the paper operated on a fixed set of cryptocurrencies, the network kind of remembers the scores (e.g. how well each currency can perform), and they got very good results with this 'memory' stored in the last few fc layers?

I'm not sure though, since I'm new to deep learning and this is my first time using deep reinforcement learning. But it seems to me that data in the finance world has too much noise, and the output of a neural network can easily be swayed given its complex model and, please allow me to put it this way (although it is a continuous setting rather than a classification/discrete problem), its decision boundaries. If neural networks could solve the portfolio management problem so well, a few students at a less well-known university probably would not be the first to come up with a good application in finance.

I'll keep you posted on any later developments as our project progresses. In the meantime, if you'd like, I am very interested to hear your thoughts on the subject. Thank you!

wassname commented 6 years ago

It's good to be skeptical of everything, but even John Schulman of OpenAI said that RL currently has a problem where papers are very hard to reproduce due to subtle differences. Meanwhile, the author has put up their code. It's possible that large institutions have come up with things but kept them as commercial secrets. Large hedge funds certainly try.

So I can't say for sure either way right now. If I had to guess then I think it works, although I'm not sure about extremely large returns.

devilhtc commented 6 years ago

Great, my teammates can't do shit and I probably don't have more time to spend running down this rabbit hole (I learned a lot though). Sorry for intruding into your repo, you can close this issue any time. But I did think about a couple of problems:

Is softmax problematic? It first maps the 'scores' for each stock to an exponential scale, so slight changes in the scores, in their EIIE implementation, can result in a large change in the distribution. I would say a sigmoid + normalization might work better at the end.
Can other architectures help, e.g. actor-critic? We are receiving a concrete reward at each time step, but it might be bad for learning. Using a critic to guide the network might be better. Well, I guess if there is any progress on this front we will know how it goes. Thank you!
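To make the softmax question concrete, here is a minimal sketch comparing how far the allocation moves under each output layer when a single score is bumped. It is not the paper's EIIE code: the function names (`softmax`, `sigmoid_normalize`, `max_weight_shift`) and the score values are made up for illustration, and it only shows sensitivity of the output, not whether training actually improves.

```python
import math

def softmax(scores):
    # Subtract the max score first for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_normalize(scores):
    # Squash each score into (0, 1), then rescale so the weights sum to 1.
    sig = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sig)
    return [v / total for v in sig]

def max_weight_shift(fn, before, after):
    # Largest absolute change in any single portfolio weight.
    return max(abs(a - b) for a, b in zip(fn(before), fn(after)))

# Hypothetical per-asset scores; only the second one changes (+1.0).
scores = [2.0, 1.0, 0.0, -1.0]
bumped = [2.0, 2.0, 0.0, -1.0]

softmax_shift = max_weight_shift(softmax, scores, bumped)
sigmoid_shift = max_weight_shift(sigmoid_normalize, scores, bumped)
print(f"softmax shift: {softmax_shift:.4f}")
print(f"sigmoid + normalization shift: {sigmoid_shift:.4f}")
```

For these particular scores the softmax allocation moves noticeably more than the sigmoid one. The flip side is that sigmoid + normalization cannot concentrate the portfolio as strongly in one asset, so which behaviour is preferable is still an open question here.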

wassname commented 6 years ago

I agree. I tried putting in the last weights as log(last_weights) to counteract the softmax, but it didn't help much.
I've tried a bunch of things, and it seems that having a critic or a stochastic output hurts in general, even if the critic_loss converges and the output distribution is narrow.
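One possible reading of the log(last_weights) trick, sketched below, is to add the log of the previous weights to the scores before the softmax. This is an assumption about what was tried, not the repo's actual code; `softmax_with_prior` and `eps` are hypothetical names. The useful property is that when the network outputs all-zero scores, the softmax returns the previous weights unchanged, so the default action is "hold".

```python
import math

def softmax(scores):
    # Subtract the max score first for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_prior(scores, last_weights, eps=1e-8):
    # Add log(last_weights) to the scores; eps guards against log(0).
    logits = [s + math.log(w + eps) for s, w in zip(scores, last_weights)]
    return softmax(logits)

# Hypothetical previous allocation over three assets.
last_weights = [0.5, 0.3, 0.2]

# Zero scores: the new allocation matches the previous one (up to eps).
held = softmax_with_prior([0.0, 0.0, 0.0], last_weights)

# A positive score on the last asset tilts the allocation toward it.
tilted = softmax_with_prior([0.0, 0.0, 1.0], last_weights)
print(held, tilted)
```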

One other thing I noticed is that going through the data in sequential order seems to help. Perhaps it helps reinforce temporary patterns.
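As a rough sketch of what "sequential order" could mean in a training loop, the snippet below yields window start indices either chronologically or shuffled. The helper `window_starts` and the price series are hypothetical, and the actual batching in the repo may differ.

```python
import random

def window_starts(n_steps, window, sequential=True, seed=0):
    # Start index of every full window that fits in the series.
    starts = list(range(n_steps - window + 1))
    if not sequential:
        # Shuffled order breaks the chronological link between batches.
        random.Random(seed).shuffle(starts)
    return starts

# Hypothetical price series for a single asset.
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 101.0]
window = 3

# Sequential pass: each window overlaps the previous one in time.
sequential_batches = [prices[s:s + window] for s in window_starts(len(prices), window)]
print(sequential_batches)
```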

Yeah, it's a frustrating problem! And RL is quite tricky since there aren't many Kaggle competitions to learn tricks from.

wassname / rl-portfolio-management

I'm questioning the paper's validity and/or generalizability #9