wassname / rl-portfolio-management

Attempting to replicate "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" https://arxiv.org/abs/1706.10059 (and an openai gym environment)

Do you have any idea why the model didn't learn much? #2

Closed: jd0713 closed this issue 7 years ago

jd0713 commented 7 years ago

As you know, the portfolio weights appear to be static on the test set. Why didn't the model learn much?

I can think of several possible reasons:

  1. there is no ensemble of networks in the DDPG setup
  2. the input doesn't include the previous portfolio weights (sketched after this list)
  3. there is no online stochastic batch training when applying the network to the test set
  4. different asset universe
  5. no PVM (portfolio vector memory)
  6. DDPG was used instead of DPG
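
For 2 and 5, I mean something like feeding the previous portfolio weights back into the actor, as the paper's PVM does. A rough Keras-style sketch of the idea (my illustration, not your notebook's actual model; layer sizes and window length are made up):

```python
# Sketch of ideas 2/5: concatenate the previous portfolio weights into the
# actor so it can condition on its own last allocation (PVM-like).
from keras.layers import Input, Conv2D, Flatten, Concatenate, Dense
from keras.models import Model

n_assets, window, n_features = 4, 50, 3

prices = Input(shape=(n_assets, window, n_features), name="price_history")
prev_w = Input(shape=(n_assets,), name="previous_weights")

x = Conv2D(8, (1, 3), activation="relu")(prices)
x = Conv2D(16, (1, window - 2), activation="relu")(x)
x = Flatten()(x)
x = Concatenate()([x, prev_w])  # previous weights enter the network here
new_w = Dense(n_assets, activation="softmax", name="new_weights")(x)

actor = Model(inputs=[prices, prev_w], outputs=new_w)
```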

Because the authors said that EIIE is their core idea in this paper, I think that if I solve 1 and 2 there will be some improvement.

Would you share some of your ideas?

Thank you very much.

wassname commented 7 years ago

Are you using one of my notebooks or your own? And what do you mean by the test set? Are you applying it to recent data?

This happened for me during training when:

  1. The critic network wasn't large enough to model the reward, so I added more parameters.
  2. It learns from each action in sequence, so the algorithm can just fit the current market and forget past markets. Lots of new algorithms try to fix this by either having multiple actors learning at once (I think the original paper did this), or by using replay or prioritised replay memory (a minimal replay buffer is sketched after this list). If this doesn't make sense, the prioritised experience replay paper explains it well.
  3. It failed to converge because SGD had the wrong parameters; in this case try RMSProp since it converges more easily (or Adam, but Adam isn't as good for moving targets).
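
For 2., the usual fix is some kind of replay memory: store past transitions and train on random minibatches instead of only the latest step, so the agent stops forgetting old market conditions. A minimal uniform version (just a sketch, not the exact memory any particular library uses; prioritised replay additionally weights samples by TD error):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience replay for off-policy agents like DDPG."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive steps, so the agent doesn't only fit the latest market.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```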

And for your list of ideas, I think 2 and 5 couldn't be the cause, since that only makes it easier for it to give varied predictions. It shouldn't be 3, since it shouldn't be learning on the test set anyway.

But yeah I reckon it could be 1., 4., or 6.

jd0713 commented 7 years ago

I used your notebook and the data you uploaded. I mean that there is only a slight difference between the equal-weighted portfolio and the learned policy. Also, almost no reward is gained while training. I think the portfolio weights, accumulated return, and so on should look different if 'something' is learned, and the results don't seem to. I am wondering if there is a way to make the model learn 'something'.

In short, why is there no episode reward?

I am also curious whether this comes from a limit of the observation space. Because the observation is prices divided by the last open price, values can be greater than 1.
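
To make that concrete, here is roughly what I understand the observation to be (my illustration with made-up numbers, not the environment's actual code):

```python
import numpy as np

# A window of prices divided by the latest open price, so any price above
# that open gives a value greater than 1.
window = np.array([
    [10.0, 10.2,  9.8],   # open, high, low at step t-1
    [10.1, 10.6,  9.9],   # open, high, low at step t
])
last_open = window[-1, 0]
observation = window / last_open
print(observation.max())  # 10.6 / 10.1 is about 1.05, i.e. above 1
```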

Because I'm a newbie at deep learning, my question might be vague. Thanks.

wassname commented 7 years ago

I ran into similar problems. I found that the DDPG model was less likely to learn, but the VPG notebook was more stable. So my advice would be to try the VPG notebook, and if that doesn't work, maybe start off with a simpler project. Deep reinforcement learning is pretty new, and lots of libraries are broken and lots of algorithms are unstable. So maybe start with image classification projects if this is frustrating you?

If you're determined to do this though, it's probably worth reading the papers behind the algorithms, since they explain all the parameters and limitations.

This project didn't get good results for me (although another guy did a PyTorch implementation and got decent results, but I'm not sure if he plans on publishing it). I'm wondering if there is a problem in my data or my environment, since I tried many algorithms but kept the data and environment constant. Or maybe it's just a difficult dataset.

jd0713 commented 7 years ago

Thanks for your nice comment. I also read the conversation between you and goodulusaurs after your comment. I should use DPG and ensemble the networks, which is the same as the paper. I will update if I get good results! Thank you again!

wassname commented 7 years ago

I'd be interested to know if you get any results using DPG! Good luck.