Closed: zacheberhart closed this issue 6 years ago
I went through a lot of revisions, saving them as git commits. The best I got was a 25X return on the training data, which disappeared on the test data.
The current example uses PPO, which doesn't seem to do very well. Does that make sense?
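For reference, here is a minimal, self-contained sketch of that train/test check, assuming stable-baselines3 (>= 2.0, Gymnasium API) and a toy random-walk price series. None of this is the environment or agent from this repo; it just illustrates how a high training reward can vanish on held-out data:

```python
# Illustrative sketch only -- the env, data, and hyperparameters here are
# hypothetical, not the ones from this repo. Assumes stable-baselines3 >= 2.0.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


class ToyTradingEnv(gym.Env):
    """Flat/long decisions over a fixed price series; reward = position * log return."""

    def __init__(self, prices):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0 = flat, 1 = long
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self._obs(), {}

    def _obs(self):
        # Observation: the most recent one-step log return (zero at t=0).
        r = 0.0 if self.t == 0 else np.log(self.prices[self.t] / self.prices[self.t - 1])
        return np.array([r], dtype=np.float32)

    def step(self, action):
        # Reward: next-step log return if long, zero if flat.
        ret = np.log(self.prices[self.t + 1] / self.prices[self.t])
        reward = float(action) * float(ret)
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}


# Synthetic random-walk prices; fit on the first 75%, hold out the rest.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=2000)))
train_env = ToyTradingEnv(prices[:1500])
test_env = ToyTradingEnv(prices[1500:])

model = PPO("MlpPolicy", train_env, verbose=0)
model.learn(total_timesteps=50_000)

# If the agent has only memorized the training series, train reward stays
# high while test reward collapses toward zero -- the "disappearing 25X".
train_r, _ = evaluate_policy(model, train_env, n_eval_episodes=3)
test_r, _ = evaluate_policy(model, test_env, n_eval_episodes=3)
print(f"mean episode reward -- train: {train_r:.3f}, test: {test_r:.3f}")
```

A large gap between those two numbers is the overfitting signature described above, independent of whether the algorithm is VPG or PPO.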
I'm a bit confused by your notes on performance. In the README, you mention that VPG underperforms the baseline on the test set. But in the notebook, you mention getting an annualized 25X return using PPO -- which doesn't appear to be the case, judging by the test-performance plot at the bottom of the notebook.
I'm currently running my own implementation and waiting for it to finish training, but I'm curious -- were you able to get this working? I know the PPO algorithm is fairly new, so maybe those notes were left over from your initial testing? Any clarification would be helpful :)
Thanks!