Open walesdata opened 3 years ago
Your code base is really good. I really like the multi-process environment; that's a legit innovation.
Update: After more training, it seems to be converging to not trading at all!
episode: 46289 worker: 28 net worth: 1008.27 average: 1113.74 orders: 2
episode: 46290 worker: 7 net worth: 1000.00 average: 1111.90 orders: 0
episode: 46291 worker: 4 net worth: 1318.49 average: 1115.07 orders: 46
episode: 46292 worker: 11 net worth: 1435.02 average: 1117.48 orders: 56
episode: 46293 worker: 0 net worth: 1002.71 average: 1116.31 orders: 1
episode: 46294 worker: 13 net worth: 1857.55 average: 1121.94 orders: 147
episode: 46295 worker: 10 net worth: 1000.00 average: 1121.94 orders: 0
episode: 46296 worker: 22 net worth: 1000.00 average: 1118.52 orders: 0
episode: 46297 worker: 1 net worth: 999.59 average: 1113.98 orders: 2
episode: 46298 worker: 27 net worth: 1000.00 average: 1113.98 orders: 0
episode: 46299 worker: 12 net worth: 1000.00 average: 1113.98 orders: 0
episode: 46300 worker: 18 net worth: 1000.00 average: 1113.49 orders: 0
episode: 46301 worker: 5 net worth: 1149.69 average: 1108.48 orders: 30
episode: 46302 worker: 3 net worth: 884.92 average: 1107.62 orders: 2
episode: 46303 worker: 9 net worth: 884.11 average: 1106.75 orders: 2
episode: 46304 worker: 25 net worth: 1116.05 average: 1108.33 orders: 38
episode: 46305 worker: 30 net worth: 1000.00 average: 1108.15 orders: 0
episode: 46306 worker: 31 net worth: 1000.00 average: 1107.60 orders: 0
episode: 46307 worker: 8 net worth: 1000.00 average: 1107.60 orders: 0
episode: 46308 worker: 21 net worth: 1000.00 average: 1108.09 orders: 0
episode: 46309 worker: 14 net worth: 1097.75 average: 1110.13 orders: 14
episode: 46310 worker: 26 net worth: 1000.00 average: 1110.13 orders: 0
episode: 46311 worker: 16 net worth: 907.64 average: 1109.20 orders: 1
Hello!
I was reviewing your code base and considering using it as part of a demo for a class I teach. The initial run didn't seem to be learning much, so I went into the AddIndicators function and added 2 new indicators that expose future returns.
The idea is to tell the bot what the return will be in 1 hour and in 8 hours. With this information, a human trader could make a huge return. I've done this same test on 3 other code bases, and only 1 could actually learn to exploit it.
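For anyone who wants to reproduce the test, here is a minimal sketch of what such look-ahead indicators could look like. It is not the exact code I added to AddIndicators: the function name `add_future_return_indicators`, the column names, and the assumption that the data is hourly candles with a `Close` column are all hypothetical and only meant to illustrate the idea.

```python
import pandas as pd

def add_future_return_indicators(df, price_col="Close", horizons=(1, 8)):
    """Append look-ahead return columns (deliberate data leakage for testing).

    Assumes one row per hour, so shifting by h rows looks h hours ahead.
    The function and column names are hypothetical, not from the repo.
    """
    out = df.copy()
    for h in horizons:
        # Fractional return between now and h rows into the future.
        out[f"future_return_{h}h"] = out[price_col].shift(-h) / out[price_col] - 1.0
    # The last max(horizons) rows have no future price, so drop them.
    return out.dropna().reset_index(drop=True)

# Tiny synthetic example with 10 hourly closes.
candles = pd.DataFrame(
    {"Close": [100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 105.0, 106.0, 108.0, 110.0]}
)
leaky = add_future_return_indicators(candles)
print(leaky)
```

Note that the trailing rows are dropped because their future price is unknown, so the environment would see slightly fewer steps than the raw data contains.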
After implementing this change and re-running the bot for thousands of episodes, it does not seem to have learned much. The average return and episodic return aren't climbing the way I would expect.
episode: 26340 worker: 21 net worth: 943.79 average: 1046.29 orders: 2
episode: 26341 worker: 14 net worth: 846.84 average: 1044.38 orders: 6
episode: 26342 worker: 26 net worth: 1019.90 average: 1045.11 orders: 34
episode: 26343 worker: 16 net worth: 1661.54 average: 1051.10 orders: 92
episode: 26344 worker: 20 net worth: 1020.38 average: 1051.17 orders: 49
episode: 26345 worker: 24 net worth: 989.14 average: 1052.00 orders: 3
episode: 26346 worker: 19 net worth: 990.24 average: 1052.65 orders: 4
This is very similar to the first few episodes, except that the number of orders has generally declined.
This might be due to the convolution layer 'blurring out' or averaging away the ability of the bot to notice that one of its features is very helpful.
Expected behavior: Return should get much higher when the bot is provided with perfect information from the future.
Actual behavior: Doesn't seem to change anything.
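To put a rough number on "much higher", one option is to compare the agent against a simple oracle that uses the same 1-step-ahead information. The sketch below is only an illustration under my own assumptions: `oracle_net_worth` is a hypothetical helper, it goes all-in or all-out with no partial positions, and the starting balance of 1000 is inferred from the logs rather than taken from the environment's code.

```python
def oracle_net_worth(prices, initial_balance=1000.0, fee=0.0):
    """Net worth of a trader who knows the next price at every step.

    Hypothetical baseline: buys with the full balance before a rise,
    sells everything before a fall, and otherwise holds its position.
    """
    balance = initial_balance  # cash on hand
    held = 0.0                 # units of the asset currently held
    for now, nxt in zip(prices[:-1], prices[1:]):
        if nxt > now and held == 0.0:
            # Price is about to rise: convert all cash to the asset.
            held = balance * (1.0 - fee) / now
            balance = 0.0
        elif nxt < now and held > 0.0:
            # Price is about to fall: convert the asset back to cash.
            balance = held * now * (1.0 - fee)
            held = 0.0
    # Value any remaining position at the last known price.
    return balance + held * prices[-1]

prices = [100.0, 110.0, 99.0, 108.9]
upper_bound = oracle_net_worth(prices)
print(f"oracle net worth: {upper_bound:.2f}")
```

Running this on the same price series the environment uses for one episode would give a ceiling to compare the logged net worth against. If the agent stays near 1000 while the oracle is far above it, the leaked features are not being used.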
Thank you!