timeolord / Reinforcement-Learning-Stock-Trader

Using a modified version of Werner Duvaud's MuZero implementation (https://github.com/werner-duvaud/muzero-general), this reinforcement learning agent learns to trade stocks based on Yahoo Finance data.

Results after about 5 hours #6

Closed widebowl closed 3 years ago

widebowl commented 3 years ago

Hi. I tried running your code on Ubuntu Linux in a virtual machine. Please take a look at the output below, from a run of about 5 hours, and let me know what is going wrong.

Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30914.91. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30914.91. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -31942.12. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -31942.12. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30468.08. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -30468.08. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -29278.16. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -29350.38. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -29350.38. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -31926.97. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -31926.97. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30892.09. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -30892.09. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -29666.60. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -29666.60. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30646.84. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -30646.84. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -24786.48. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -24786.48. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -31290.79. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -31290.79. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30271.14. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -30271.14. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -32123.04. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -32123.04. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -31048.90. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -31048.90. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30950.96. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -30950.96. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30795.34. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -30795.34. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -32585.52. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -30935.74. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -30935.74. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -32087.31. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -32087.31. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...p: 0/50000000. Played games: 1. Loss: 0.00
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -28482.38. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
Saving modelward: -28482.38. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00
Saving modelward: -31326.23. Training step: 0/50000000. Played games: 1. Loss: 0.00
Persisting replay buffer games to disk...
^Zst test reward: -31326.23. Training step: 0/50000000. Played games: 1. Loss: 0.00

timeolord commented 3 years ago

Hmm, this is strange, and I'm not really sure what is happening here. It seems the worker that is supposed to play the game isn't actually doing so. Can you check whether Ray has multiple workers actually playing the game? There should be at least two.
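
To make the symptom easier to see, the pasted output can be parsed to check whether the "Training step" and "Played games" counters ever change. The following is only a rough sketch that assumes the status-line format shown in the log above; `summarize_status` is a hypothetical helper name and is not part of this repository or of muzero-general.

```python
import re

# Counters that appear in every status line of the pasted log.
STATUS_RE = re.compile(
    r"Training step: (\d+)/(\d+)\. Played games: (\d+)\. Loss: (-?\d+(?:\.\d+)?)"
)
# "Saving modelward" looks like "Saving model" printed over "Last test reward",
# so accept either spelling in front of the reward value (an assumption).
REWARD_RE = re.compile(r"(?:modelward|test reward): (-?\d+(?:\.\d+)?)")


def summarize_status(log_text):
    """Collect the distinct counter values seen in a training log (hypothetical helper)."""
    steps, games, losses = set(), set(), set()
    for step, _total, played, loss in STATUS_RE.findall(log_text):
        steps.add(int(step))
        games.add(int(played))
        losses.add(float(loss))
    rewards = [float(r) for r in REWARD_RE.findall(log_text)]
    return {
        "finished_games": log_text.count("Finished a game!"),
        "training_steps": sorted(steps),
        "played_games": sorted(games),
        "losses": sorted(losses),
        "rewards": rewards,
        # True when the only training step ever reported is 0.
        "training_stalled": steps == {0},
    }


# Short excerpt copied from the log in this issue.
sample = (
    "(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00 "
    "Saving modelward: -30914.91. Training step: 0/50000000. Played games: 1. Loss: 0.00 "
    "(pid=2276) Finished a game!. Training step: 0/50000000. Played games: 1. Loss: 0.00 "
    "Saving modelward: -31942.12. Training step: 0/50000000. Played games: 1. Loss: 0.00"
)
summary = summarize_status(sample)
print(summary)
```

If `training_stalled` is true and `played_games` never grows even though `finished_games` keeps increasing, that would be consistent with games finishing without ever being counted toward training.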