jurgisp / memory-maze

Evaluating long-term memory of reinforcement learning algorithms
MIT License

PPO Baseline for MemoryMaze #17

Open subho406 opened 1 year ago

subho406 commented 1 year ago

Hi, great environment! Just wondering, is there a PPO baseline available for this environment?

zdx3578 commented 1 year ago

You could use https://github.com/Stable-Baselines-Team/stable-baselines3-contrib (RecurrentPPO, a.k.a. PPO LSTM)?
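Something along these lines, as an untested sketch: it assumes the gym registration from this repo's README, the hyperparameters are placeholders rather than values tuned for Memory Maze, and depending on versions a gym/gymnasium shim may be needed.

```python
# Untested sketch: RecurrentPPO (PPO + LSTM) from sb3-contrib on Memory Maze.
# Assumes `pip install memory-maze sb3-contrib`; the hyperparameters below are
# placeholders, not values known to work on this benchmark.
import gym
from sb3_contrib import RecurrentPPO

# Memory Maze registers gym envs such as MemoryMaze-9x9-v0 (64x64x3 RGB obs).
env = gym.make("memory_maze:MemoryMaze-9x9-v0")

model = RecurrentPPO(
    "CnnLstmPolicy",      # CNN encoder + LSTM policy/value heads
    env,
    n_steps=128,          # rollout length per environment
    learning_rate=2.5e-4,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```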

subho406 commented 1 year ago

Thanks, I was looking to see if there are known/tuned hyperparameters available for this environment. I already have an Atari CNN + LSTM implementation (https://github.com/subho406/Recurrent-PPO-Jax) based on the CleanRL implementation. I tried it with the default Atari hyperparameters, but it doesn't seem to be learning on this environment.
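For reference, a quick sanity check of the env interface (a sketch based on the README; the commented shapes are what I expect, and the older 4-tuple gym step API is assumed). One difference from Atari is that observations are 64x64x3 RGB rather than 84x84 grayscale stacks, so the usual Atari preprocessing wrappers do not carry over directly.

```python
# Quick sanity check of the Memory Maze interface before reusing an Atari
# pipeline (assumes `pip install memory-maze` and the older 4-tuple gym API).
import gym

env = gym.make("memory_maze:MemoryMaze-9x9-v0")
print(env.observation_space)  # expected: Box(0, 255, (64, 64, 3), uint8) -- RGB, not grayscale
print(env.action_space)       # expected: small Discrete action set

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(obs.shape, reward, done)
```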

zdx3578 commented 1 year ago

https://github.com/NM512/dreamerv3-torch/issues/18

jurgisp commented 1 year ago

@subho406 We have tried running a PPO baseline, but it was pretty much flatlining at 0.

zdx3578 commented 1 year ago

@jurgisp what about this great cell? https://github.com/NeuromorphicComputing/STPN

subho406 commented 1 year ago

I tried synchronous PPO on this problem, but it did not seem to work very well; it saturates at a score of around 6-7. However, I was able to get close to the IMPALA baseline on Memory Maze 9x9 using the asynchronous PPO implementation from Sample Factory (https://www.samplefactory.dev/). I used the default hyperparameters from their DMLab experiments (https://www.samplefactory.dev/09-environment-integrations/dmlab/) and changed the sequence length to 100 and the number of sequences to 32. It seems to work pretty well! I am getting a reward of around 20 after 100 million steps.
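For reference, one way to wire that up (a rough, untested sketch assuming Sample Factory 2.x's custom-environment API; the flag names follow their docs, the mapping of "32 sequences of length 100" to `--batch_size=3200` timesteps is a guess, and a gym/gymnasium shim may be needed for the Memory Maze env):

```python
# Sketch: registering Memory Maze as a custom env in Sample Factory 2.x and
# launching APPO. Everything not shown is left at the DMLab defaults linked
# above. Untested.
import sys

import gym

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl


def make_memory_maze_env(full_env_name, cfg, env_config, render_mode=None):
    # 64x64x3 RGB observations, small discrete action set.
    return gym.make("memory_maze:MemoryMaze-9x9-v0")


def main():
    register_env("memory_maze_9x9", make_memory_maze_env)
    # Example launch (sequence settings from the comment above; mapping
    # "32 sequences of length 100" to --batch_size=3200 timesteps is a guess):
    #   python train_mmaze.py --env=memory_maze_9x9 --use_rnn=True \
    #       --rollout=100 --recurrence=100 --batch_size=3200
    parser, _ = parse_sf_args()
    cfg = parse_full_cfg(parser)
    return run_rl(cfg)


if __name__ == "__main__":
    sys.exit(main())
```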

zdx3578 commented 1 year ago

@jurgisp will the DreamerV2 maze code and config used for the paper benchmark be open-sourced? @subho406 did you use something like the dreamerv3-torch config, i.e. batch_length = 100 and batch_size = 32?

jurgisp commented 1 year ago

@subho406 that's very interesting that you got reasonable results with Asynchronous PPO. Would you be able to share the results? Did you try it on all 4 sizes of memory maze, or just on 9x9?

@zdx3578 can you clarify your question?

zdx3578 commented 1 year ago

@jurgisp will the experiments from the Memory Maze paper benchmark (e.g., DreamerV2 on the maze, VAE+GRU on the maze) be open-sourced so that other people can run them or build on them?

zdx3578 commented 1 year ago

The VAE+GRU experiment could also be modified to use VAE+STPN.

subho406 commented 1 year ago

> @subho406 that's very interesting that you got reasonable results with Asynchronous PPO. Would you be able to share the results? Did you try it on all 4 sizes of memory maze, or just on 9x9?
>
> @zdx3578 can you clarify your question?

@jurgisp Yes, I am working on the results for a paper. Happy to share them when they are ready!