vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Add Muesli #354

Closed shermansiu closed 8 months ago

shermansiu commented 1 year ago

Description

Implement Muesli. Resolves #350.

vercel[bot] commented 1 year ago

The latest updates on your projects:

Name | Status | Updated
cleanrl | ✅ Ready | Jan 25, 2023 at 8:13PM (UTC)

shermansiu commented 1 year ago

Okay... a single loop seems to work, but now I get OOM errors, as expected.

shermansiu commented 1 year ago

For now, I'm debugging with:

python muesli_atari_envpool_async_jax_scan_impalanet_machado.py --replay-buffer-size=6000 --update-batch-size=2
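
As a rough aid for reasoning about the OOM errors mentioned above, the following sketch estimates how much memory the observations in a replay buffer would occupy. It is a back-of-the-envelope calculation, not code from this PR. The helper name `replay_buffer_bytes`, the observation shape (4 stacked 84x84 uint8 frames, the common Atari preprocessing), and the idea that `--replay-buffer-size` counts single observations are all assumptions; the actual script may store sequences or additional per-step data, which would make the real footprint larger.

```python
import math


def replay_buffer_bytes(num_entries, obs_shape=(4, 84, 84), bytes_per_element=1):
    """Rough lower bound on observation storage in a replay buffer.

    Hypothetical helper: assumes each entry holds exactly one observation
    of shape `obs_shape`, stored with `bytes_per_element` bytes per value
    (1 for uint8 frames). Actions, rewards, logits, and any recurrent
    state are ignored.
    """
    return num_entries * math.prod(obs_shape) * bytes_per_element


if __name__ == "__main__":
    # 6000 matches the --replay-buffer-size value used for debugging above.
    debug_bytes = replay_buffer_bytes(6000)
    print(f"debug buffer: {debug_bytes / 2**30:.2f} GiB of observations")

    # Storing the same frames as float32 would need four times the memory.
    float_bytes = replay_buffer_bytes(6000, bytes_per_element=4)
    print(f"as float32:   {float_bytes / 2**30:.2f} GiB of observations")
```

Under these assumptions, storage scales linearly with the buffer size, so comparing the estimate against available device memory gives a quick sanity check before launching a longer run.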

vwxyzjn commented 1 year ago

Wow @shermansiu you rock! I haven't seen a prototype produced so quickly! It would be great if you could share some of the tracked experiments so that we get a peek at the metrics (e.g., SPS).

shermansiu commented 1 year ago

@vwxyzjn Thanks so much! Here's a report with the tracked metrics!

shermansiu commented 1 year ago

The current reported returns are a bit lacking... Maybe incrementally modifying your implementation of #338 towards being more like Muesli would help! Something along the lines of:

vwxyzjn commented 1 year ago

Incrementally adding stuff would be a good idea — it would give you more visibility on what works and what doesn't :)

I would suggest making separate files, such as

muesli_replay.py, muesli_rb_lstm.py, muesli_rb_lstm_mpo ... and so on.

shermansiu commented 1 year ago

Yeah, that was my plan, thanks!

xnerhu commented 6 months ago

Any updates? @vwxyzjn @shermansiu

shermansiu commented 6 months ago

I got busy with my own research, and Costa ended up finishing CleanBa and doing a minimal reproduction of GPT-2 after LLMs got popular.

I always thought of returning to this at some point, but after Costa closed both this issue and my WIP PR, I assumed it was no longer needed and I moved on. I'm down to finish this up at some point, though I have a lot of items on my to-do list right now.