Add Muesli

shermansiu commented 1 year ago

Description

Implement Muesli. Resolves #350.

Types of changes

[ ] Bug fix
[ ] New feature
[x] New algorithm
[ ] Documentation

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
[x] I have updated the documentation and previewed the changes via mkdocs serve.
[x] I have updated the tests accordingly (if applicable).

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

[x] I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
[ ] I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
[ ] I have added additional documentation and previewed the changes via mkdocs serve.
- [ ] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [ ] I have added links to the original paper and related papers (if applicable).
- [ ] I have added links to the PR related to the algorithm variant.
- [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves (in PNG format).
- [ ] I have added links to the tracked experiments.
- [ ] I have updated the overview sections at the docs and the repo
[ ] I have updated the tests accordingly (if applicable).

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add your feedback	Jan 25, 2023 at 8:13PM (UTC)

shermansiu commented 1 year ago

Okay... a single loop seems to work, but now I get OOM errors, as expected.

shermansiu commented 1 year ago

For now, I'm debugging with python muesli_atari_envpool_async_jax_scan_impalanet_machado.py --replay-buffer-size=6000 --update-batch-size=2.

vwxyzjn commented 1 year ago

Wow @shermansiu you rock! I haven't seen a prototype being produced so quickly! It would be great if you could share some of the tracked experiments so that we get a peek at the metrics (e.g., SPS)

shermansiu commented 1 year ago

@vwxyzjn Thanks so much! Here's a report with the tracked metrics!

shermansiu commented 1 year ago

The current reported returns are a bit lacking... Maybe incrementally modifying your implementation of #338 towards being more like Muesli would help! Something along the lines of:

[ ] Add Muesli replay buffer to PPO (online queue only)
[ ] Add LSTM to PPO
[ ] Turn off entropy regularization
[ ] Use MPO policy loss
[ ] Add CMPO regularization
[ ] Add dynamics network losses
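
For reference, here is a minimal sketch of what the CMPO regularization step could look like. It assumes the form given in the Muesli paper, where the target policy is the prior policy reweighted by exponentiated, clipped advantages, and the regularizer is the KL divergence from that target to the current policy. The function names, the clip value of 1.0, and the toy numbers are illustrative assumptions, not code from this PR, and a real implementation would be vectorized in JAX rather than written with Python lists.

```python
import math


def cmpo_target(prior_probs, advantages, clip=1.0):
    """Hypothetical helper: pi_cmpo(a) ~ prior(a) * exp(clip(adv(a), -clip, clip))."""
    weights = [
        p * math.exp(max(-clip, min(clip, adv)))
        for p, adv in zip(prior_probs, advantages)
    ]
    z = sum(weights)  # normalizer so the target sums to 1
    return [w / z for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy example with 3 actions (made-up numbers).
prior = [0.25, 0.25, 0.5]
adv = [2.0, 0.0, -3.0]  # clipped to [1.0, 0.0, -1.0] before exponentiation

target = cmpo_target(prior, adv)
# Regularizer term: how far the current policy (here, the prior) is from the target.
reg = kl_divergence(target, prior)
```

The target shifts probability mass toward actions with positive advantage, and the clipping bounds how far a single noisy advantage estimate can move it.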

vwxyzjn commented 1 year ago

Incrementally adding stuff would be a good idea — it would give you more visibility on what works and what doesn't :)

I would suggest making separate files, such as

muesli_replay.py, muesli_rb_lstm.py, muesli_rb_lstm_mpo, ... and so on.

shermansiu commented 1 year ago

Yeah, that was my plan, thanks!

xnerhu commented 6 months ago

Any updates? @vwxyzjn @shermansiu

shermansiu commented 6 months ago

I got busy with my own research, and Costa ended up finishing CleanBa and doing a minimal reproduction of GPT-2 after LLMs got popular.

I always thought of returning to this at some point, but after Costa closed both this issue and my WIP PR, I assumed it was no longer needed and I moved on. I'm down to finish this up at some point, though I have a lot of items on my to-do list right now.

vwxyzjn / cleanrl

Add Muesli #354
