Closed shermansiu closed 8 months ago
Okay... a single loop seems to work, but now I get OOM errors, as expected.
For now, I'm debugging with:
python muesli_atari_envpool_async_jax_scan_impalanet_machado.py --replay-buffer-size=6000 --update-batch-size=2
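For anyone wondering why shrinking --replay-buffer-size helps with the OOM errors, here is a rough back-of-the-envelope sketch. It is not taken from the script: the function name replay_buffer_bytes, the 4x84x84 uint8 frame-stack shape, and the choice to store both the observation and the next observation are all assumptions for illustration, and the real buffer layout in the Muesli script may differ.

```python
import math


def replay_buffer_bytes(
    num_transitions,
    obs_shape=(4, 84, 84),   # assumed: 4 stacked 84x84 grayscale Atari frames
    bytes_per_element=1,     # assumed: uint8 pixels
    store_next_obs=True,     # assumed: buffer keeps obs and next_obs separately
):
    """Estimate the memory taken by observations alone in a replay buffer."""
    obs_bytes = math.prod(obs_shape) * bytes_per_element
    copies = 2 if store_next_obs else 1
    return num_transitions * obs_bytes * copies


if __name__ == "__main__":
    # Compare the small debugging size against larger hypothetical sizes.
    for size in (6_000, 100_000, 1_000_000):
        gib = replay_buffer_bytes(size) / 2**30
        print(f"{size:>9} transitions -> {gib:.2f} GiB of observations")
```

Under these assumptions the observation storage grows linearly with the buffer size, so a small value such as 6000 keeps the buffer well under a gigabyte while debugging. Actions, rewards, policy logits, and any recurrent state would add to this.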
Wow @shermansiu you rock! I haven't seen a prototype being produced so quickly! It would be great if you could share some of the tracked experiments so that we get a peek at the metrics (e.g., SPS)
@vwxyzjn Thanks so much! Here's a report with the tracked metrics!
The current reported returns are a bit lacking... Maybe incrementally modifying your implementation of #338 towards being more like Muesli would help! Something along the lines of:
Incrementally adding stuff would be a good idea — it would give you more visibility on what works and what doesn't :)
I would suggest making separate files, such as
muesli_replay.py
muesli_rb_lstm.py
muesli_rb_lstm_mpo.py
... and so on.
Yeah, that was my plan, thanks!
Any updates? @vwxyzjn @shermansiu
I got busy with my own research and Costa ended up finishing CleanBa and doing a minimal reproduction of GPT2 after LLMs got popular.
I always thought of returning to this at some point, but after Costa closed both this issue and my WIP PR, I assumed it was no longer needed and I moved on. I'm down to finish this up at some point, though I have a lot of items on my to-do list right now.
Description
Implement Muesli. Resolves #350.
Types of changes
Checklist:
pre-commit run --all-files passes (required).
mkdocs serve.
If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
--capture-video flag toggled on (required).
mkdocs serve.