instadeepai / Mava

🦁 A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX
Apache License 2.0

feat: stop accessing LogEnvState variables directly #994

Closed sash-a closed 8 months ago

sash-a commented 8 months ago

What?

Stop accessing LogEnvState variables directly. Direct access forces us to always make the LogEnvWrapper the outermost wrapper, so that its variables can be read through state.episode_return instead of state.env_state.episode_return. With an unknown number of wrappers, we can't know how deep in the env_state chain we have to go. Instead, pass the variables we care about through timestep.extras.
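A minimal sketch of the idea, in plain Python rather than JAX so it runs without dependencies. Every name here (`CounterEnv`, `RecordMetricsWrapper`, `OuterWrapper`, the `"episode_metrics"` key) is hypothetical and chosen for illustration; this is not Mava's actual implementation. It shows that once the metrics wrapper writes the episode return into `timestep.extras`, the caller reads it the same way no matter how many wrappers sit on top.

```python
from typing import Any, Dict, NamedTuple, Tuple


class TimeStep(NamedTuple):
    reward: float
    done: bool
    extras: Dict[str, Any]


class CounterEnv:
    """Toy environment (hypothetical): reward 1.0 per step, ends after 3 steps."""

    def reset(self) -> Tuple[int, TimeStep]:
        return 0, TimeStep(reward=0.0, done=False, extras={})

    def step(self, state: int, action: int) -> Tuple[int, TimeStep]:
        state = state + 1
        return state, TimeStep(reward=1.0, done=state >= 3, extras={})


class MetricsState(NamedTuple):
    env_state: Any
    running_return: float  # return accumulated in the current episode
    episode_return: float  # return of the last completed episode


class RecordMetricsWrapper:
    """Tracks episode return in its own state and copies it into timestep.extras."""

    def __init__(self, env: Any) -> None:
        self._env = env

    def reset(self) -> Tuple[MetricsState, TimeStep]:
        env_state, ts = self._env.reset()
        state = MetricsState(env_state, 0.0, 0.0)
        return state, self._with_metrics(ts, state)

    def step(self, state: MetricsState, action: int) -> Tuple[MetricsState, TimeStep]:
        env_state, ts = self._env.step(state.env_state, action)
        running = state.running_return + ts.reward
        # On episode end, publish the finished return and reset the accumulator.
        episode_return = running if ts.done else state.episode_return
        new_state = MetricsState(env_state, 0.0 if ts.done else running, episode_return)
        return new_state, self._with_metrics(ts, new_state)

    @staticmethod
    def _with_metrics(ts: TimeStep, state: MetricsState) -> TimeStep:
        metrics = {"episode_return": state.episode_return}
        return ts._replace(extras={**ts.extras, "episode_metrics": metrics})


class OuterState(NamedTuple):
    env_state: Any
    step_count: int


class OuterWrapper:
    """Any other wrapper: adds one more env_state level, leaves the timestep alone."""

    def __init__(self, env: Any) -> None:
        self._env = env

    def reset(self) -> Tuple[OuterState, TimeStep]:
        env_state, ts = self._env.reset()
        return OuterState(env_state, 0), ts

    def step(self, state: OuterState, action: int) -> Tuple[OuterState, TimeStep]:
        env_state, ts = self._env.step(state.env_state, action)
        return OuterState(env_state, state.step_count + 1), ts


def rollout_episode_return(env: Any) -> float:
    """Runs one episode and reads the return from extras, never from the state."""
    state, ts = env.reset()
    while not ts.done:
        state, ts = env.step(state, 0)
    return ts.extras["episode_metrics"]["episode_return"]
```

With direct state access, the caller would need `state.episode_return` for the bare metrics wrapper but `state.env_state.episode_return` once `OuterWrapper` is added. Reading from `extras` keeps `rollout_episode_return` unchanged in both cases.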

Also changed some file names to make things clearer :smile: and renamed LogEnvState to bring it more in line with the gym wrapper that does similar things.

Why?

It also allows the vault wrapper to not be the outermost wrapper, if we start passing the state info through timestep.extras.

OmaymaMahjoub commented 8 months ago

Could you double check that the quickstart notebook works properly? Otherwise, happy to approve 🙌

I tested the notebook and it raises some errors. The main changes should be fixing the imports, getting the episode return from timesteps.extras, and using RecordEpisodeMetricsState instead of LogWrapper.