What?
This PR adds continuous networks and training utilities that transform the sampled actions and their log probabilities. It also updates the configs and the JaxMARL wrapper accordingly, since mabrax doesn't provide a global state.
Why?
These changes add support for environments with continuous actions.
How?
Add continuous networks that return actor_mean and actor_log_std.
Use utility functions to sample actions and log probabilities from the resulting distribution and to transform them.
Add a global state to mabrax by simply concatenating the agents' observations.
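The steps above can be sketched roughly as follows. This is a hedged, self-contained illustration in plain numpy, not the actual implementation: the helper names (`global_state`, `sample_action`) are hypothetical, and the tanh squashing with its change-of-variables log-prob correction is one common choice of transform assumed here for concreteness.

```python
import numpy as np

def global_state(agent_obs):
    # mabrax provides no global state; a simple one can be formed by
    # concatenating the per-agent observations in a fixed key order.
    return np.concatenate([agent_obs[k] for k in sorted(agent_obs)])

def sample_action(actor_mean, actor_log_std, rng):
    # Sample from the diagonal Gaussian defined by the actor outputs,
    # squash with tanh, and correct the log-probability with the
    # change-of-variables term (assumed transform, for illustration).
    std = np.exp(actor_log_std)
    raw = actor_mean + std * rng.standard_normal(actor_mean.shape)
    log_prob = -0.5 * (((raw - actor_mean) / std) ** 2
                       + 2.0 * actor_log_std
                       + np.log(2.0 * np.pi)).sum()
    action = np.tanh(raw)
    log_prob -= np.log(1.0 - action ** 2 + 1e-6).sum()
    return action, log_prob

# Usage: two agents with 2-dim observations, a 3-dim continuous action.
obs = {"agent_0": np.array([0.1, 0.2]), "agent_1": np.array([0.3, 0.4])}
state = global_state(obs)                      # shape (4,)
rng = np.random.default_rng(0)
action, log_prob = sample_action(np.zeros(3), np.full(3, -1.0), rng)
```

The squashed action is guaranteed to lie in (-1, 1), which matches the bounded action spaces of mabrax-style continuous control tasks.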
Extra
A follow-up PR will add the various continuous systems, along with their configs and evaluator changes.
We're closing this PR: we will instead initialise the action distribution for the continuous case with non-vmapped networks, which avoids creating separate system files.