What?
This PR adds continuous networks and training utilities that transform the sampled actions and their log probabilities. It also updates the configs and the JaxMARL wrapper accordingly, since mabrax doesn't provide a global state.
Why?
These changes add support for environments with continuous actions.
How?
Add continuous networks that return actor_mean and actor_log_std.
Use utility functions to sample actions and log probabilities from the resulting distribution and to transform them.
Add a global state to mabrax by simply concatenating the agents' observations.
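The steps above can be sketched roughly as follows. This is a hedged, self-contained illustration in plain numpy, not the actual implementation: the helper names (`global_state`, `sample_action`) are hypothetical, and the tanh squashing with its change-of-variables log-prob correction is one common choice of transform assumed here for concreteness.

```python
import numpy as np

def global_state(agent_obs):
    # mabrax provides no global state; a simple one can be formed by
    # concatenating the per-agent observations in a fixed key order.
    return np.concatenate([agent_obs[k] for k in sorted(agent_obs)])

def sample_action(actor_mean, actor_log_std, rng):
    # Sample from the diagonal Gaussian defined by the actor outputs,
    # squash with tanh, and correct the log-probability with the
    # change-of-variables term (assumed transform, for illustration).
    std = np.exp(actor_log_std)
    raw = actor_mean + std * rng.standard_normal(actor_mean.shape)
    log_prob = -0.5 * (((raw - actor_mean) / std) ** 2
                       + 2.0 * actor_log_std
                       + np.log(2.0 * np.pi)).sum()
    action = np.tanh(raw)
    log_prob -= np.log(1.0 - action ** 2 + 1e-6).sum()
    return action, log_prob

# Usage: two agents with 2-dim observations, a 3-dim continuous action.
obs = {"agent_0": np.array([0.1, 0.2]), "agent_1": np.array([0.3, 0.4])}
state = global_state(obs)                      # shape (4,)
rng = np.random.default_rng(0)
action, log_prob = sample_action(np.zeros(3), np.full(3, -1.0), rng)
```

The squashed action is guaranteed to lie in (-1, 1), which matches the bounded action spaces of mabrax-style continuous control tasks.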
Extra
A follow-up PR will add the various continuous systems, along with their configs and evaluator changes.
We're closing this PR: we will instead initialise the action distribution for the continuous case with non-vmapped networks, which avoids creating separate system files.