instadeepai / Mava

🦁 A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX
Apache License 2.0
709 stars 83 forks

New mava network: epsilon greedy action head for discrete action spaces #1051

Closed lbeyers closed 6 months ago

lbeyers commented 7 months ago

What?

This PR adds a new network to the Mava networks file. It forms the final part (the action head) of an RNN used for Q-learning and outputs two things: the Q-values, and an epsilon-greedy distribution over actions that can be sampled in the same way as the action distributions of the other Mava systems.

Why?

To make the Q-learning implementations consistent with the other Mava systems. As a side effect, the change simplifies the rec-iql file.

How?

The new network performs much of the epsilon-greedy maths itself, so the calling system does not have to.
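For readers unfamiliar with the maths involved, here is a minimal sketch of how an epsilon-greedy distribution can be built from Q-values in JAX. It is not the code from this PR: the function names `epsilon_greedy_probs` and `sample_action`, the boolean `action_mask` argument, and the example values are all assumptions made for illustration. It also assumes every agent has at least one legal action.

```python
import jax
import jax.numpy as jnp


def epsilon_greedy_probs(q_values, action_mask, epsilon):
    """Return epsilon-greedy action probabilities over the legal actions.

    Hypothetical helper, not part of Mava.
    q_values:    float array [..., num_actions]
    action_mask: bool array  [..., num_actions], True where an action is legal
    epsilon:     probability of picking a uniformly random legal action
    """
    # Illegal actions must never win the argmax.
    masked_q = jnp.where(action_mask, q_values, -jnp.inf)
    greedy = jax.nn.one_hot(jnp.argmax(masked_q, axis=-1), q_values.shape[-1])

    # Uniform distribution over the legal actions only.
    mask = action_mask.astype(q_values.dtype)
    uniform = mask / jnp.sum(mask, axis=-1, keepdims=True)

    # Explore with probability epsilon, exploit otherwise.
    return epsilon * uniform + (1.0 - epsilon) * greedy


def sample_action(key, probs):
    """Sample an action index from the given probabilities."""
    # log(0) = -inf, so zero-probability actions are never drawn.
    return jax.random.categorical(key, jnp.log(probs), axis=-1)


# Example values (made up): the last action has the highest Q-value but is illegal.
q = jnp.array([1.0, 3.0, 2.0, 5.0])
mask = jnp.array([True, True, True, False])

probs = epsilon_greedy_probs(q, mask, epsilon=0.3)  # approximately [0.1, 0.8, 0.1, 0.0]
action = sample_action(jax.random.PRNGKey(0), probs)
```

Returning probabilities rather than a sampled action is what lets a head like this be wrapped in a distribution object and sampled like the policy outputs of other systems, while the Q-values remain available for the loss.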

Extra

Concerns:

lbeyers commented 7 months ago

I think this is ready to merge. Definitely check the docstrings I added to the distribution: they're not in the traditional format, but I think they may be helpful!

lbeyers commented 6 months ago

Discussed offline: will go in with the IQL PR!