New Mava network: epsilon-greedy action head for discrete action spaces

lbeyers commented 7 months ago

What?

A new network was added to the Mava networks file. It forms the final part of an RNN for Q-learning and outputs both the Q-values and a distribution over actions, which can be sampled in the same way as the action distributions of the other systems.

Why?

To make the Q-learning implementations consistent with the other Mava systems. As a side effect, the change simplifies the rec-iql file.

How?

A new network was added which performs much of the epsilon-greedy calculation.
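For reviewers who want the idea in isolation, here is a minimal, framework-free sketch of the epsilon-greedy calculation such a head might perform. It is not the code from this PR: the function names `epsilon_greedy_probs` and `epsilon_greedy_head` are hypothetical, it handles a single unbatched list of Q-values, and it assumes ties for the maximum share the greedy probability equally.

```python
import random
from typing import List, Sequence, Tuple


def epsilon_greedy_probs(q_values: Sequence[float], epsilon: float) -> List[float]:
    """Turn Q-values into epsilon-greedy action probabilities.

    Every action gets epsilon / num_actions (exploration); the remaining
    1 - epsilon is split equally among the actions with the highest Q-value.
    """
    num_actions = len(q_values)
    best = max(q_values)
    num_best = sum(1 for q in q_values if q == best)
    explore = epsilon / num_actions
    exploit = (1.0 - epsilon) / num_best
    return [explore + (exploit if q == best else 0.0) for q in q_values]


def epsilon_greedy_head(
    q_values: Sequence[float], epsilon: float, rng: random.Random
) -> Tuple[int, List[float], List[float]]:
    """Hypothetical head: returns (sampled action, probabilities, Q-values)."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    # Sample one action index from the categorical distribution.
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs, list(q_values)


if __name__ == "__main__":
    action, probs, q = epsilon_greedy_head([1.0, 3.0, 2.0], 0.3, random.Random(0))
    print(action, probs, q)
```

Returning the probabilities alongside the Q-values means the caller can sample an action the same way it would from any other categorical action distribution, while the Q-values remain available for the learning update.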

Extra

Concerns:

The speed seems comparable, but this approach may be slower.
I see that an IdentityTransformation is applied in the other discrete action head. Why is this done, and should I do the same?
Should some of the calculations be moved to the "distributions.py" file?

lbeyers commented 7 months ago

I think this is ready to merge. Please do check the docstrings I added to the distribution: they are not in the traditional format, but I think they may be helpful!

lbeyers commented 6 months ago

Discussed offline: will go in with the IQL PR!

instadeepai / Mava