sjtu-marl / malib

A parallel framework for population-based multi-agent reinforcement learning.
https://malib.io
MIT License

How to use DiagGaussianDistribution? The default is CategoricalDistribution #61

Closed: donotbelieveit closed this issue 1 year ago

donotbelieveit commented 1 year ago

Hi, I am training a multi-agent environment (continuous actions, continuous observations) with PSRO+PPO on the "policy-support-baseline" branch; the environment is the one from the paper Emergent Complexity via Multi-agent Competition. I found that when the policy builds its probability distribution, a continuous action space should yield a DiagGaussianDistribution (see distribution.py, line 876). However, on line 125 of malib/rl/pg/policy.py the default is to use CategoricalDistribution, because only the proba_distribution method of CategoricalDistribution has the action_mask parameter; see the screenshot below. How do I use DiagGaussianDistribution instead? Can the logits computed at line 122 in the screenshot help me construct it? I look forward to your answer, thank you!

[Screenshot, 2023-03-09: malib/rl/pg/policy.py around lines 122-125]
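To make the input requirements concrete, here is a plain-PyTorch illustration of the difference between the two distribution types (using torch.distributions directly, not malib's own distribution classes):

```python
import torch
from torch.distributions import Categorical, Independent, Normal

# discrete action space: the policy head only needs to output logits
logits = torch.randn(4, 6)
discrete_dist = Categorical(logits=logits)

# continuous Box action space: the head must provide a mean per action dimension,
# plus a log standard deviation (often a free, state-independent parameter)
mean_actions = torch.randn(4, 2)
log_std = torch.zeros(2)
continuous_dist = Independent(Normal(mean_actions, log_std.exp()), 1)

print(discrete_dist.sample().shape)    # torch.Size([4])
print(continuous_dist.sample().shape)  # torch.Size([4, 2])
```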
KornbergFresnel commented 1 year ago

@donotbelieveit Good question. You can try overriding this function, using DiagGaussianDistribution as your dist_fn.
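A rough sketch of what such an override could look like is below. The import path for the distribution class, the PGPolicy class name, and the name and signature of the overridden method are assumptions (the issue only names the files), and DiagGaussianDistribution is assumed to follow the stable-baselines3-style proba_distribution(mean_actions, log_std) convention that distribution.py appears to adapt:

```python
import torch
import torch.nn as nn

# assumed import paths: policy.py is named in the issue, the distribution module is not
from malib.rl.pg.policy import PGPolicy
from malib.rl.common.distribution import DiagGaussianDistribution


class DiagGaussianPGPolicy(PGPolicy):
    """Sketch: swap the default categorical dist_fn for a diagonal Gaussian."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # assumes a flat Box action space stored on the base policy
        action_dim = self.action_space.shape[0]
        # state-independent log_std, learned alongside the network
        self._log_std = nn.Parameter(torch.zeros(action_dim))
        self._dist_fn = DiagGaussianDistribution(action_dim)

    def _build_action_dist(self, logits, action_mask=None):
        # hypothetical override point: whatever method around policy.py line 125
        # currently builds CategoricalDistribution from the logits computed at line 122.
        # For a continuous space those logits can serve directly as mean_actions.
        return self._dist_fn.proba_distribution(
            mean_actions=logits, log_std=self._log_std
        )
```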

donotbelieveit commented 1 year ago

How do I get the mean_actions and log_std parameters that DiagGaussianDistribution needs? Do you have any suggestions?

KornbergFresnel commented 1 year ago

How do I get the mean_actions and log_std parameters that DiagGaussianDistribution needs? Do you have any suggestions?

The mean_actions and log_std should be outputs of your policy network.
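As a minimal, self-contained illustration of that (plain PyTorch, independent of malib), a policy head can output mean_actions directly and keep log_std as a learnable parameter:

```python
import torch
import torch.nn as nn


class GaussianActor(nn.Module):
    """Tiny actor: mean_actions come from a linear head, log_std is a learnable parameter."""

    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, action_dim)
        # state-independent log standard deviation, a common PPO choice
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs: torch.Tensor):
        latent = self.backbone(obs)
        mean_actions = self.mean_head(latent)
        return mean_actions, self.log_std


# usage: these two tensors are exactly what proba_distribution expects
actor = GaussianActor(obs_dim=8, action_dim=2)
mean_actions, log_std = actor(torch.randn(4, 8))
# dist = DiagGaussianDistribution(2).proba_distribution(mean_actions, log_std)  # SB3-style call
```

A state-dependent log_std head is also possible; the state-independent parameter above is simply a common PPO default.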

donotbelieveit commented 1 year ago

Thank you for replying! I am a second-year student at UCAS and look forward to more communication with you about using the framework.