@donotbelieveit good question; you can try overriding this function, passing DiagGaussianDistribution as your dist_fn.
How do I get the mean_actions and log_std parameters that DiagGaussianDistribution needs? Are there any suggestions?
The mean_actions and log_std should be the outputs of your policy network.
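To make that concrete, here is a minimal sketch of a policy head that produces mean_actions and log_std and feeds them to DiagGaussianDistribution. It assumes the stable-baselines3-style distribution interface (MALib's distribution.py appears to mirror it); the class name GaussianPolicyHead and all network shapes here are illustrative, not MALib's actual policy code:

```python
import torch
import torch.nn as nn
from stable_baselines3.common.distributions import DiagGaussianDistribution

# Hypothetical policy head: a trunk maps observations to a latent
# vector, a linear layer produces mean_actions, and log_std is a
# free learnable parameter (state-independent), as in the SB3 design.
class GaussianPolicyHead(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mean_net = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))
        self.dist = DiagGaussianDistribution(action_dim)

    def forward(self, obs: torch.Tensor):
        latent = self.trunk(obs)
        mean_actions = self.mean_net(latent)
        # Build the diagonal Gaussian from the two policy outputs.
        distribution = self.dist.proba_distribution(mean_actions, self.log_std)
        actions = distribution.sample()
        log_prob = distribution.log_prob(actions)
        return actions, log_prob

# Example usage with dummy data:
head = GaussianPolicyHead(obs_dim=8, action_dim=2)
actions, log_prob = head(torch.randn(4, 8))
print(actions.shape, log_prob.shape)  # torch.Size([4, 2]) torch.Size([4])
```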
Thank you for the reply! I am a second-year student at UCAS, and I look forward to more communication with you about using the framework.
Hi, I am training a multi-agent environment (continuous action space, continuous observation space) with PSRO+PPO on the "policy-support-baseline" branch; see the paper Emergent Complexity via Multi-agent Competition for the specific environment. I found that in Policy's probability distribution, a DiagGaussianDistribution is returned when the action space is continuous (see distribution.py, line 876). But on line 125 of malib/rl/pg/policy.py, the default is CategoricalDistribution (because only the proba_distribution method of the CategoricalDistribution class has the action_mask parameter); see the figure below. How do I use DiagGaussianDistribution? Can the logits obtained at line 122 in the figure help me use it? I am looking forward to reading your answer, thank you!
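Following up on the suggestion above, a hedged sketch of what swapping CategoricalDistribution for DiagGaussianDistribution might look like: the logits from line 122 can be reused as mean_actions, but a separate learnable log_std parameter has to be added, and the action_mask argument is dropped because masking only applies to discrete action spaces. The names below (continuous_dist_fn, make_distribution) are illustrative, not MALib's actual API:

```python
import torch
from stable_baselines3.common.distributions import DiagGaussianDistribution

action_dim = 2  # illustrative; use your environment's action dimension

# Hypothetical replacement for the CategoricalDistribution default:
# reuse the policy's logits as the Gaussian mean and add a learnable,
# state-independent log_std. Note there is no action_mask argument,
# since masking is only meaningful for discrete action spaces.
continuous_dist_fn = DiagGaussianDistribution(action_dim)
log_std = torch.nn.Parameter(torch.zeros(action_dim))
# Remember to register log_std with your optimizer so it is trained.

def make_distribution(logits: torch.Tensor) -> DiagGaussianDistribution:
    # logits: the policy output from line 122, shape (batch, action_dim)
    return continuous_dist_fn.proba_distribution(
        mean_actions=logits, log_std=log_std
    )

# Example: sample actions and compute log-probabilities for PPO.
dist = make_distribution(torch.randn(4, action_dim))
actions = dist.sample()
log_prob = dist.log_prob(actions)  # shape (4,)
```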