Closed · SiyuanHuang95 closed 1 year ago
Hi there,
Thank you for your congratulatory words. To answer your questions:
> I noticed that you used `torch.distributions.Distribution` after the MLP to get the final output. Could you share some insights about this choice? What is the advantage compared with directly using an MLP and softmax?
Theoretically there is no difference between using a categorical distribution and MLP + softmax. Personally, I find torch distributions convenient since they implement a uniform interface that works with different strategies for modeling action heads.
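As a minimal illustration of that uniform interface (a sketch, not code from this repo), a discrete and a continuous action head expose the same `sample()`/`log_prob()` calls, so downstream code does not need to branch on the action-space type:

```python
import torch
from torch.distributions import Categorical, Normal

# Hypothetical MLP outputs standing in for two kinds of action heads:
# 8 discrete action bins, and a 3-D continuous action.
logits = torch.randn(4, 8)                    # batch of 4, 8 bins
discrete_head = Categorical(logits=logits)

mean = torch.randn(4, 3)                      # batch of 4, 3-D action
continuous_head = Normal(mean, torch.ones_like(mean))

# The same loop works for both heads.
for dist in (discrete_head, continuous_head):
    action = dist.sample()                    # draw an action
    logp = dist.log_prob(action)              # per-sample log-likelihood
    print(type(dist).__name__, tuple(action.shape), tuple(logp.shape))
```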
> Also, for the training procedure, should we ignore that head and directly apply the NLL loss to the output of the MLP, or should we apply the NLL to the probability of that distribution? If so, could you give some simple code snippets to demonstrate the training usage?
Sure. In the discrete case, let's say `dist` is a `torch.distributions.Categorical` instance predicted by the model and `label` is the discretized action; the loss is calculated with `torch.nn.functional.cross_entropy`. Since it takes unnormalized logits as input, we can just pass `dist.logits` (with a proper reshape if necessary) into the loss function. For the continuous case with a unimodal Gaussian or GMM, I'd recommend checking out these snippets: here and here.
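A hedged sketch of both losses described above (sizes are illustrative, and this is a generic reconstruction rather than the linked snippets): cross-entropy on `dist.logits` for the discrete case, and a negative log-likelihood via `log_prob` for a GMM head:

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

batch, n_bins = 32, 21                                    # illustrative sizes

# --- Discrete case: Categorical head + cross-entropy on dist.logits ---
logits = torch.randn(batch, n_bins, requires_grad=True)   # stands in for the MLP head output
dist = Categorical(logits=logits)
label = torch.randint(0, n_bins, (batch,))                # discretized ground-truth actions
# cross_entropy expects unnormalized logits; dist.logits (log-normalized)
# differs only by a per-row constant, so passing it is equivalent.
loss = F.cross_entropy(dist.logits, label)
# Equivalently: loss = -dist.log_prob(label).mean()
loss.backward()                                           # gradients flow back to the head

# --- Continuous case: a generic GMM head + NLL via log_prob ---
n_comp, act_dim = 5, 3
mix = Categorical(logits=torch.randn(batch, n_comp))      # mixture weights
comp = Independent(Normal(torch.randn(batch, n_comp, act_dim),
                          torch.ones(batch, n_comp, act_dim)), 1)
gmm = MixtureSameFamily(mix, comp)
cont_label = torch.randn(batch, act_dim)                  # continuous ground-truth actions
nll = -gmm.log_prob(cont_label).mean()
print(loss.item(), nll.item())
```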
Many thanks for your reply and the informative hints, @yunfanjiang!
- MLP + softmax case: okay, I got it. BTW, I noticed that many works use an MSE loss to train the policy network, turning training into a regression problem. Have you conducted any experiments comparing them?
- Okay, thanks. But I noticed that in your work you chose to use discretized actions. What would be the big difference between them?
Thanks for the follow-up. To answer them:
Hope these are helpful.
Hi, I noticed that you used `torch.distributions.Distribution` after the MLP to get the final output. Could you share some insights about this choice? What is the advantage compared with directly using an MLP and softmax?
Also, for the training procedure, should we ignore that head and directly apply the NLL loss to the output of the MLP, or should we apply the NLL to the probability of that distribution? If so, could you give some simple code snippets to demonstrate the training usage?
BTW, congrats on the ICML acceptance, well done!
Best,