toshikwa / gail-airl-ppo.pytorch

PyTorch implementation of GAIL and AIRL based on PPO.
MIT License

About disc's output #8

Closed: nicholas0717 closed this issue 1 year ago

nicholas0717 commented 1 year ago

Hi @toshikwa

I'm puzzled by your comment in update_disc(). You say the output of the discriminator is in (-inf, inf), not [0, 1]. Shouldn't the output of the disc be in (-1, 1) when hidden_activation of the disc is nn.Tanh()? I'm not sure whether I'm misunderstanding something.

Looking forward to your reply. Thanks!

toshikwa commented 1 year ago

Hi @nicholas0717

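The output is in (-inf, inf) if you don't specify output_activation: hidden_activation is applied only to the hidden layers, and the final linear layer has no activation unless output_activation is given. See https://github.com/toshikwa/gail-airl-ppo.pytorch/blob/master/gail_airl_ppo/network/utils.py#L15-L16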
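To make the point concrete, here is a minimal sketch. It is not the repository's code: `build_mlp` below is a hypothetical reconstruction based only on the parameter names mentioned in this thread (`hidden_activation`, `output_activation`), and `set_constant_weights` is a helper invented for the demo. It shows that with Tanh on the hidden layers only, the last linear layer can still produce values outside (-1, 1), while adding an output activation squashes them.

```python
import torch
from torch import nn


def build_mlp(input_dim, output_dim, hidden_units=(64, 64),
              hidden_activation=nn.Tanh(), output_activation=None):
    """Hypothetical MLP builder (an assumption, not the repo's exact code)."""
    layers = []
    units = input_dim
    for next_units in hidden_units:
        layers.append(nn.Linear(units, next_units))
        layers.append(hidden_activation)  # hidden features lie in (-1, 1)
        units = next_units
    # The last layer is a plain weighted sum of the bounded hidden features,
    # so its range depends on the weights and is not confined to (-1, 1).
    layers.append(nn.Linear(units, output_dim))
    if output_activation is not None:
        layers.append(output_activation)
    return nn.Sequential(*layers)


def set_constant_weights(net, hidden_value, last_value):
    """Demo-only helper: fill weights with constants so the result is deterministic."""
    linears = [m for m in net if isinstance(m, nn.Linear)]
    with torch.no_grad():
        for layer in linears[:-1]:
            layer.weight.fill_(hidden_value)
            layer.bias.zero_()
        linears[-1].weight.fill_(last_value)
        linears[-1].bias.zero_()


x = torch.ones(1, 2)

# No output_activation: the network returns an unbounded logit.
unbounded = build_mlp(2, 1, hidden_units=(8,))
set_constant_weights(unbounded, hidden_value=5.0, last_value=10.0)
logit = unbounded(x)

# With output_activation=nn.Tanh(): the same weights give a value in [-1, 1].
bounded = build_mlp(2, 1, hidden_units=(8,), output_activation=nn.Tanh())
set_constant_weights(bounded, hidden_value=5.0, last_value=10.0)
squashed = bounded(x)

print("without output_activation:", logit.item())
print("with output_activation:   ", squashed.item())
```

For any fixed set of weights the logit is of course finite. "(-inf, inf)" here means that nothing in the architecture restricts it, so it should be treated as a raw logit rather than a probability.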

Does it make sense?

nicholas0717 commented 1 year ago

Thank you! I get it now. I hadn't read utils.py carefully.

Appreciate your reply!