luchris429 / purejaxrl

Really Fast End-to-End Jax RL Implementations
Apache License 2.0

Question about critic loss #11

Closed qlan3 closed 1 year ago

qlan3 commented 1 year ago

I noticed that the critic loss implemented in PPO (https://github.com/luchris429/purejaxrl/blob/main/purejaxrl/ppo.py#L179) is quite different from the traditional TD error and looks more like PPO's clipped actor loss. Could you please point me to a reference? If there is none, is there a reason for doing it this way?

luchris429 commented 1 year ago

Hello! Good question. The code is inspired by CleanRL's implementation, which in turn comes from OpenAI's original implementation.

Costa Huang (author of CleanRL) did an amazing write-up about implementation details here. In point 9 of the first section, he brings up value function loss clipping! Notably, works investigating it find that it does not help performance and can sometimes even harm it. However, I include it for the same reasons that Costa does.
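For readers following along, here is a minimal JAX sketch of what value function loss clipping usually looks like next to a plain squared-error critic loss. The function names, argument names, and the `clip_eps=0.2` default are assumptions for illustration, not copied from `ppo.py`, so treat it as a sketch rather than the repository's exact code.

```python
import jax.numpy as jnp


def unclipped_value_loss(value, targets):
    # Plain squared-error critic loss against the return targets.
    return 0.5 * jnp.mean(jnp.square(value - targets))


def clipped_value_loss(value, old_value, targets, clip_eps=0.2):
    # Limit how far the new prediction may move from the prediction
    # recorded at rollout time (old_value), mirroring the actor's ratio clipping.
    value_clipped = old_value + jnp.clip(value - old_value, -clip_eps, clip_eps)
    losses_unclipped = jnp.square(value - targets)
    losses_clipped = jnp.square(value_clipped - targets)
    # Take the element-wise maximum, i.e. the more pessimistic of the two losses.
    return 0.5 * jnp.mean(jnp.maximum(losses_unclipped, losses_clipped))


# Hypothetical toy batch: the new prediction already matches the targets,
# but it moved further than clip_eps away from the old prediction.
value = jnp.array([1.0, 0.0])
old_value = jnp.array([0.0, 0.0])
targets = jnp.array([1.0, 0.0])

plain = unclipped_value_loss(value, targets)
clipped = clipped_value_loss(value, old_value, targets)
```

Because of the element-wise maximum, the clipped loss is never smaller than the plain squared error, and the two coincide whenever the new prediction stays within `clip_eps` of the old one.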

qlan3 commented 1 year ago

Thank you for your quick and helpful reply!