Missing coefficient for value loss in PPO implementation

google / brax

Massively parallel rigidbody physics simulation on accelerator hardware.

Apache License 2.0


Missing coefficient for value loss in PPO implementation #424

Closed LabChameleon closed 4 months ago

LabChameleon commented 7 months ago

Hi,

Is there a reason you chose not to implement a coefficient such as vf_coef to weight the value-function loss in PPO? As far as I know, such a coefficient is commonly available in PPO implementations and can be a useful hyperparameter to tune. Would you be open to a PR adding this?
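
For illustration, here is a minimal sketch of the kind of weighting in question. It is not the Brax implementation: the function name ppo_loss, its parameters, and the default values are all hypothetical, and it uses plain Python lists rather than JAX arrays so that it stays self-contained.

```python
import math
from statistics import fmean


def ppo_loss(log_probs, old_log_probs, advantages, values, returns,
             entropy, clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
    """Combined PPO loss with a weighted value term (illustrative sketch only).

    All names and defaults here are assumptions, not Brax's API.
    """
    # Clipped surrogate policy objective, negated so that lower is better.
    surrogate = []
    for lp, old_lp, adv in zip(log_probs, old_log_probs, advantages):
        ratio = math.exp(lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate.append(min(ratio * adv, clipped * adv))
    policy_loss = -fmean(surrogate)

    # Mean squared error between value predictions and return targets.
    value_loss = fmean((v - r) ** 2 for v, r in zip(values, returns))

    # vf_coef scales the value term relative to the policy term;
    # ent_coef scales an entropy bonus that is subtracted from the loss.
    total = policy_loss + vf_coef * value_loss - ent_coef * entropy
    return total, policy_loss, value_loss
```

With vf_coef = 1.0 the value term enters unweighted. Smaller values reduce how strongly value-function errors influence the combined gradient relative to the policy term, which matters most when the policy and value networks share parameters.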

btaba commented 7 months ago

Hi @dierkes-j, absolutely, feel free to open a PR and add the coefficient!

btaba commented 4 months ago

Closing due to inactivity. FWIW, I ran a sweep over the value loss coefficient and didn't find any improvement for a PPO-trained quadruped joystick policy.