Closed schmidtj3 closed 19 minutes ago
Can you point me to the part of the code base that implements the reward functions described in Equations 5, 6, 7, 8 in the paper?
I would like to understand how these equations are translated into code. Thank you!
Eqns. 5 and 7 are implemented in L240-L260 of ppo_trainer.py.
Eqns. 6 and 8 are implemented in compute_rewards() in ppo_trainer.py.
Thank you very much!