nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

The reward function for training? #45

Closed DongXingshuai closed 3 years ago

DongXingshuai commented 3 years ago

Hi, thanks for your open-source code. Can you tell me which reward function is used for training? Thanks again.

nikhilbarhate99 commented 3 years ago

Hi, I do not know exactly how the rewards are calculated; for that you will have to go through the code of the Roboschool env (for example: link).
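For context, in Gym/Roboschool-style environments the reward is computed inside the environment itself and returned by `env.step()`; the training code never defines it. A minimal sketch of that interaction, assuming a Gym-style API (the env name and random action are just placeholders for illustration):

```python
import gym

env = gym.make("RoboschoolWalker2d-v1")  # assumed env name, for illustration only
state = env.reset()

for t in range(1000):
    action = env.action_space.sample()  # stand-in for the policy's action
    # The reward is produced inside the environment's step function;
    # from the agent's side it is just a scalar signal.
    state, reward, done, info = env.step(action)
    if done:
        state = env.reset()
```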

Although rewards are important in RL, most RL research is based on standard benchmarks and abstracts away the reward function as a black-box function that needs to be optimized.
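To illustrate that black-box view: PPO only consumes the scalar rewards handed back by the environment, typically turning them into discounted returns for the policy to maximize. A rough sketch of computing discounted returns from a rollout (variable names are illustrative, not the repo's exact code):

```python
def discounted_returns(rewards, is_terminals, gamma=0.99):
    """Compute Monte Carlo discounted returns from a list of scalar rewards.

    The reward values themselves come from the environment; this function
    (and PPO in general) treats them as an opaque signal to maximize.
    """
    returns = []
    running = 0.0
    for reward, terminal in zip(reversed(rewards), reversed(is_terminals)):
        if terminal:
            running = 0.0  # reset the return at episode boundaries
        running = gamma * running + reward
        returns.insert(0, running)
    return returns
```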