Open oguzserbetci opened 5 years ago
I found the relevant place and marked it with a TODO in the code: rl_teacher/reward_models.py:129
It is a simple regression MLP that maps (state, action) -> reward
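To make the (state, action) -> reward mapping concrete, here is a minimal sketch in plain Python. It is not the code in rl_teacher/reward_models.py; the names `make_reward_mlp` and `predict_reward`, the single hidden layer, the ReLU activation, and the layer sizes are all assumptions chosen for illustration.

```python
import random


def make_reward_mlp(obs_dim, act_dim, hidden=8, seed=0):
    """Build a tiny one-hidden-layer regression MLP (hypothetical, untrained)."""
    rng = random.Random(seed)
    in_dim = obs_dim + act_dim
    # Randomly initialised weights; biases start at zero.
    w1 = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)]
          for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0.0, hidden ** -0.5) for _ in range(hidden)]
    b2 = 0.0

    def predict_reward(state, action):
        # Concatenate state and action into one input vector.
        x = list(state) + list(action)
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        # Linear output layer: a single scalar reward (regression).
        return sum(w * hi for w, hi in zip(w2, h)) + b2

    return predict_reward


predict_reward = make_reward_mlp(obs_dim=4, act_dim=2)
reward = predict_reward([0.1, -0.2, 0.0, 0.3], [1.0, 0.0])
print(reward)
```

The weights here are random, so the printed value is meaningless until the model is trained. The point is only the shape of the mapping: one state vector plus one action vector in, one scalar out.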