This environment will maintain 4 goals: turn-left, turn-right, go-straight, u-turn and report the observation, termination, reward for each of these goals into the info dict. So a policy can be trained to conduct any of this four goals.
TODO:
[ ] Introduce an example script
[ ] Add unit test
Checklist
[x] I have merged the latest main branch into current branch.
[x] I have run bash scripts/format.sh before merging.
What changes do you make in this PR?
This environment will maintain 4 goals: turn-left, turn-right, go-straight, u-turn and report the observation, termination, reward for each of these goals into the info dict. So a policy can be trained to conduct any of this four goals.
TODO:
Checklist
bash scripts/format.sh
before merging.