Questions about exploration loss design

zhejz / carla-roach

Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. ICCV 2021.


The intuition can be found on page 4 of the paper.

If z is related to a collision or running traffic light/sign, we apply p_z = B(1, 2.5) on the acceleration to encourage Roach to slow down while the steering is unaffected. In contrast, if the car is blocked we use an acceleration prior B(2.5, 1). For route deviations, a uniform prior B(1, 1) is applied on the steering. Despite being equivalent to maximizing entropy in this case, the exploration loss further encourages exploration on steering angles during the last 10 seconds before the route deviation.
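The mapping from terminal event to prior described in that passage can be written down as a small lookup table. This is only a sketch for illustration: the event keys, the `EXPLORATION_PRIORS` name, and the `prior_for` helper are hypothetical and are not taken from the carla-roach code; only the Beta parameters and the affected action dimension come from the quoted text.

```python
# Hypothetical lookup: terminal event -> (action dimension, alpha, beta).
# The Beta parameters follow the quoted paragraph; all names are assumptions.
EXPLORATION_PRIORS = {
    "collision": ("acceleration", 1.0, 2.5),              # encourage slowing down
    "traffic_light_or_sign": ("acceleration", 1.0, 2.5),  # encourage slowing down
    "blocked": ("acceleration", 2.5, 1.0),                # encourage speeding up
    "route_deviation": ("steering", 1.0, 1.0),            # uniform prior on steering
}


def prior_for(event):
    """Return (action dimension, alpha, beta) for a terminal event."""
    return EXPLORATION_PRIORS[event]


if __name__ == "__main__":
    for event in EXPLORATION_PRIORS:
        dim, alpha, beta = prior_for(event)
        print(f"{event}: B({alpha}, {beta}) on {dim}")
```

Note that each prior touches only one action dimension; the other dimension is left to the policy.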

There is no rigorous mathematical model behind it, and the choice of distribution is quite arbitrary. Just make sure the "direction" is correct, i.e. if you want to encourage deceleration, the distribution should have a mean less than 0 in the action space (on the Beta distribution's native [0, 1] support, that means a mean below 0.5).
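A quick way to check the "direction" of a candidate prior is to compute its mean. The sketch below uses the standard formula mean = alpha / (alpha + beta) for a Beta distribution and assumes that samples in [0, 1] are rescaled linearly to a [-1, 1] action range via 2x - 1; that rescaling and the function names are assumptions made for illustration, not code from the repository.

```python
def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta) on its native [0, 1] support."""
    return alpha / (alpha + beta)


def action_mean(alpha, beta):
    """Mean after an assumed linear rescale from [0, 1] to [-1, 1]."""
    return 2.0 * beta_mean(alpha, beta) - 1.0


if __name__ == "__main__":
    # Priors quoted from the paper: slow down, speed up, uniform.
    for name, (a, b) in {
        "B(1, 2.5)": (1.0, 2.5),
        "B(2.5, 1)": (2.5, 1.0),
        "B(1, 1)": (1.0, 1.0),
    }.items():
        print(f"{name}: mean in [-1, 1] = {action_mean(a, b):+.3f}")
```

Under this assumption B(1, 2.5) has a negative mean (deceleration), B(2.5, 1) a positive one (acceleration), and B(1, 1) sits at zero, which matches the intended directions.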
