Can't get a good expert after training the RL model 10M steps

Li-Yongcheng commented 1 year ago

Thanks for sharing your excellent work!

I've trained the RL model twice. It learns well at first, but after 7M steps the agent tends to get stuck at traffic lights and does not start again when the light turns green. The agent seems to act very conservatively: it either drives at a low speed or only moves forward a little after a long wait.

The checkpoint I got after 10M steps cannot complete even a single route because of this problem. I didn't modify the reward code, and I tried to use the same training parameters as in the paper: batch_size=256, n_steps_total=12288, and 6 towns at the same time. Below are screenshots of the problem during training (I started CARLA with the -quality-level=Low option to monitor the training process); the green car is the agent.

[Screenshot from 2023-01-09 21-15-35]

[Screenshot from 2023-01-09 22-21-26]

I also found that the total loss begins to grow after 7M steps.

[Screenshot from 2023-01-10 18-15-10]

Thanks for any help or suggestion!

zhejz commented 1 year ago

Indeed, we have also observed that the performance does not improve further, and in some cases even decreases, after 10M training steps. In practice we stop the RL training early at 10M steps and choose the best-performing checkpoint, which usually occurs between 7M and 9M steps. RL training is known to be highly sensitive and random, so you may want to try different random seeds. Moreover, I think the regularization, such as the prior, would harm the performance after 10M steps of training. Although this was not done in the original paper, I would suggest removing any regularization and letting the policy learn freely after 10M steps.
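
To illustrate the checkpoint selection described above, here is a minimal sketch of how it could be done. It is not code from this repository: `EvalRecord`, `select_best_checkpoint`, the `score` field, and the checkpoint file names are all hypothetical, and it assumes you have already evaluated each saved checkpoint and recorded one scalar score per checkpoint (higher is better).

```python
from typing import Iterable, NamedTuple, Optional


class EvalRecord(NamedTuple):
    """One evaluated checkpoint (hypothetical structure, not from the repo)."""
    step: int     # training step at which the checkpoint was saved
    score: float  # evaluation score you measured yourself; higher is better
    path: str     # where the checkpoint file is stored (hypothetical naming)


def select_best_checkpoint(
    records: Iterable[EvalRecord],
    max_step: int = 10_000_000,
) -> Optional[EvalRecord]:
    """Return the best-scoring checkpoint saved at or before max_step.

    Ties on score are broken in favor of the earlier checkpoint.
    Returns None if no checkpoint is eligible.
    """
    eligible = [r for r in records if r.step <= max_step]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r.score, -r.step))


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not measured results.
    records = [
        EvalRecord(7_000_000, 0.80, "ckpt_7M.pth"),
        EvalRecord(8_000_000, 0.85, "ckpt_8M.pth"),
        EvalRecord(10_000_000, 0.60, "ckpt_10M.pth"),
    ]
    best = select_best_checkpoint(records)
    print(best.path if best else "no eligible checkpoint")
```

Repeating the same selection for each random seed and keeping the overall best is one simple way to deal with the run-to-run variance mentioned above.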

Li-Yongcheng commented 1 year ago

Thank you very much for your response! Your help is really useful to me. Finally, Happy Chinese New Year!

zhejz / carla-roach
