Performance of the policy trained by the UrbanDriver notebook

Hi, I was experimenting with the UrbanDriver training notebook in the examples folder. I set the number of training iterations to 50K and used the full training set. I didn't change any other parameters.

The training loss appeared to have converged at around 0.08. However, when I tested the resulting model in the corresponding test notebook with visualization, the policy seemed to have converged to a degenerate solution: the vehicle just stays still. This is a bit surprising, since the notebook says that "the sheer size of our dataset ensures that a reasonable performance can be obtained even with this simple loop".

Am I missing something? What is the right expectation for the sample training notebook?
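"Staying still" can also be put into numbers instead of being judged only from the visualization. The sketch below assumes that the predicted and ground-truth future positions are available as NumPy arrays of shape `(batch, future_steps, 2)`, in metres, in the agent's local frame with the agent's current position at the origin. The function names and the 0.5 m threshold are hypothetical illustrations, not part of l5kit. It reports the fraction of trajectories whose final position stays within the threshold of the origin, for both predictions and ground truth.

```python
import numpy as np


def fraction_stationary(positions: np.ndarray, threshold_m: float = 0.5) -> float:
    """Fraction of trajectories whose final point lies within threshold_m of the origin.

    Assumption: positions has shape (batch, future_steps, 2), in metres, in the
    agent's local frame, so the agent's current position is (0, 0).
    """
    final_offsets = positions[:, -1, :]                    # (batch, 2)
    distances = np.linalg.norm(final_offsets, axis=-1)     # (batch,)
    return float(np.mean(distances < threshold_m))


def stationary_report(pred: np.ndarray, target: np.ndarray, threshold_m: float = 0.5) -> dict:
    """Compare how often the model predicts 'no motion' vs. how often the data shows it."""
    return {
        "pred": fraction_stationary(pred, threshold_m),
        "target": fraction_stationary(target, threshold_m),
    }


if __name__ == "__main__":
    # Toy data only: a policy that never moves vs. ground truth driving straight ahead.
    pred = np.zeros((4, 12, 2))
    straight = np.stack([np.linspace(0.5, 6.0, 12), np.zeros(12)], axis=-1)
    target = np.tile(straight, (4, 1, 1))
    print(stationary_report(pred, target))
```

A prediction fraction close to 1.0 alongside a much lower ground-truth fraction would point to a collapsed policy rather than a scene where the agent genuinely should wait.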

woven-planet / l5kit, issue #371