motional / nuplan-devkit

The devkit of the nuPlan dataset.
https://www.nuplan.org

Implementation details of the baseline in the leaderboard. #256

Closed jchengai closed 1 year ago

jchengai commented 1 year ago

Hi Motional team, I wonder if you could reveal the implementation details of the baseline (UrbanDriver) in the leaderboard, e.g., model architecture, training config, and dataset split/augmentation.

patk-motional commented 1 year ago

Hi @jchengai,

We will share that in our documentation in the next release. In the meantime, you can find the implementation here.

bhyang commented 1 year ago

Hi @patk-motional,

Are the provided model code and configuration the same as those used for the reported UrbanDriver baseline, or were there other changes? I tried training UrbanDriver with ~250K samples and the performance was lower than the IDM policy for closed-loop reactive planning, but I'm not sure if the performance disparity is solely due to the dataset size.

If the details aren't available until the next release, is there an ETA for when that might be ready?

Thanks!

patk-motional commented 1 year ago

Hi @bhyang,

Let me connect you with @christopher-motional, who implemented and trained the baseline model. He is on leave at the moment; I'll get him to reply as soon as he is back next week.

christopher-motional commented 1 year ago

Hi @bhyang, sorry for the delayed response. Yes, the reported baseline was trained using the available model code with close to the same configuration you will find in the available config files. I believe the only deviations were using the AdamW optimizer with a slightly different learning rate from the default (I believe 1.25e-5 vs 5e-5), along with the OneCycleLR learning rate scheduler. Data augmentation was an important part of this, but that should be the same as what you see in the training config.
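For reference, a minimal PyTorch sketch of the optimizer/scheduler combination described above. Only the AdamW + OneCycleLR choice and the 1.25e-5 learning rate come from this thread; the model, total_steps, and whether 1.25e-5 is the base or peak rate are placeholder assumptions.

```python
import torch

# Hypothetical sketch only: AdamW with lr=1.25e-5 plus a OneCycleLR schedule.
# Whether 1.25e-5 is the optimizer lr or the peak of the one-cycle schedule is
# not stated above, and total_steps is a placeholder.
model = torch.nn.Linear(16, 3)  # stand-in for the actual UrbanDriver model

optimizer = torch.optim.AdamW(model.parameters(), lr=1.25e-5)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1.25e-5,       # assumed peak learning rate
    total_steps=100_000,  # placeholder: num_epochs * steps_per_epoch
)

# In the training loop, scheduler.step() is called after each optimizer.step().
```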

The baseline was trained on the full trainval dataset, subsampled at a rate of 0.1 (around 300K samples, I believe). For this baseline, the IDM policy did generally slightly outperform the ML model when evaluated in closed loop with reactive agents -- depending on how much of a disparity you're seeing, that is somewhat expected.
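To illustrate what a 0.1 subsampling rate means here (a generic sketch of the idea, not the devkit's actual scenario-filter code):

```python
from typing import List, TypeVar

T = TypeVar("T")

def subsample(samples: List[T], rate: float = 0.1) -> List[T]:
    """Keep roughly `rate` of the samples by taking every (1/rate)-th element."""
    step = max(1, round(1.0 / rate))
    return samples[::step]

# e.g. the full trainval pool subsampled at rate 0.1 leaves ~300K training samples.
```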

bhyang commented 1 year ago

Hi @christopher-motional, thanks for the clarification! I have a few follow-up questions:

Appreciate the help, thanks!

christopher-motional commented 1 year ago

Just as a quick follow-up: as I was saying, the values you see reported for the warm-up phase reflect the fact that our evaluation for this phase was done on a smaller subset of data with a reduced number of scenario types. Evaluation for the test phase will be on a larger amount of data and will not be skewed in this manner.

bhyang commented 1 year ago

@christopher-motional What was the effective batch size used? Also how long did training take approximately (both number of epochs and wall clock time)? Thanks!

christopher-motional commented 1 year ago

The effective batch size was 256 and we trained for around 50 epochs, which took around 2 days from what I remember. For what it's worth, the baseline really is more of a reference point to get people started and serves as a base for comparison. If you look at how feature extraction/data augmentation is done for this model in the devkit, you should see a number of things that could be done more efficiently, which we encourage competitors to improve on so they can train their models effectively.
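For anyone reproducing this, the "effective batch size" is the total batch size seen per optimizer step. How it was split across GPUs and gradient accumulation was not stated, so the split below is only an assumed example; only the product (256) comes from this thread.

```python
# The 32 x 8 x 1 split is an assumption for illustration; only the product (256)
# is reported above.
per_gpu_batch_size = 32
num_gpus = 8
grad_accumulation_steps = 1

effective_batch_size = per_gpu_batch_size * num_gpus * grad_accumulation_steps
assert effective_batch_size == 256
```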

rossgreer commented 1 year ago

I see that in the tutorial, 'scenario_filter.limit_total_scenarios=500'.

Thanks in advance, reading through the Issues discussion has been very helpful!

patk-motional commented 1 year ago

Hi @rossgreer,

Answering your questions in the same order: