yuantianyuan01 / StreamMapNet

Questions about the nuScenes old-split checkpoint #2

Closed · woodfrog closed this 1 year ago

woodfrog commented 1 year ago

Hi Tianyuan,

Thanks for the great paper and code! I'm trying to reproduce the results by running the training and comparing against your released checkpoints. The results for the new splits are all good, but I have some questions about the old split of nuScenes. Details are below.

For your "NuScenes oldsplit“ 30-epoch checkpoint with 63.4 AP, the config file seems to suggest that there is no streaming modules in this baseline model, is this right? When I used this config file to train a model with the old split, I could only get ~60 final AP. I'm a bit confused about how that baseline model can get 63.4 AP without the temporal streaming. Also, according to Table 4 in your paper, the full model (with streaming) trained with 24 epochs has an mAP of 62.9, which again suggests that the 63.4 AP of the baseline model is not normal. Could you help clarify and point out my potential misunderstanding here? Thank you!

yuantianyuan01 commented 1 year ago

Hi,

Yes, we do not employ streaming modules on the NuScenes old split. The checkpoint with 63.4 mAP was trained with exactly this config file. I am not sure why you only get ~60 mAP; sometimes training is affected by random variance, and a simple re-run may help. Besides, it's reasonable that the 30-epoch model outperforms the 24-epoch model, since training on the NuScenes old split is essentially an overfitting problem: the longer you train, the higher the mAP, until it saturates at ~100 epochs.

woodfrog commented 1 year ago

> Yes, we do not employ streaming modules on the NuScenes old split. The checkpoint with 63.4 mAP was trained with exactly this config file. I am not sure why you only get ~60 mAP; sometimes training is affected by random variance, and a simple re-run may help. Besides, it's reasonable that the 30-epoch model outperforms the 24-epoch model, since training on the NuScenes old split is essentially an overfitting problem: the longer you train, the higher the mAP, until it saturates at ~100 epochs.

Thank you for the reply! To make sure I understand correctly, do you mean that the 62.9 mAP reported in Table 4 of the paper is also without the streaming modules? If so, the paper is a bit misleading here, since "StreamMapNet (Ours)" represents the full method in most other tables and descriptions.

As for the performance gap, a potential cause is that I used 4 GPUs instead of 8 for that experiment; maybe the batch size matters here. I will run it again to verify. Thanks again!

yuantianyuan01 commented 1 year ago

Thanks for pointing it out. We will clarify it in our next revision. As for the performance gap, the batch size is likely the cause.
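For context on why GPU count matters here: with a fixed per-GPU batch size, going from 8 GPUs to 4 halves the effective batch size, and the linear scaling rule (Goyal et al., 2017) suggests reducing the learning rate proportionally. A minimal sketch of the arithmetic; the per-GPU batch size and base learning rate below are illustrative values, not the repository's actual settings:

```python
# Illustrative arithmetic for the effective batch size and the linear
# learning-rate scaling rule; the per-GPU batch size and base learning
# rate are hypothetical, not taken from the StreamMapNet configs.
samples_per_gpu = 4    # per-GPU batch size (illustrative)
base_num_gpus = 8      # the authors' reference setup
base_lr = 5e-4         # learning rate tuned for 8 GPUs (illustrative)

def scaled_lr(num_gpus: int) -> float:
    """Scale the learning rate linearly with the effective batch size."""
    effective_batch = samples_per_gpu * num_gpus
    reference_batch = samples_per_gpu * base_num_gpus
    return base_lr * effective_batch / reference_batch

print(scaled_lr(4))  # 4 GPUs: half the effective batch -> half the LR
print(scaled_lr(1))  # 1 GPU: 1/8 of the effective batch and LR
```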

alfredgu001324 commented 1 year ago

@woodfrog Hi, I am wondering what parts of the training config need to be modified to reproduce the results, and what command do you use? I am using a single RTX4090, so should I modify num_gpus in the config file accordingly? Thank you so much; looking forward to your reply.

woodfrog commented 1 year ago

> @woodfrog Hi, I am wondering what parts of the training config need to be modified to reproduce the results, and what command do you use? I am using a single RTX4090, so should I modify num_gpus in the config file accordingly?

Yes, num_gpus is the part to modify in your case; change it to 1 since you are using a single RTX4090. But your results will probably be lower than the authors', as the global batch size with a single GPU is too small.
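A hypothetical sketch of the edit described above; num_gpus is the field named in this thread, while the remaining name and values are illustrative:

```python
# Hypothetical config excerpt; num_gpus is the field mentioned in the
# thread, the remaining name and values are illustrative only.
num_gpus = 1           # was 8 in the released config
samples_per_gpu = 4    # per-GPU batch size (illustrative)
# The effective batch size drops from 8 * 4 = 32 to 1 * 4 = 4, which is
# likely why single-GPU runs land below the released numbers.
```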

alfredgu001324 commented 1 year ago

@woodfrog Thanks for the reply! I followed your instructions and it seems that I have reproduced the results. My final training result is a bit lower than the author's, but I think that is fine for exploration purposes. When I evaluate on the mini dataset, the final mAP_normal is 0.8678, while the released checkpoint gives 0.89. I am wondering why these numbers are on a different scale from the ones you and the author are discussing?