yuantianyuan01 / StreamMapNet

GNU General Public License v3.0

About fair comparison: input query amount #11

Closed TonyXuQAQ closed 1 year ago

TonyXuQAQ commented 1 year ago

Thanks for sharing the code of the excellent work. In your paper and code, it seems that you use 100 queries (i.e., $N_q=100$), but MapTR only uses 50 queries for detection. Of course more queries produce better results. Did you conduct experiments with the same query number for comparison?

TonyXuQAQ commented 1 year ago

With the same number of queries as MapTR (i.e., 50 queries, 24 epochs), I only get 49.94 mAP, which is just slightly higher than MapTR's result. Could you please help me with this?

[Screenshot from 2023-10-25 14-58-02]

Here is the list of results with different # queries

NuScenes oldsplit

| Method | Epochs | # Queries | mAP |
| --- | --- | --- | --- |
| MapTR | 24 | 50 | 48.7 |
| BeMapNet | 30 | 60 | 59.8 |
| StreamMapNet | 24 | 100 | 62.9 |
| StreamMapNet | 24 | 50 | 49.94 |

yuantianyuan01 commented 1 year ago

It's a very interesting question, and thank you for pointing it out. It is worth noting that MapTR combines 50 object queries with 20 point queries to form a total of 1000 hierarchical input queries during decoding, while our model uses only 100 object queries. Their paper also reports that increasing the number of object queries in MapTR does not improve performance much. So I think a fair comparison should use the most suitable number of queries for each model. Btw, I don't think our model would only reach 49.94 mAP with 50 queries. Can you show me how you implemented it?
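To make the query arithmetic above concrete, here is a minimal sketch of how 50 object queries and 20 point queries could yield 1000 hierarchical queries. It assumes each hierarchical query is the element-wise sum of one object embedding and one point embedding, which is my reading of the MapTR paper rather than code taken from either repository. The function name `hierarchical_queries` and the embedding size `dim = 8` are hypothetical and chosen only for illustration.

```python
import random


def hierarchical_queries(instance_embeds, point_embeds):
    """Pair every instance embedding with every point embedding by addition.

    Assumption: the combination is an element-wise sum; this is an
    illustrative sketch, not the MapTR or StreamMapNet implementation.
    """
    return [
        [[a + b for a, b in zip(inst, pt)] for pt in point_embeds]
        for inst in instance_embeds
    ]


num_instances, num_points = 50, 20  # counts quoted in the thread
dim = 8  # hypothetical embedding size, kept small for readability

rng = random.Random(0)
instance_embeds = [[rng.random() for _ in range(dim)] for _ in range(num_instances)]
point_embeds = [[rng.random() for _ in range(dim)] for _ in range(num_points)]

queries = hierarchical_queries(instance_embeds, point_embeds)
total_queries = sum(len(per_instance) for per_instance in queries)
print(total_queries)
```

Under this assumption the decoder sees `num_instances * num_points` inputs, so a count of "50 queries" for MapTR and "100 queries" for StreamMapNet are not measured in the same unit.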

TonyXuQAQ commented 1 year ago

Thanks for the information. Sorry, I made a mistake: I trained the model with four 4090 GPUs but did not change num_gpus in the config. So I guess the result should be close to 60 mAP.
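One way a stale num_gpus value could hurt results, sketched under a loud assumption: the training schedule is iteration-based and its length is derived from num_gpus in the config. The thread does not say how the config actually uses this value, so the function `effective_epochs`, the per-GPU batch size, and the dataset size below are all hypothetical.

```python
def effective_epochs(config_num_gpus, actual_num_gpus, samples_per_gpu,
                     dataset_size, target_epochs):
    """Estimate how many passes over the data a run really makes.

    Assumption: total iterations are computed from the GPU count written
    in the config, while each iteration consumes samples on the GPUs
    that are actually present.
    """
    iters_per_epoch = dataset_size // (config_num_gpus * samples_per_gpu)
    total_iters = iters_per_epoch * target_epochs
    samples_seen = total_iters * actual_num_gpus * samples_per_gpu
    return samples_seen / dataset_size


# Hypothetical numbers: 32000 training samples, 4 samples per GPU.
matched = effective_epochs(8, 8, 4, 32000, 24)
mismatched = effective_epochs(8, 4, 4, 32000, 24)
print(matched, mismatched)
```

If the config works this way, a run on four GPUs with a config written for eight would see only half of the intended training data, which would be consistent with the lower mAP reported above. Any learning-rate scaling tied to batch size would be a separate effect not modeled here.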

At the same time, I noticed that StreamMapNet differs from MapTR in many settings. For example, embed_dim is 512 in StreamMapNet but 256 in MapTR, and StreamMapNet uses three levels of ResNet features while MapTR uses one. I will run more experiments to examine the effect of these settings in the future. From my perspective, these settings should be kept consistent in the comparison to demonstrate the effectiveness of your proposed modules.

Thanks again for your information anyway!