zeliu98 / Group-Free-3D

Group-Free 3D Object Detection via Transformers

5-times evaluation #13

Closed: Divadi closed this issue 3 years ago

Divadi commented 3 years ago

Hi, thank you for releasing your codebase!

I wanted to ask: the SUN RGB-D results seem to be unstable. Did you train a single model and evaluate it with 5 seeds, or did you train 5 models with different seeds?

Also, did you notice much variation between training runs?

zeliu98 commented 3 years ago

Hi, we trained 5 models with 5 seeds, and for each model we evaluated 5 times with 5 seeds. For mAP@0.25, the std across the 5 models (each model's score averaged over its 5 evaluations) is about 0.25; for mAP@0.5, the std is about 0.4.
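
For concreteness, here is a minimal sketch of how that statistic can be computed, assuming a hypothetical 5×5 matrix of mAP scores (rows: training seeds, columns: evaluation seeds). The numbers below are random placeholders, not actual results:

```python
import numpy as np

# Hypothetical 5x5 score matrix: 5 training seeds x 5 evaluation seeds.
# Placeholder values centered near an arbitrary mAP of 63.0.
map_025 = np.random.default_rng(0).normal(63.0, 0.3, size=(5, 5))

per_model_mean = map_025.mean(axis=1)      # average each model over its 5 evaluations
std_across_models = per_model_mean.std()   # spread across the 5 training seeds
print(f"per-model means: {per_model_mean}")
print(f"std across models: {std_across_models:.3f}")
```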

Divadi commented 3 years ago

To double-check:

So the number outside the parentheses is the best of the 5 models with its best of the 5 evaluation seeds, while the number inside is the average over all 5×5 evaluations?

I apologize for these "meta" questions. I've found SUN RGB-D can be a bit unstable, and I think the evaluation standard used for this work is a good basis for comparison.

xwhjy commented 3 years ago

> Hi, we trained 5 models with 5 seeds, and for each model we evaluated 5 times with 5 seeds. For mAP@0.25, the std across the 5 models (each model's score averaged over its 5 evaluations) is about 0.25; for mAP@0.5, the std is about 0.4.

Hi Ze,

Thanks for the reply! May I ask how you set the seeds for training? The rng_seed argument parsed in train_dist.py doesn't seem to be used anywhere in that training script.

And may I also ask why we need to set seeds for evaluation? I checked the evaluation log files and found that each evaluation run does give different results, but I don't quite understand where randomness could come from in the evaluation phase, since the model parameters should be fixed, right? Sorry for the naive question, and looking forward to your reply! Thanks!
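
For reference, the usual way an rng_seed argument gets wired into a PyTorch training script is something like the sketch below. This is a generic pattern, not a claim about what train_dist.py actually does:

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Seed every RNG a typical PyTorch training script touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # also seeds CUDA generators in recent PyTorch
    torch.cuda.manual_seed_all(seed)  # explicit, for older versions / all devices

# e.g. called once at startup: set_seed(args.rng_seed)  (argument name hypothetical)
```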

Divadi commented 3 years ago

I also wanted to ask: when I evaluate the model, I'm getting a really long estimated inference time (about 10 hours) on a single GPU with a single batch. When I enable faster_eval, it comes down to around 1 hour, which still seems very high. When I remove the parse_predictions step entirely, it finally drops to around 7 minutes, but I do think NMS is necessary (as I understand it, NMS is done inside parse_predictions). Is there something wrong with my setup, by any chance?
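
In case it helps others debug, a rough way to confirm where the time goes is to time the forward pass and the post-processing separately. The loop below is a hypothetical sketch; model, dataloader, eval_config, and the exact parse_predictions signature are placeholders for whatever the repo's eval script actually uses:

```python
import time
import torch

t_forward, t_parse = 0.0, 0.0
with torch.no_grad():
    for batch in dataloader:
        torch.cuda.synchronize()   # make GPU timings honest (CUDA is async)
        t0 = time.perf_counter()
        end_points = model(batch)  # network forward pass
        torch.cuda.synchronize()
        t1 = time.perf_counter()
        parse_predictions(end_points, eval_config)  # CPU-side NMS etc.
        t2 = time.perf_counter()
        t_forward += t1 - t0
        t_parse += t2 - t1
print(f"forward: {t_forward:.1f}s  parse_predictions: {t_parse:.1f}s")
```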

And @xwhjy, I believe different seeds are needed for evaluation because each run draws a different random sample of 20k points from the 50k saved points.
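
That is, something along these lines happens when the data is loaded (a simplified sketch, not the repo's exact code), so a different evaluation seed selects a different 20k-point subset and the score shifts slightly:

```python
import numpy as np

points = np.zeros((50000, 3))  # placeholder for a saved 50k-point cloud

def sample_points(points: np.ndarray, num_points: int = 20000, seed: int = 0) -> np.ndarray:
    # Each evaluation seed selects a different random subset of the cloud,
    # which is why fixed model weights can still give slightly different scores.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), num_points, replace=False)
    return points[idx]

subset_a = sample_points(points, seed=0)
subset_b = sample_points(points, seed=1)  # different subset -> slightly different mAP
```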

xiaodongww commented 3 years ago

Hi @Divadi @zeliu98, I also find the evaluation process very slow. Have you found out which part slows down the evaluation? 10 hours seems far too slow.