Closed Divadi closed 3 years ago
Hi, we trained 5 models with 5 seeds, and evaluated each model 5 times with 5 evaluation seeds. For mAP@0.25, the std across the 5 models (each model's score averaged over its 5 evaluations) is about 0.25; for mAP@0.5, the std is about 0.4.
To double check:
So the number outside of the (...) is the best of the 5 models combined with the best of its 5 evaluation seeds, while the number inside is the average over all 5x5 evaluations?
I apologize for these "meta" questions. I've found SUN-RGB-D can be a bit unstable, and I think the evaluation standard used for this work is good for comparison.
Hi Ze,
Thanks for the reply! May I ask how you set the seeds for training? The rng_seed
argument parsed in train_dist.py
does not seem to be used in that training script.
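For context, seeding a training script is usually done along these lines. This is only a minimal sketch of the common pattern, not the repo's actual code; the `set_seed` helper and the exact RNGs seeded are my assumptions, and the torch calls are shown as comments since I can't confirm what train_dist.py intends to do with rng_seed:

```python
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Seed the RNGs a training script typically relies on (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    # In a PyTorch script one would typically also call (not from the repo):
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)


# Same seed -> identical random draws, so data shuffling/augmentation repeats.
set_seed(0)
a = np.random.rand(3)
set_seed(0)
b = np.random.rand(3)
```

If rng_seed is parsed but never passed into something like this, each run would effectively use a different seed regardless of the flag.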
And may I also ask why we need to set seeds for evaluation? I checked the evaluation log files and found that each evaluation does produce different results. But I don't quite understand why there would be randomness in the evaluation phase, since the model parameters should be fixed, right? Sorry for the dumb question, and looking forward to your reply! Thanks!
I also wanted to ask: when I try evaluating the model, I'm getting a really long estimated inference time (10 hours) on a single GPU with a single batch. When I enable faster_eval, it comes down to around 1 hour, but that still seems really high. When I remove the parse_predictions part entirely, it finally comes down to around 7 minutes, but I do think the NMS is necessary (I believe NMS is done in parse_predictions). Is there something wrong with my setup, by any chance?
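As a sanity check on where the time goes, one can time the post-processing separately from the forward pass. A rough sketch with a stand-in function (`run_parse_predictions` is a placeholder I made up, not the repo's API):

```python
import time


def run_parse_predictions(end_points):
    """Stand-in for the real parse_predictions (NMS etc.)."""
    time.sleep(0.01)  # pretend post-processing work
    return end_points


t0 = time.perf_counter()
out = run_parse_predictions({"boxes": []})
elapsed = time.perf_counter() - t0
print(f"parse_predictions took {elapsed:.3f}s per batch")
```

Multiplying the per-batch time by the number of batches quickly shows whether post-processing alone accounts for the 10-hour estimate.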
@xwhjy, I believe different seeds for evaluation are needed because each evaluation draws a different random sample of 20k points from the 50k saved points.
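That would explain the evaluation-time randomness: the seed controls which 20k of the 50k stored points are sampled, so different seeds feed the fixed model slightly different inputs. A minimal sketch of that subsampling, using the numbers from this thread (the function name and NumPy `default_rng` usage are my choice, not necessarily the repo's):

```python
import numpy as np


def subsample_points(points: np.ndarray, num_out: int, seed: int) -> np.ndarray:
    """Randomly pick num_out points from a point cloud, reproducibly."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=num_out, replace=False)
    return points[idx]


cloud = np.random.rand(50_000, 3)          # 50k saved points per scene
s0 = subsample_points(cloud, 20_000, seed=0)
s1 = subsample_points(cloud, 20_000, seed=1)
# Same seed reproduces the sample; a different seed changes the model's input.
```

So fixing the evaluation seed makes a single evaluation repeatable, while averaging over several seeds measures the spread this sampling introduces.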
Hi @Divadi @zeliu98, I also find the evaluation process very slow. Have you found out which part slows down the evaluation? 10 hours seems too slow.
Hi, thank you for releasing your codebase!
I wanted to ask: SUN-RGBD results seem to be unstable. I was wondering whether you trained a single model and evaluated it with 5 seeds, or trained 5 models with different seeds?
Further, did you notice some variations between training runs?