Hi @thodan and @MartinSmeyer,
In the CPU implementation, we iterate over num_estimates x num_gt pairs to calculate the errors for MSSD and MSPD. In the GPU implementation, we instead iterate over num_object x num_gt and batch all estimates of each object, which makes the run-time faster, particularly when num_estimates is large (e.g., with 50K estimates, it is 3x faster). I didn't modify eval_calc_scores.py.
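To illustrate the batching idea, here is a minimal NumPy sketch of computing MSSD errors for all estimate/GT pairs of one object in a single vectorized call (the PR itself runs this on the GPU; the function name and argument shapes below are illustrative, not the actual toolkit API):

```python
import numpy as np

def mssd_batched(R_est, t_est, R_gt, t_gt, pts, syms):
    """Hypothetical batched MSSD for one object.

    R_est: (E, 3, 3), t_est: (E, 3)  -- estimated poses
    R_gt:  (G, 3, 3), t_gt:  (G, 3)  -- ground-truth poses
    pts:   (V, 3)                    -- model vertices
    syms:  (S, 3, 3)                 -- symmetry rotations (include identity)
    Returns an (E, G) array of MSSD errors.
    """
    # Vertices transformed by every estimated pose: (E, V, 3).
    p_est = pts @ R_est.transpose(0, 2, 1) + t_est[:, None, :]
    # Vertices under every symmetry: (S, V, 3).
    pts_sym = pts @ syms.transpose(0, 2, 1)
    # Then by every GT pose: (G, S, V, 3).
    p_gt = np.einsum('gij,svj->gsvi', R_gt, pts_sym) + t_gt[:, None, None, :]
    # Pairwise vertex distances: (E, G, S, V).
    d = np.linalg.norm(p_est[:, None, None] - p_gt[None], axis=-1)
    # MSSD: max over vertices, then min over symmetries.
    return d.max(axis=-1).min(axis=-1)
```

The CPU path does the equivalent work with a Python loop over individual estimates, so the batched form wins exactly when num_estimates is large.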
The current implementation limits GPU memory usage to 1.0 GB and can output results for 50K detections within 6 minutes. Note that the GPU implementation always runs with 1 worker, as batching on the GPU serves the same purpose as multiprocessing for improving run-time.
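One way to respect a fixed memory cap is to derive the estimate batch size from the size of the largest intermediate tensor. A hedged sketch (the helper name and the assumed tensor shape are mine, not the PR's code; the dominant buffer is assumed to be the (batch, num_gt, num_syms, num_pts, 3) distance tensor in float32):

```python
def batch_size_for_budget(budget_bytes, num_gt, num_syms, num_pts, dtype_bytes=4):
    """Hypothetical helper: largest estimate batch whose pairwise
    point tensor of shape (batch, num_gt, num_syms, num_pts, 3)
    fits within budget_bytes of GPU memory."""
    per_estimate = num_gt * num_syms * num_pts * 3 * dtype_bytes
    return max(1, budget_bytes // per_estimate)

# e.g. a 1.0 GB budget with 10 GT poses, 1 symmetry, 1000 vertices:
# batch_size_for_budget(1_000_000_000, 10, 1, 1000)
```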
As usual, I reproduced the MegaPose scores on the 6D localization tasks to make sure the scores do not change.
Here is the run-time benchmark for the 6D detection task:
Thanks @MedericFourmy for finding the bugs!