Hi,
I think the mAP calculation on A2D-Sentences in this repo has an issue.
When evaluating the model on A2D-Sentences, all five predictions per frame (one per query, i.e. `num_queries` = 5) are saved and used to compute mAP, which gives 55.0 mAP with Video Swin-B. However, saving all predictions is unreasonable, since only the highest-scoring prediction is taken as the mask of the referred object. When I save only the highest-scoring prediction, the result drops to ~51.9 mAP.
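For reference, here is a minimal sketch of the top-1 filtering I mean. It assumes the saved predictions are COCO-style result dicts with `image_id` and `score` keys, where each `image_id` identifies one annotated frame/expression pair. The function name `keep_top1_per_frame` is hypothetical and not part of this repo.

```python
from typing import Dict, List


def keep_top1_per_frame(predictions: List[dict]) -> List[dict]:
    """Keep only the highest-scoring prediction for each image_id.

    Assumption: each dict is a COCO-style result with at least
    'image_id' and 'score' keys; one image_id per frame/expression pair.
    """
    best: Dict[int, dict] = {}
    for pred in predictions:
        key = pred["image_id"]
        # Replace the stored prediction if this one has a higher score.
        if key not in best or pred["score"] > best[key]["score"]:
            best[key] = pred
    return list(best.values())


# Usage: two frames, five query predictions each (num_queries = 5).
preds = [{"image_id": 1, "score": s} for s in (0.1, 0.9, 0.3, 0.2, 0.05)]
preds += [{"image_id": 2, "score": s} for s in (0.4, 0.6, 0.5, 0.1, 0.2)]
top1 = keep_top1_per_frame(preds)
```

If the evaluation feeds these dicts to pycocotools' `COCOeval` (an assumption about this repo), the filtered list would be passed in place of the full list of `num_queries` predictions per frame.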