I trained the SSD on a custom image dataset (1 object per image, 3 different classes).
I evaluated it on similar (but new) images ---> mAP = 0.96
and on really different images ---> mAP = 0.31
The problem comes when I use detect.py on both eval datasets:
both give me really good visual results (almost no errors).
Here are the 3 different poses I'm able to detect and classify (with top_k = 1):

Here are the evaluation results for the different cases:
easy dataset, top_k = 200: AP = 0.9
easy dataset, top_k = 1: AP = 0.9
hard dataset, top_k = 200: AP = 0.2
hard dataset, top_k = 1: AP = 0.2
And in all these cases, when I visualize the images with top_k = 1, I get zero errors (a visual AP of 1).
With the debugger, I figured out that the "lying_down" class produces a lot of false positives, which explains the low mAP.
So it comes from a bad jaccard overlap (< 0.5)... but I don't understand why, because when I run detect.py the predicted bounding boxes are really accurate visually.
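For reference, here is a minimal sketch of the jaccard overlap (IoU) check I'm referring to; the function and the example boxes are just my own illustration of the standard IoU >= 0.5 matching rule, not the exact code from eval.py:

```python
def jaccard(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # union = area_a + area_b - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# hypothetical boxes: the prediction "looks" roughly right, but IoU is only ~0.25,
# so at the 0.5 threshold it would be scored as a false positive
pred_box = (50, 50, 150, 150)
gt_box = (80, 80, 200, 200)
print(jaccard(pred_box, gt_box))  # ~0.25
```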
So I kept only the top_k = 1 detection per image in eval.py, but nothing changed: still perfect visual results and still a poor mAP.
I don't understand why setting top_k = 1 in eval.py doesn't get rid of the false positive detections... Tuning these parameters changes nothing...
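Just to show what I mean by taking only the top_k = 1 object per image, here is a minimal sketch of that filtering step; the array layout and function name are hypothetical, not the actual internals of eval.py:

```python
import numpy as np

def keep_top_k(detections, k=1):
    """Keep only the k highest-confidence detections for one image.

    `detections` is a hypothetical (N, 5) array with rows of
    [score, xmin, ymin, xmax, ymax] - an illustration only, not
    the exact structure used inside eval.py.
    """
    order = np.argsort(-detections[:, 0])  # sort by score, descending
    return detections[order[:k]]

# example: two candidate boxes on one image; only the 0.92-score box survives
dets = np.array([
    [0.92, 50, 50, 150, 150],
    [0.40, 10, 10, 60, 60],
])
print(keep_top_k(dets, k=1))
```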