Open jwyang opened 7 years ago
The mean and weighted mean numbers should be much closer than your results - the only difference is the correction for class imbalance. Are you still having an issue with this?
Hi, Peter,
I checked the evaluation code again. The mean AP is computed by averaging over all 1600 entries in aps, that is:
print('Mean AP = {:.4f}'.format(np.mean(aps)))
and the weighted mean AP is computed via:
print('Weighted Mean AP = {:.4f}'.format(np.average(aps, weights=weights)))
Since only a fraction of the 1600 categories actually appear in the test set (231 in my run), aps contains many zeros, so mean(aps) is bound to be low.
I guess you reported the mean AP after ruling out all categories with npos = 0 and averaging over the remaining entries? When I did this, I got 10.11%, which is very close to your reported number. A minimal sketch of that filtering is below.
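For completeness, this is the filtering I tried, as a minimal sketch rather than the repo's actual code: aps and weights are the per-class arrays built by the evaluation script, and npos (ground-truth boxes per class) is an assumed extra array collected alongside them:

```python
import numpy as np

# Minimal sketch (not the repo's code). Assumed per-class arrays over all 1600 classes:
#   aps[i]     - average precision for class i (0.0 when the class is absent)
#   npos[i]    - number of ground-truth boxes of class i in the test set
#   weights[i] - weight used for the weighted mean
def summarize(aps, npos, weights):
    aps = np.asarray(aps, dtype=float)
    npos = np.asarray(npos)

    # Averaging over all 1600 entries: absent classes contribute AP = 0
    # and drag the unweighted mean down.
    print('Mean AP (all classes)   = {:.4f}'.format(np.mean(aps)))

    # Averaging only over classes that actually appear (npos > 0), which is
    # what gave me ~10% instead of ~1.5%.
    present = npos > 0
    print('Mean AP (npos > 0 only) = {:.4f}'.format(np.mean(aps[present])))

    # Weighted mean, unchanged from the script; if the weights are
    # proportional to npos, absent classes get zero weight anyway.
    print('Weighted Mean AP        = {:.4f}'.format(
        np.average(aps, weights=weights)))
```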
Hi, Peter, I am running the training scripts myself (with fewer GPUs). What was the final training loss at iteration 380K when you trained the model? If possible, could you please plot a training curve or provide the training log file? Thanks a lot!
Hi @jwyang,
Sorry I haven't responded sooner. We did not exclude zeros in our calculation. It seems like there is some difference in the validation set that is being used, because our 5000-image validation set resulted in no categories with npos = 0 in our evaluation.
Maybe something went wrong with the dataset preprocessing? To help compare, I've added the eval.log file from our evaluation to the repo. If it helps I can also add our preprocessed data/cache/vg_1600-400-20_val_gt_roidb.pkl file.
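If it helps, a quick way to compare caches is to count ground-truth boxes per class directly from that pickle. This is only a rough sketch, assuming the file holds a list of roidb dicts with a 'gt_classes' array per image (field names follow the py-faster-rcnn roidb convention and may differ):

```python
import pickle
from collections import Counter

# Path to the cached ground-truth roidb mentioned above; adjust as needed.
CACHE = 'data/cache/vg_1600-400-20_val_gt_roidb.pkl'

with open(CACHE, 'rb') as f:
    roidb = pickle.load(f)

# Count ground-truth boxes per class index across the validation images.
counts = Counter()
for entry in roidb:
    counts.update(entry['gt_classes'].tolist())

print('images: {}'.format(len(roidb)))
print('classes with at least one gt box: {}'.format(len(counts)))
# If this reports far fewer than 1600 classes, the evaluation will have many
# npos == 0 entries and the unweighted mean AP will be pulled down.
```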
Hi @peteanderson80,
Thanks a lot for your reply and for sharing the log file. Yes, it is very strange to me. I compared the 5000 validation images and they are the same. I will re-pull your code and re-generate the XML files to see whether I can get the same numbers as yours. I will let you know when I get the results.
Thanks again for your help!
Hi @yuzcccc,
I don't have the original log file, but I've added an example log file from training with a single GPU for 16K iterations, which should give some indication of the expected training loss. From memory I think the final training loss was around 4.0 (compared to about 4.8 in the example log file at iteration 16300).
Thanks @jwyang for investigating. I have shared our pickled datasets so you can see if you get the same:
@jwyang So what makes your accuracy lower than the reported one? I used the maskrcnn-benchmark code to train/test on the same splits and only got 2.24% mAP (IoU 0.5).
Hi, I just ran the test code using your trained ResNet-101 model on the test set and got the following numbers on the object detection task:
Mean AP = 0.0146
Weighted Mean AP = 0.1799
Mean Detection Threshold = 0.328
The mean AP (1.46%) is far below the number (10.2%) you reported in the table at the bottom of the README, while the weighted mean AP is a bit higher than the number you reported. I am wondering whether there is a typo in your table.
Thanks!