G4V opened this issue 1 year ago
@G4V Is the end result a poor accuracy number?
I'm not sure if it is related, but I had used the new script (never tried with the old one) and then did inference using the Nvidia submission code and saw this issue. @nv-ananjappa
The scripts were indeed checked and verified, at least for the reference implementation (I think I had also tested this). But since the scripts are a cosmetic change (the dataset remains the same), if it is causing problems, you can always use the old one from the 2.1 submission round, right?
@G4V What command are you using to download the dataset? Make sure you use the command ./openimages_mlperf -d <DOWNLOAD_PATH>, leaving the -m argument as None. That argument was only added for testing/development purposes (if you want to test the benchmark with a smaller subset).
Hi @pgmpablo157321,
I'm able to generate the entire dataset; it's just that some of the images have a different number of detections from the annotations generated by the fiftyone package, which is throwing off our accuracy.
In our scripts we're launching it as you describe -
The annotations for 1366cde3b480a15c.jpg highlight the problem -
Fiftyone generates four boxes; the new script, only one. Also, running on two different machines, the single boxes differ. All very odd.
Could you check the annotations you're generating for this image?
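For anyone wanting to reproduce the check, here is a minimal sketch that pulls the records for one image out of a COCO-style annotations file; the path is a placeholder, so substitute whatever your download script wrote:

import json

# Placeholder path; point this at the annotations JSON the script generated.
with open("annotations/openimages-mlperf.json") as f:
    coco = json.load(f)

# Look the image up by file name, then gather its boxes via image_id.
image = next(i for i in coco["images"] if i["file_name"] == "1366cde3b480a15c.jpg")
boxes = [a for a in coco["annotations"] if a["image_id"] == image["id"]]

print(image)
for box in boxes:
    print(box)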
Just adding a data point here. This is on a MacBook Pro M1 system using the onnxruntime backend over the entire dataset with the reference implementation. We see an accuracy of 36.650, whereas the official number is 37.57. Not sure whether it is due to being a different system.
DONE (t=82.59s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.367
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.512
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.394
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.421
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.596
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.626
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.340
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.673
mAP=36.650%
@G4V If you are using this script for preprocessing, can you please try with threads=1 here?
Thanks @arjunsuresh. That's the order of reduction in accuracy we're experiencing using the updated script. I'll give threads=1 a try. Are you seeing an improvement with this?
> @G4V If you are using this script for preprocessing, can you please try with threads=1 here?
Ah, the above is the script to generate the pre-processed images. The issue we're seeing is with the step before to generate the annotations that sit alongside the original images. Script here -
You're welcome Gavin. But please ignore my suggestion of threads=1 - I see that threads is not used in the function anyway; it is just there to keep compatibility with the imagenet script. Openimages preprocessing is serial, whereas imagenet preprocessing is done in parallel in the reference script. It takes around 8 hours for the full accuracy run on M1, so I cannot try things easily either.
@G4V Thank you for clarifying. If it is not a big concern, please use the old script for your submissions. This doesn't look like an easy fix (accuracy issues usually take time).
@pgmpablo157321 Just to be sure, when the scripts were updated, did we check the accuracy on the entire dataset? We have been testing retinanet a lot ourselves, but all runs used a reduced dataset (which was one of the options that came with the modification).
@G4V But this can potentially help - using num_processes=1 here. If you have a fast GPU, this can be a quick test.
> @pgmpablo157321 Just to be sure, when the scripts were updated, did we check the accuracy on the entire dataset? We have been testing retinanet a lot ourselves, but all runs used a reduced dataset (which was one of the options that came with the modification).
@arjunsuresh Yes, that is correct, I tested the reference implementation and it had the same accuracy. I'll run the benchmark accuracy again just to be sure.
> The annotations for 1366cde3b480a15c.jpg highlight the problem
> Fiftyone generates four boxes; the new script, only one. Also, running on two different machines, the single boxes differ. All very odd.
> Could you check the annotations you're generating for this image?
@G4V This is what I get using the current (3.0) script:
Image info:
{'id': 6479, 'file_name': '1366cde3b480a15c.jpg', 'height': 4320, 'width': 2432, 'license': None, 'coco_url': None}, {'id': 6480, 'file_name': '13690841e89135f7.jpg', 'height': 1024, 'width': 925, 'license': None, 'coco_url': None}
Boxes info:
{'id': 13704, 'image_id': 6479, 'category_id': 117, 'bbox': [1268.7263436800001, 260.2409688, 883.16528384, 3580.9157160000004], 'area': 3162540.444728257, 'iscrowd': 0, 'IsOccluded': 0, 'IsInside': 0, 'IsDepiction': 1, 'IsTruncated': 0, 'IsGroupOf': 1}
{'id': 25159, 'image_id': 6479, 'category_id': 148, 'bbox': [978.73172096, 249.83132400000002, 1150.0920704, 3632.96394], 'area': 4178243.0194431418, 'iscrowd': 0, 'IsOccluded': 1, 'IsInside': 0, 'IsDepiction': 0, 'IsTruncated': 0, 'IsGroupOf': 0}
{'id': 41129, 'image_id': 6479, 'category_id': 125, 'bbox': [1384.0650624000002, 1020.1445424, 207.60962559999984, 853.5904416000001], 'area': 177213.59199631453, 'iscrowd': 0, 'IsOccluded': 1, 'IsInside': 0, 'IsDepiction': 0, 'IsTruncated': 0, 'IsGroupOf': 0}
{'id': 41130, 'image_id': 6479, 'category_id': 125, 'bbox': [1430.2005887999999, 2727.325296, 177.9511424000001, 905.6384928000002], 'area': 161159.4043951743, 'iscrowd': 0, 'IsOccluded': 1, 'IsInside': 0, 'IsDepiction': 0, 'IsTruncated': 0, 'IsGroupOf': 0}
I get 4 different boxes with the 3.0 script. It assigns the image to the id 6479, and there are four boxes that belong to this image_id.
@pgmpablo157321 Thank you for confirming. Is it that the box ids for an image are adjacent in the old script but not necessarily in the new one? I'm also seeing 4 boxes for image_id=6479 with the current script.
yes, I see that the 4 boxes are not adjacent
Ah, ok, that's what threw me. I'll need to dig a bit further into why we're seeing the difference in accuracy measurement.
@arjunsuresh any ideas from your side on this?
Nothing clicking as of now -- need a sleep :) But since @pgmpablo157321 confirmed that he got the expected accuracy and I'm getting lower accuracy on aarch64 (I'll try a run on x86 overnight) using the same reference implementation, we can conclude that the issue is not related to any internal preprocessing you might be using. It could be an architecture difference (less likely) or some Python dependency version change. If this was for resnet50, I could have tried all the possibilities easily due to the short runtime. Here, I'll see if we can replicate the issue on a small dataset size (6-7 hours for a single run is not feasible), and if so, in a day or two I should be able to report the culprit.
Also, sorting the annotations based on image_id might be a solution, right?
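A minimal sketch of that sort, assuming the COCO layout shown above (file names are placeholders):

import json

with open("openimages-mlperf.json") as f:        # placeholder path
    coco = json.load(f)

# Group boxes for the same image together by ordering on image_id.
coco["annotations"].sort(key=lambda a: a["image_id"])

with open("openimages-mlperf-sorted.json", "w") as f:
    json.dump(coco, f)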
Thanks @arjunsuresh. The only difference for us between accuracy calcs is the annotations file (I think). Shall dig further. Sorting the annotations will give another good data point.
Just ran a couple of tests, and I see there is a very small difference between both sets. For some reason, either this implementation or the previous one swaps the dimensions of the image 1366cde3b480a15c.jpg. However, this should be negligible for the metric since it only involves 4 boxes out of 158642.
Specifically, what I did was sort the annotations by image_id (in the current implementation; the previous one was already sorted).
@G4V how did you find that specific image?
Luck. I hadn't realised that the boxes for this specific image differed from those produced by the previous script; I had only thought the boxes were a subset, as they are not contiguously listed in the json.
Agree that all other boxes are the same, barring the four. The accuracy issue is on our end, I think, but not yet concluded.
@G4V you should try a lottery 😁
@G4V @arjunsuresh I got a reduction in accuracy as well:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.366
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.512
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.394
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.076
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671
TestScenario.SingleStream qps=45.52, mean=0.1832, time=544.420, acc=40.478%, mAP=36.634%, queries=24781, tiles=50.0:0.1833,80.0:0.1881,90.0:0.1905,95.0:0.1924,99.0:0.1962,99.9:0.2082
Thanks @pgmpablo157321. So M1 gave a slightly better accuracy: 36.650%. Do you know what exactly has changed since the last time you got 37.57?
TL;DR: fiftyone==0.16.5 mlperf-inference-source==2.1 gets things back in shape.
Rather unhelpfully, fiftyone introduced a new 0.19.0 release just a few days ago, which seems to break downloads even with the r2.1 branch. I think 0.18.0 should work too, as we had no download issues until February, but I've only tested 0.16.5 so far.
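A quick guard for this pin, as a hedged sketch using importlib.metadata (standard library since Python 3.8); the 0.16.5 known-good and 0.19.0 broken observations come from this thread, with 0.18.0 untested:

from importlib.metadata import version

# 0.16.5 is known-good per this thread; 0.19.0 broke downloads; 0.18.0 untested.
fo_version = version("fiftyone")
assert fo_version == "0.16.5", f"untested fiftyone version: {fo_version}"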
Thank you @psyhtest. And if we use the annotations file produced and then call this accuracy script, we can expect 37.57% mAP, right?
I made another two runs. First, I ran the object detection benchmark with Inference 3.0 annotations and 2.1 code and got:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.366
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.512
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.394
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.076
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671
TestScenario.SingleStream qps=43.34, mean=0.1822, time=571.734, acc=40.478%, mAP=36.634%, queries=24781, tiles=50.0:0.1824,80.0:0.1875,90.0:0.1900,95.0:0.1919,99.0:0.1958,99.9:0.2050
Then I ran the benchmark with Inference 2.1 annotations and 3.0 code:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.524
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.406
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.596
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.075
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
TestScenario.SingleStream qps=43.32, mean=0.1816, time=572.001, acc=40.478%, mAP=37.550%, queries=24781, tiles=50.0:0.1818,80.0:0.1868,90.0:0.1891,95.0:0.1910,99.0:0.1947,99.9:0.2018
So it seems the four boxes are responsible for the difference in mAP (I don't completely understand how). This issue should be solved for now by taking the annotations from this release.
@pgmpablo157321 That's useful information. Just to be sure, can we manually edit the annotations file from r2.1 - to modify just the 4 boxes like the annotations file of r3.0 and see what accuracy we can get? This can tell us if it is really the boxes or the different ordering that is causing the accuracy difference.
@arjunsuresh I think you can do that, but also keep in mind that the dimensions of the image 1366cde3b480a15c.jpg were swapped as well, so that might also affect the results.
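A sketch of that edit, assuming both files follow the layout shown earlier (file names are placeholders): copy the r3.0 boxes and the swapped dimensions for this one image into the r2.1 file.

import json

FILE = "1366cde3b480a15c.jpg"

with open("annotations-2.1.json") as f:    # placeholder: known-good r2.1 file
    old = json.load(f)
with open("annotations-3.0.json") as f:    # placeholder: r3.0 file
    new = json.load(f)

old_img = next(i for i in old["images"] if i["file_name"] == FILE)
new_img = next(i for i in new["images"] if i["file_name"] == FILE)

# Carry over the (swapped) dimensions.
old_img["height"], old_img["width"] = new_img["height"], new_img["width"]

# Swap in the r3.0 boxes for this image, remapping image_id to the r2.1 id.
old["annotations"] = [a for a in old["annotations"] if a["image_id"] != old_img["id"]]
old["annotations"] += [dict(a, image_id=old_img["id"])
                       for a in new["annotations"] if a["image_id"] == new_img["id"]]

with open("annotations-2.1-patched.json", "w") as f:
    json.dump(old, f)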
@pgmpablo157321 I've run with the known good 2.1 annotations file but with the boxes and dimensions modified for 1366cde3b480a15c.jpg, and I'm not seeing a change in accuracy. Could you try this also and confirm that you see the same?
If so, and everything else being equal, this seems to imply that the accuracy calc is (erroneously) tied to the order of images in the annotations file?
@G4V I was thinking the same but could not try it, as I just got a system. I could not find anything suspicious with the accuracy script, though it does have this written.
@pgmpablo157321 In the dataset download script, with a count option like -m 50, the script downloads 50 random images. Is there any reason to include this randomness? If not, can you please remove it, so that we can easily compare the accuracy of smaller dataset runs?
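For comparison's sake, a deterministic pick is a one-line change in spirit; the variable names here are illustrative, not the script's:

# Illustrative only: deterministic subset selection instead of random sampling
# (e.g. random.sample(image_ids, max_images)).
image_ids = ["0003.jpg", "0001.jpg", "0002.jpg"]  # stand-in for the full id list
max_images = 2
subset = sorted(image_ids)[:max_images]           # same subset on every machine
print(subset)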
By replacing the annotations file, we are also seeing the expected accuracy. But we are still not sure of the real reason for the problem.
TestScenario.Offline qps=158.26, mean=11.1613, time=156.587, acc=41.033%, mAP=37.572%, queries=24781, tiles=50.0:10.4530,80.0:14.5974,90.0:14.8929,95.0:15.0823,99.0:15.4117,99.9:24.6361
CM run command used
cm run script --tags=generate-run-cmds --execution-mode=valid --model=retinanet \
--mode=accuracy --adr.openimages-preprocessed.tags=_full,_custom-annotations
@arjunsuresh is that with the 2.1 annotations file unmolested or with the boxes for the offending image modified?
This is using the same 2.1 annotations file.
I tried running with the 3.0 annotations file after sorting the annotations list based on image_id. There is no change in accuracy, but the errors below appear.
INFO:coco:loaded 24781 images, cache=0, already_preprocessed=True, took=0.5sec
INFO:main:starting TestScenario.Offline
ERROR:coco:image_idx missmatch, lg=22270 / result=24412
(line repeated 35 times in total)
ERROR:coco:image_idx missmatch, lg=24412 / result=22270
(line repeated 5 times in total)
ERROR:coco:image_idx missmatch, lg=803 / result=6635
(line repeated 116 times in total)
ERROR:coco:image_idx missmatch, lg=6635 / result=803
(line repeated 300 times in total)
loading annotations into memory...
Done (t=0.24s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(2936607, 7)
0/2936607
1000000/2936607
2000000/2936607
DONE (t=8.08s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=105.56s).
Accumulating evaluation results...
DONE (t=25.74s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.367
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.512
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.394
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.421
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.596
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.626
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.340
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.673
TestScenario.Offline qps=164.17, mean=11.1324, time=150.949, acc=41.033%, mAP=36.650%, queries=24781, tiles=50.0:10.3526,80.0:14.5942,90.0:14.8564,95.0:15.0179,99.0:15.2703,99.9:15.481
Just adding another datapoint here. I ran the benchmark again with the following modifications: 3.0 code with 3.0 sorted annotations:
Evaluate annotation type *bbox*
DONE (t=314.77s).
Accumulating evaluation results...
DONE (t=74.29s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.366
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.512
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.394
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.076
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671
TestScenario.SingleStream qps=43.89, mean=0.1833, time=564.633, acc=40.478%, mAP=36.634%, queries=24781, tiles=50.0:0.1834,80.0:0.1883,90.0:0.1906,95.0:0.1925,99.0:0.1963,99.9:0.2034
3.0 code and 2.1 'messed up' annotations:
DONE (t=314.93s).
Accumulating evaluation results...
DONE (t=71.28s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.524
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.406
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.596
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.075
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
TestScenario.SingleStream qps=44.68, mean=0.1830, time=554.606, acc=40.478%, mAP=37.550%, queries=24781, tiles=50.0:0.1831,80.0:0.1880,90.0:0.1903,95.0:0.1921,99.0:0.1957,99.9:0.2049
This implies that the order is not what is affecting the change in accuracy. But since both are 'almost' identical, it isn’t clear what is causing this error. Maybe it is some numerical approximation of the bboxes (or some other entries)
@pgmpablo157321 Thank you for the datapoint. I'm not sure how acc is being calculated, but in all the results you shared it is 40.478, whereas in mine it is 41.033, irrespective of the reported mAP.
@arjunsuresh Sorry for the late reply, but yes, this issue was reported to us by partners 2 weeks ago as well. I was able to reproduce the low-accuracy issue with the v3.0 scrambled annotations yesterday (our full accuracy is 37.487):
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.367
mAP=36.703%
We will use the old annotation from v2.1 for v3.0 submission. FYI @nv-ananjappa
I was running a couple of tests and wanted to add this datapoint. It seems that there is a numerical error in computing the areas. This is an example I get comparing both annotation files:
...
Annotations in file c682818fe22eb309.jpg differ:
['(bbox:(0.0, 62.400000000000006, 1020.8, 648.0) area: 661478.4)', '(bbox:(0.0, 62.400000000000006, 1020.8, 648.0) area: 661478.3999999999)']
...
And I got small numerical errors like this in ~11000 images.
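A sketch of that comparison, assuming the layout shown earlier (paths are placeholders): it pairs up boxes per image and prints those whose bbox matches but whose precomputed area differs in the last bits.

import json

def boxes_by_file(path):
    with open(path) as f:
        coco = json.load(f)
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = {}
    for ann in coco["annotations"]:
        grouped.setdefault(names[ann["image_id"]], []).append(ann)
    return grouped

old = boxes_by_file("annotations-2.1.json")   # placeholder paths
new = boxes_by_file("annotations-3.0.json")

for name in old.keys() & new.keys():
    for a, b in zip(sorted(old[name], key=lambda x: x["bbox"]),
                    sorted(new[name], key=lambda x: x["bbox"])):
        # Same box, but the stored areas differ by floating-point dust.
        if a["bbox"] == b["bbox"] and a["area"] != b["area"]:
            print(name, a["area"], b["area"])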
I was able to reproduce the 2.1 mAP by changing the iscrowd value in the annotations:
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=339.01s).
Accumulating evaluation results...
DONE (t=81.03s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.524
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.406
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.596
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.075
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
TestScenario.SingleStream qps=39.89, mean=0.1221, time=621.218, acc=40.478%, mAP=37.550%, queries=24781, tiles=50.0:0.1228,80.0:0.1281,90.0:0.1305,95.0:0.1324,99.0:0.1360,99.9:0.1410
The changes I made are in this branch.
That's great work @pgmpablo157321. Does the new code work fine for both the new and old annotations?
@arjunsuresh Yes, now it works for both. I think the problem was that the iscrowd field did not match the v2.1 annotations.
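For anyone patching an existing file rather than regenerating it, one plausible reading of that fix as a hedged sketch — the box dumps earlier in this thread suggest iscrowd should track the OpenImages IsGroupOf flag, but check the linked branch for the authoritative change:

import json

with open("annotations-3.0.json") as f:    # placeholder path
    coco = json.load(f)

# Assumption based on this thread, not the merged fix itself: COCO evaluation
# treats iscrowd boxes specially during matching, so mirror IsGroupOf here.
for ann in coco["annotations"]:
    ann["iscrowd"] = int(ann.get("IsGroupOf", 0))

with open("annotations-3.0-iscrowd.json", "w") as f:
    json.dump(coco, f)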
That's great @pgmpablo157321
Hi @pgmpablo157321, sorry for the late reply, but is it possible to sort the new annotations and make them the same as v2.1? The PR doesn't work out of the box because of the skewed image order. Thank you!
Reopening to discuss the last comment.
@psyhtest @nvzhihanj I added this line before the PR was merged: https://github.com/mlcommons/inference/blob/192f81b3d4e6b61ba48396bba2e7f3919d393e7d/vision/classification_and_detection/tools/openimages.py#L148
So the image order is now the same. However, I ran some tests and there are still some differences in the annotation files. The annotations are sorted by ImageID, but the order of the annotations within an image does not necessarily match. Unfortunately, the annotations from the original script do not seem to have a specific order here, so I think it is impossible to match the order.
I can confirm that:
One possible format improvement I noticed is that the ids of the annotations (not the ImageID) are not sorted. This is only a unique identifier for each annotation and does not affect the metric, but I could modify the script so that they are sorted as well.
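That would be a small addition, sketched here under the same layout assumption: sort, then reassign sequential ids, which is safe because the id is only a unique key.

import json

with open("openimages-mlperf.json") as f:          # placeholder path
    coco = json.load(f)

# Sort boxes by image, then reassign sequential annotation ids.
coco["annotations"].sort(key=lambda a: a["image_id"])
for new_id, ann in enumerate(coco["annotations"], start=1):
    ann["id"] = new_id

with open("openimages-mlperf-sorted.json", "w") as f:
    json.dump(coco, f)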
I can confirm that with the latest master branch code, accuracy is fine for the entire dataset
DONE (t=28.66s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.406
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.598
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.677
TestScenario.Offline qps=147.82, mean=11.5008, time=167.640, acc=41.033%, mAP=37.572%, queries=24781, tiles=50.0:10.6210,80.0:15.0662,90.0:15.3765,95.0:15.5956,99.0:15.9346,99.9:16.2546
Run command used
python3 -m pip install cmind
cm pull repo mlcommons@ck
cmr "run mlperf inference generate-run-cmds _accuracy-only _full" --submitter="Community" \
--hw_name=default --implementation=reference --model=retinanet --backend=onnxruntime \
--device=cpu --scenario=Offline --execution_mode=valid --mode=accuracy --rerun
@pgmpablo157321 We are still allowing submitters to use any annotations file, right?
@mrmhodak @mrasquinha-g We should discuss Arjun's proposal in the WGM. https://github.com/mlcommons/inference/issues/1332#issuecomment-1604753762
For some images, the number of annotations generated using the latest script differs from that generated using the fiftyone python package. A couple of examples are attached. The new script seems to produce a subset? We've also seen a difference in annotations when generated on two separate machines. We have yet to compare the environments of those two machines.
Anyone else experiencing this?
49cc9e3699b6a53e.jpg.txt 1366cde3b480a15c.jpg.txt