mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

Latest OpenImages dataset annotations differ from those generated by previous fiftyone based script #1332

Open G4V opened 1 year ago

G4V commented 1 year ago

For some images, the number of annotations generated using the latest script differs from the number generated using the fiftyone Python package. A couple of examples are attached. The new script seems to produce a subset? We've also seen a difference in annotations when generating on two separate machines; we have yet to compare the environments of those two machines.

Anyone else experiencing this?

49cc9e3699b6a53e.jpg.txt 1366cde3b480a15c.jpg.txt

arjunsuresh commented 1 year ago

@G4V Is the end result a poor accuracy number?

I'm not sure if it is related, but I had used the new script (never tried the old one) and then did inference using the Nvidia submission code and saw this issue. @nv-ananjappa

The scripts were indeed checked and verified, at least for the reference implementation (I think I had also tested this). But since the script changes were cosmetic (the dataset remains the same), if they are causing problems, you can always use the old one from the 2.1 submission round, right?

pgmpablo157321 commented 1 year ago

@G4V What command are you using to download the dataset? Make sure you use the command ./openimages_mlperf -d <DOWNLOAD_PATH>, leaving the -m argument as None. That argument was only added for testing/development purposes (if you want to test the benchmark with a smaller subset).
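As a quick sanity check that you ended up with the full set rather than a -m subset, something like this should report 24781 images (the annotations path is an assumption; adjust to wherever the download script wrote it):

import json

# Hypothetical location of the generated annotations file -- adjust as needed.
with open("annotations/openimages-mlperf.json") as f:
    data = json.load(f)

# The full MLPerf validation split has 24781 images; a -m subset will have fewer.
print(len(data["images"]), "images,", len(data["annotations"]), "boxes")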

G4V commented 1 year ago

Hi @pgmpablo157321,

I'm able to generate the entire dataset; it's just that some of the images have a different number of detections from the annotations generated by the fiftyone package, and this is throwing off our accuracy.

In our scripts we're launching it as you describe -

https://github.com/krai/ck-mlperf/blob/master/package/dataset-openimages-for-object-detection/install.sh#L18

The annotations for 1366cde3b480a15c.jpg highlight the problem -

1366cde3b480a15c.jpg.txt

Fiftyone generates four boxes and the new script only one. Also, running on two different machines, the single boxes differ. All very odd.

Could you check the annotations you're generating for this image?
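For anyone who wants to reproduce the comparison, a minimal sketch along these lines (assuming the COCO-style JSON layout the script emits; the annotations path is hypothetical) pulls the boxes for a single image:

import json

# Hypothetical annotations path -- adjust to your download location.
with open("annotations/openimages-mlperf.json") as f:
    data = json.load(f)

target = "1366cde3b480a15c.jpg"
# Resolve the file name to its image id, then collect that image's boxes.
image_id = next(i["id"] for i in data["images"] if i["file_name"] == target)
boxes = [a for a in data["annotations"] if a["image_id"] == image_id]
for b in boxes:
    print(b["id"], b["category_id"], b["bbox"])
print(len(boxes), "boxes for", target)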

arjunsuresh commented 1 year ago

Just adding a data point here. This is on a MacBook Pro M1 system using the onnxruntime backend over the entire dataset, using the reference implementation. We see accuracy of 36.650, whereas the official number is 37.57. Not sure whether it is due to being a different system.

DONE (t=82.59s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.367
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.512
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.394
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.421
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.596
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.626
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.340
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.673
mAP=36.650%
arjunsuresh commented 1 year ago

@G4V If you are using this script for preprocessing, can you please try with threads=1 here ?

G4V commented 1 year ago

Thanks @arjunsuresh. That's the order of reduction in accuracy we're experiencing using the updated script. I'll give threads=1 a try. Are you seeing an improvement with this?

G4V commented 1 year ago

@G4V If you are using this script for preprocessing, can you please try with threads=1 here ?

Ah, the above is the script to generate the pre-processed images. The issue we're seeing is with the step before that, which generates the annotations that sit alongside the original images. Script here -

https://github.com/mlcommons/inference/blob/master/vision/classification_and_detection/tools/openimages.py

arjunsuresh commented 1 year ago

You're welcome, Gavin. But please ignore my suggestion of threads=1 - I see that threads is not used in the function anyway; it is just there to keep compatibility with the imagenet script. OpenImages preprocessing is serial, whereas imagenet preprocessing is done in parallel in the reference script. It takes around 8 hours for the full accuracy run on M1 -- so I cannot try things easily either.

arjunsuresh commented 1 year ago

@G4V Thank you for clarifying. If it is not a big concern, please use the old script for your submissions. This doesn't look like an easy fix (accuracy issues usually take time).

@pgmpablo157321 Just to be sure, when the scripts were updated, did we check accuracy on the entire dataset? We have been testing retinanet a lot ourselves, but always with a reduced dataset (one of the options that came with the modification).

arjunsuresh commented 1 year ago

@G4V But this can potentially help - using num_processes=1 here. If you have a fast GPU this can be a quick test.

https://github.com/mlcommons/inference/blob/master/vision/classification_and_detection/tools/openimages.py#L88

pgmpablo157321 commented 1 year ago

@pgmpablo157321 Just to be sure, when the scripts were updated did we check it on the entire dataset for accuracy? We have been testing retinanet a lot ourselves but all were using a reduced dataset (which was one of the options which came with the modification).

@arjunsuresh Yes, that is correct, I tested the reference implementation and it had the same accuracy. I'll run the benchmark accuracy again just to be sure.

The annotations for 1366cde3b480a15c.jpg highlight the problem

1366cde3b480a15c.jpg.txt

Fiftyone generates four boxes and the new script, only the one. Also running on two different machines, the single boxes differ. All very odd.

Could you check the annotations you're generating for this image?

@G4V This is what I get using the current (3.0) script:

Image info:

{'id': 6479, 'file_name': '1366cde3b480a15c.jpg', 'height': 4320, 'width': 2432, 'license': None, 'coco_url': None}, {'id': 6480, 'file_name': '13690841e89135f7.jpg', 'height': 1024, 'width': 925, 'license': None, 'coco_url': None}

Boxes info:

{'id': 13704, 'image_id': 6479, 'category_id': 117, 'bbox': [1268.7263436800001, 260.2409688, 883.16528384, 3580.9157160000004], 'area': 3162540.444728257, 'iscrowd': 0, 'IsOccluded': 0, 'IsInside': 0, 'IsDepiction': 1, 'IsTruncated': 0, 'IsGroupOf': 1}
{'id': 25159, 'image_id': 6479, 'category_id': 148, 'bbox': [978.73172096, 249.83132400000002, 1150.0920704, 3632.96394], 'area': 4178243.0194431418, 'iscrowd': 0, 'IsOccluded': 1, 'IsInside': 0, 'IsDepiction': 0, 'IsTruncated': 0, 'IsGroupOf': 0}
{'id': 41129, 'image_id': 6479, 'category_id': 125, 'bbox': [1384.0650624000002, 1020.1445424, 207.60962559999984, 853.5904416000001], 'area': 177213.59199631453, 'iscrowd': 0, 'IsOccluded': 1, 'IsInside': 0, 'IsDepiction': 0, 'IsTruncated': 0, 'IsGroupOf': 0}
{'id': 41130, 'image_id': 6479, 'category_id': 125, 'bbox': [1430.2005887999999, 2727.325296, 177.9511424000001, 905.6384928000002], 'area': 161159.4043951743, 'iscrowd': 0, 'IsOccluded': 1, 'IsInside': 0, 'IsDepiction': 0, 'IsTruncated': 0, 'IsGroupOf': 0}

I get 4 different boxes with the 3.0 script. It assigns the image to id 6479, and there are four boxes that belong to this image_id.

arjunsuresh commented 1 year ago

@pgmpablo157321 Thank you for confirming. Is it that the box ids for an image are adjacent with the old script but not necessarily with the new one? I'm also seeing 4 boxes for image_id=6479 with the current script.

pgmpablo157321 commented 1 year ago

yes, I see that the 4 boxes are not adjacent

G4V commented 1 year ago

yes, I see that the 4 boxes are not adjacent

Ah, ok, that's what threw me. I'll need to dig a bit further into why we're seeing the difference in accuracy measurement.

@arjunsuresh any ideas from your side on this?

arjunsuresh commented 1 year ago

Nothing clicking as of now -- need a sleep :) But since @pgmpablo157321 confirmed that he got the expected accuracy, and I'm getting lower accuracy on aarch64 (I'll try a run on x86 overnight) using the same reference implementation, we can conclude that the issue is not related to any internal preprocessing you might be using. It could be an architecture difference (less likely) or some Python dependency version change. If this were resnet50 I could have tried all the possibilities easily due to the short runtime. Here, I'll try to replicate the issue on a small dataset size (6-7 hours for a single run is not feasible), and if that works I should be able to report the culprit in a day or two.

Also, sorting the annotations based on image_id might be a solution, right?
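Something along these lines is what I have in mind (rough sketch, untested; the file name is illustrative):

import json

with open("openimages-mlperf.json") as f:
    data = json.load(f)

# Stable sort groups each image's boxes together while keeping their
# relative order within an image.
data["annotations"].sort(key=lambda a: a["image_id"])

with open("openimages-mlperf-sorted.json", "w") as f:
    json.dump(data, f)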

G4V commented 1 year ago

Thanks @arjunsuresh. The only difference for us between accuracy calcs is the annotations file (I think). Shall dig further. Sorting the annotations will give another good data point.

pgmpablo157321 commented 1 year ago

Just ran a couple of tests, and I see there is a very small difference between the two sets. For some reason, either this implementation or the previous one swaps the dimensions of the image 1366cde3b480a15c.jpg. However, this should be negligible for the metric, since it only affects 4 boxes out of 158642.

Specifically what I did was:

@G4V how did you find that specific image?

G4V commented 1 year ago

@G4V how did you find that specific image?

Luck. I hadn't realised that the boxes for this specific image differed from those produced by the previous script; I only thought the boxes were a subset, as they weren't contiguously listed in the JSON.

Agree that all the other boxes are the same barring those four. The accuracy issue is on our end, I think, but not yet confirmed.

arjunsuresh commented 1 year ago

@G4V you should try a lottery 😁

pgmpablo157321 commented 1 year ago

@G4V @arjunsuresh I got a reduction in accuracy as well:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.366
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.512
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.394
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.076
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671
TestScenario.SingleStream qps=45.52, mean=0.1832, time=544.420, acc=40.478%, mAP=36.634%, queries=24781, tiles=50.0:0.1833,80.0:0.1881,90.0:0.1905,95.0:0.1924,99.0:0.1962,99.9:0.2082
arjunsuresh commented 1 year ago

Thanks @pgmpablo157321. So M1 gave slightly better accuracy: 36.650%.

Do you know what exactly has changed since the last time you got 37.57?

psyhtest commented 1 year ago

TL;DR: fiftyone==0.16.5 mlperf-inference-source==2.1 gets things back in shape.

Rather unhelpfully, fiftyone introduced a new 0.19.0 release just a few days ago, which seems to break downloads even with the r2.1 branch. I think 0.18.0 should work too, as we had no download issues until February, but I've only tested 0.16.5 so far.

arjunsuresh commented 1 year ago

Thank you @psyhtest. And if we use the annotations file produced and then call this accuracy script, we can expect 37.57% mAP, right?

pgmpablo157321 commented 1 year ago

I made another two runs, and these are the results I got. First, I ran the object detection benchmark with the Inference 3.0 annotations and the 2.1 code and got:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.366
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.512
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.394
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.076
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671
TestScenario.SingleStream qps=43.34, mean=0.1822, time=571.734, acc=40.478%, mAP=36.634%, queries=24781, tiles=50.0:0.1824,80.0:0.1875,90.0:0.1900,95.0:0.1919,99.0:0.1958,99.9:0.2050

Then I ran the benchmark with Inference 2.1 annotations and 3.0 code:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.376
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.524
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.406
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.596
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.075
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
TestScenario.SingleStream qps=43.32, mean=0.1816, time=572.001, acc=40.478%, mAP=37.550%, queries=24781, tiles=50.0:0.1818,80.0:0.1868,90.0:0.1891,95.0:0.1910,99.0:0.1947,99.9:0.2018

So it seems the four boxes are responsible for the difference in mAP (I don't completely understand how). This issue should be solved for now by taking the annotations from this release.

arjunsuresh commented 1 year ago

@pgmpablo157321 That's useful information. Just to be sure, can we manually edit the annotations file from r2.1 to modify just the 4 boxes to match the annotations file of r3.0, and see what accuracy we get? This would tell us whether it is really the boxes or the different ordering that is causing the accuracy difference.

pgmpablo157321 commented 1 year ago

@arjunsuresh I think you can do that, but keep in mind that the dimensions of the image 1366cde3b480a15c.jpg were swapped as well, so that might also affect the results.

G4V commented 1 year ago

@pgmpablo157321 I've run with the known good 2.1 annotations file but with the boxes and dimensions modified for 1366cde3b480a15c.jpg, and I'm not seeing a change in accuracy. Could you try this also and confirm that you see the same?

If so, and everything else being equal, this seems to imply that the accuracy calc is (erroneously) tied to the order of images in the annotations file?

arjunsuresh commented 1 year ago

@G4V I was thinking the same but could not try it, as I only just got a system. I could not find anything suspicious in the accuracy script; it does have this written.

arjunsuresh commented 1 year ago

@pgmpablo157321 In the dataset download script, with a count option like -m 50 the script downloads 50 random images. Is there any reason to include this randomness? If not, can you please remove it so that we can easily compare the accuracy of smaller dataset runs?
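For example, a deterministic selection along these lines would make subset runs repeatable (a sketch; the names are illustrative, not the script's actual variables):

import random

# Illustrative list standing in for the available image ids.
image_ids = sorted("img%05d" % i for i in range(1000))

# Behaviour described above: an unseeded random sample differs on every
# run, so two machines get different 50-image subsets.
random_sample = random.sample(image_ids, 50)

# Deterministic alternatives that make small-subset runs comparable:
first_n = image_ids[:50]                          # take the first N ids
seeded = random.Random(0).sample(image_ids, 50)   # or fix the seed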

arjunsuresh commented 1 year ago

By replacing the annotations file we are also seeing the expected accuracy, but we are still not sure of the real cause of the problem.

TestScenario.Offline qps=158.26, mean=11.1613, time=156.587, acc=41.033%, mAP=37.572%, queries=24781, tiles=50.0:10.4530,80.0:14.5974,90.0:14.8929,95.0:15.0823,99.0:15.4117,99.9:24.6361

CM run command used

cm run script --tags=generate-run-cmds --execution-mode=valid --model=retinanet \
--mode=accuracy  --adr.openimages-preprocessed.tags=_full,_custom-annotations
G4V commented 1 year ago

@arjunsuresh is that with the 2.1 annotations file unmodified, or with the boxes for the offending image modified?

arjunsuresh commented 1 year ago

This is using the same 2.1 annotations file.

arjunsuresh commented 1 year ago

I tried running with the 3.0 annotations file after sorting the annotations list based on image_id. No change in accuracy, but the errors below appear.

INFO:coco:loaded 24781 images, cache=0, already_preprocessed=True, took=0.5sec
INFO:main:starting TestScenario.Offline
ERROR:coco:image_idx missmatch, lg=22270 / result=24412
[the above line repeated 35 times]
ERROR:coco:image_idx missmatch, lg=24412 / result=22270
[the above line repeated 5 times]
ERROR:coco:image_idx missmatch, lg=803 / result=6635
[the above line repeated 116 times]
ERROR:coco:image_idx missmatch, lg=6635 / result=803
[the above line repeated 300 times]
loading annotations into memory...
Done (t=0.24s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(2936607, 7)
0/2936607
1000000/2936607
2000000/2936607
DONE (t=8.08s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=105.56s).
Accumulating evaluation results...
DONE (t=25.74s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.367
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.512
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.394
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.421
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.596
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.626
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.340
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.673
TestScenario.Offline qps=164.17, mean=11.1324, time=150.949, acc=41.033%, mAP=36.650%, queries=24781, tiles=50.0:10.3526,80.0:14.5942,90.0:14.8564,95.0:15.0179,99.0:15.2703,99.9:15.481
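For what it's worth, the mismatch errors above are what you would expect if the side issuing queries and the side checking results enumerate the image list in different orders. A toy illustration (not the actual coco.py logic):

# LoadGen refers to samples by their position in the loaded image list,
# so if results are checked against a differently ordered list, the
# recorded index no longer matches.
run_order = ["803.jpg", "6635.jpg"]    # order when queries were issued
check_order = ["6635.jpg", "803.jpg"]  # order after re-sorting annotations

for lg_idx, name in enumerate(run_order):
    result_idx = check_order.index(name)
    if lg_idx != result_idx:
        print("image_idx missmatch, lg=%d / result=%d" % (lg_idx, result_idx))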
pgmpablo157321 commented 1 year ago

Just adding another datapoint here. I ran the benchmark again with the following configurations. First, 3.0 code with 3.0 sorted annotations:

Evaluate annotation type *bbox*
DONE (t=314.77s).
Accumulating evaluation results...
DONE (t=74.29s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.366
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.512
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.394
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.076
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671
TestScenario.SingleStream qps=43.89, mean=0.1833, time=564.633, acc=40.478%, mAP=36.634%, queries=24781, tiles=50.0:0.1834,80.0:0.1883,90.0:0.1906,95.0:0.1925,99.0:0.1963,99.9:0.2034

3.0 code and 2.1 'messed up' annotations:

DONE (t=314.93s).
Accumulating evaluation results...
DONE (t=71.28s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.376
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.524
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.406
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.596
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.075
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
TestScenario.SingleStream qps=44.68, mean=0.1830, time=554.606, acc=40.478%, mAP=37.550%, queries=24781, tiles=50.0:0.1831,80.0:0.1880,90.0:0.1903,95.0:0.1921,99.0:0.1957,99.9:0.2049

This implies that the order is not what is affecting the accuracy. But since the two annotation files are 'almost' identical, it isn't clear what is causing this error. Maybe it is some numerical approximation in the bboxes (or some other entries).

arjunsuresh commented 1 year ago

@pgmpablo157321 Thank you for the datapoint. I'm not sure how acc is being calculated, but in all the results you shared it is 40.478, whereas in mine it is 41.033, irrespective of the reported mAP.

nvzhihanj commented 1 year ago

@arjunsuresh sorry for the late reply, but yes, we received reports of this issue from partners two weeks ago as well. I was able to reproduce the low accuracy issue with the v3.0 scrambled annotations yesterday (our full accuracy is 37.487):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.367
mAP=36.703%

We will use the old annotation from v2.1 for v3.0 submission. FYI @nv-ananjappa

pgmpablo157321 commented 1 year ago

I was running a couple of tests and wanted to add this datapoint. It seems that there is a numerical error in computing the areas. This is an example I get comparing the two annotation files:

...
Annotations in file c682818fe22eb309.jpg differ: 
['(bbox:(0.0, 62.400000000000006, 1020.8, 648.0) area: 661478.4)', '(bbox:(0.0, 62.400000000000006, 1020.8, 648.0) area: 661478.3999999999)']
...

I got small numerical errors like this in ~11000 images.
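For illustration, products of this size can legitimately land on neighbouring doubles depending on evaluation order. A toy example (only the 0.0 / 62.400000000000006 / 1020.8 / 648.0 bbox values are from the diff above; the corner-based formula is a guess at what the other script might do):

# Guaranteed demonstration that algebraically equal expressions can
# round differently in binary floating point:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# The same effect plausibly explains 661478.4 vs 661478.3999999999 if one
# script multiplies width*height directly and the other reconstructs them
# from corner coordinates first (hypothetical formula):
x, y, w, h = 0.0, 62.400000000000006, 1020.8, 648.0
print(w * h)                           # one rounding path
print(((x + w) - x) * ((y + h) - y))   # may differ in the last bit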

pgmpablo157321 commented 1 year ago

I was able to reproduce the 2.1 mAP by changing the iscrowd value in the annotations:

creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=339.01s).
Accumulating evaluation results...
DONE (t=81.03s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.376
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.524
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.406
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.596
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.075
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
TestScenario.SingleStream qps=39.89, mean=0.1221, time=621.218, acc=40.478%, mAP=37.550%, queries=24781, tiles=50.0:0.1228,80.0:0.1281,90.0:0.1305,95.0:0.1324,99.0:0.1360,99.9:0.1410

The changes I made are in this branch
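Roughly, the change looks like this (a sketch only; the branch above is the authoritative version, and the IsGroupOf mapping is my reading of the 2.1 behaviour, not confirmed here):

import json

with open("openimages-mlperf.json") as f:  # illustrative file name
    data = json.load(f)

# pycocotools ignores detections matched to iscrowd ground-truth boxes,
# so this field changes the resulting mAP.
for ann in data["annotations"]:
    ann["iscrowd"] = int(ann.get("IsGroupOf", 0))

with open("openimages-mlperf-fixed.json", "w") as f:
    json.dump(data, f)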

arjunsuresh commented 1 year ago

That's great work @pgmpablo157321. Does the new code work fine for both the new and old annotations?

pgmpablo157321 commented 1 year ago

@arjunsuresh Yes, now it works for both. I think the problem was that the iscrowd field did not match the v2.1 annotations.

arjunsuresh commented 1 year ago

That's great @pgmpablo157321

nvzhihanj commented 1 year ago

Hi @pgmpablo157321, sorry for the late reply, but is it possible to sort the new annotations and make them the same as v2.1? The PR doesn't work out-of-the-box because of the skewed image order. Thank you!

psyhtest commented 1 year ago

Reopening to discuss the last comment.

pgmpablo157321 commented 1 year ago

@psyhtest @nvzhihanj I added this line before the PR was merged: https://github.com/mlcommons/inference/blob/192f81b3d4e6b61ba48396bba2e7f3919d393e7d/vision/classification_and_detection/tools/openimages.py#L148 So the image order is now the same. However, I ran some tests and there are still some differences in the annotation files. The annotations are sorted by ImageID, but the order of the annotations within an image does not necessarily match. Unfortunately, the annotations from the original script do not seem to have a specific order here, so I think it is impossible to match the order exactly.

I can confirm that:

One possible format improvement I noticed is that the ids of the annotations (not the ImageID) are not sorted. This is only a unique identifier for each annotation and does not affect the metric, but I could modify the script so that they are sorted as well, as in the sketch below.
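Something like this (sketch only; illustrative file name):

import json

with open("openimages-mlperf.json") as f:
    data = json.load(f)

# Renumber annotation ids in (image_id, id) order. Cosmetic only: the id
# is just a unique identifier and does not affect the metric.
data["annotations"].sort(key=lambda a: (a["image_id"], a["id"]))
for new_id, ann in enumerate(data["annotations"], start=1):
    ann["id"] = new_id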

arjunsuresh commented 1 year ago

I can confirm that with the latest master branch code, accuracy is fine for the entire dataset:

DONE (t=28.66s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.376
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.525
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.406
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.025
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.127
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.415
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.420
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.598
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.627
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.341
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.677
TestScenario.Offline qps=147.82, mean=11.5008, time=167.640, acc=41.033%, mAP=37.572%, queries=24781, tiles=50.0:10.6210,80.0:15.0662,90.0:15.3765,95.0:15.5956,99.0:15.9346,99.9:16.2546

Run command used

python3 -m pip install cmind
cm pull repo mlcommons@ck
cmr  "run mlperf inference generate-run-cmds _accuracy-only _full"       --submitter="Community"       \
--hw_name=default       --implementation=reference     --model=retinanet       --backend=onnxruntime  \
--device=cpu       --scenario=Offline    --execution_mode=valid --mode=accuracy  --rerun
arjunsuresh commented 1 year ago

@pgmpablo157321 We are still allowing submitters to use any annotations file, right?

nv-ananjappa commented 1 year ago

@mrmhodak @mrasquinha-g We should discuss Arjun's proposal in the WGM. https://github.com/mlcommons/inference/issues/1332#issuecomment-1604753762