mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

pycocotools for retinanet #1240

Open psyhtest opened 1 year ago

psyhtest commented 1 year ago

We've been using the standard pycocotools Python package for calculating the object detection accuracy since MLPerf Inference v0.5. It was fine for SSD-ResNet34 and SSD-MobileNet-v1, but it is rather painful for RetinaNet. First, the calculation is slow: it takes ~7-8 minutes per scenario on a decent workstation; in other words, ~15-25 minutes per system. Second, it is memory hungry: see below for this calculation strangling an edge appliance with 8 GB RAM and 4 GB swap.

[screenshot: pycocotools evaluation exhausting memory on the edge appliance]
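For context, the accuracy calculation in question follows the standard pycocotools evaluation flow; a minimal sketch (the file names below are placeholders, not the repo's actual paths):

```python
# Minimal sketch of the standard pycocotools evaluation flow used for the
# object detection accuracy check; the file names below are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground-truth-annotations.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")     # model detections

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()    # per-image, per-category matching (the slow part)
coco_eval.accumulate()  # aggregates matches into precision/recall curves
coco_eval.summarize()   # prints mAP and the other COCO metrics
print(coco_eval.stats[0])  # mAP @ IoU=0.50:0.95
```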

arjunsuresh commented 1 year ago

Hi @psyhtest, are you referring to this script?

psyhtest commented 1 year ago

Yes, Arjun.

arjunsuresh commented 1 year ago

Thank you Anton for your reply. On my laptop, during the accuracy run over 5,000 images, the accuracy script processes 60 images per second (faster than the workstation?), which is almost 60 times faster than the CPU inference speed and so hardly noticeable. I tried calling the cocoEval library from multiple threads, since the description given here suggests that the images can be processed in parallel before calling the accumulate function. But when we split the image list and process the chunks separately, the calculated scores change. So the only option for speeding up the processing looks to be parallelism inside the evaluation function, which is what this Nvidia fork does.
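A self-contained sketch of the multi-threaded split that was tried; the file names and the 4-way split are illustrative assumptions:

```python
# Sketch of the attempted split: evaluate chunks of image IDs on separate
# threads, then recombine before accumulate(). File names and the 4-way
# split are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground-truth-annotations.json")
coco_dt = coco_gt.loadRes("detections.json")
img_ids = sorted(coco_gt.getImgIds())
chunks = [img_ids[i::4] for i in range(4)]  # split the image list 4 ways

def evaluate_chunk(ids):
    ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
    ev.params.imgIds = ids  # restrict this evaluator to one chunk
    ev.evaluate()           # fills ev.evalImgs for this chunk only
    return ev.evalImgs

with ThreadPoolExecutor(max_workers=4) as pool:
    per_chunk = list(pool.map(evaluate_chunk, chunks))

# As described above, recombining these per-chunk results before a single
# accumulate() changed the computed scores in our tests: evalImgs is ordered
# by (category, area range, image) per each evaluator's own params, so a
# naive concatenation does not reproduce the sequential run.
```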

rnaidu02 commented 1 year ago

@pgmpablo157321 to look at Arjun's proposal and give feedback on its feasibility.

arjunsuresh commented 1 year ago

@pgmpablo157321 To use the Nvidia fork of pycocotools, we need to add instructions for installing it and also update the accuracy numbers - there can be a slight difference here. We can give you an update on both by next week, as we'll be checking them.

arjunsuresh commented 1 year ago

Unfortunately, the Nvidia fork does not work well with RetinaNet. This commit fixes the issue in the PythonAPI, but the C++ extension gives poor accuracy.

rnaidu02 commented 1 year ago

@nv-ananjappa

arjunsuresh commented 1 year ago

This is the patch we applied to the inference repo when running with nvidia-pycocotools.

nv-ananjappa commented 1 year ago

@arjunsuresh We are using the (slow) script for MLPerf Inference too. 😁 Since you seem to be familiar with it, would you like to contribute by adding support for the faster NVIDIA cocoapi?

arjunsuresh commented 1 year ago

Thank you @nv-ananjappa for checking. Unfortunately, I'm not familiar enough with cocoapi to make that change :innocent: I had tried to parallelize the Python API, but realized that the original implementation is inherently sequential, which is why the Nvidia fork's C++ extension makes sense. I'll file my accuracy result as an issue on the Nvidia fork - it might be an easy fix for the original developer.

Meanwhile, we wait about an hour for the RetinaNet accuracy run on an Nvidia T4 GPU (using the reference implementation), so an extra 6-7 minutes is hardly noticeable :smile:

arjunsuresh commented 8 months ago

@nv-ananjappa This is done now. This patch enables nvidia-pycocotools for the OpenImages accuracy run and speeds up the accuracy check from 7.5 minutes to 2 minutes.
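For anyone reproducing the comparison, a minimal way to time the check, assuming the fork keeps the stock pycocotools import path as a drop-in replacement (file names are placeholders; this is not necessarily how the numbers above were measured):

```python
# Sketch for timing the accuracy check before and after switching to
# nvidia-pycocotools; file names are placeholders, and we assume the fork
# keeps the stock pycocotools import path.
import time

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

start = time.perf_counter()
coco_gt = COCO("openimages-annotations.json")
coco_dt = coco_gt.loadRes("detections.json")
ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()
print(f"accuracy check took {time.perf_counter() - start:.1f}s")
```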

psyhtest commented 8 months ago

@arjunsuresh That's great! How about memory consumption?

arjunsuresh commented 8 months ago

Hi @psyhtest, it was about 0.5% on a 768 GB system. The original pycocotools had gone up to 1.6% of memory.
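One way to report an absolute number rather than a percentage is the standard-library resource module; a hedged sketch, not how the figures above were collected:

```python
# Sketch: report the process's peak memory in absolute terms on Linux,
# where ru_maxrss is in kilobytes. Call this after running the evaluation
# in the same process. Illustrative only.
import resource

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kb / 1024:.0f} MiB")
```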

arjunsuresh commented 8 months ago

@psyhtest Unfortunately, even with the new change, the accuracy run fails on the Thundercomm RB6 (8 GB RAM and 4 GB swap). It runs fine in 46 s on an Intel Sapphire Rapids system with 256 GB RAM (only about 1% of which gets used).