tinyvision / DAMO-YOLO

DAMO-YOLO: a fast and accurate object detection method with some new techs, including NAS backbones, efficient RepGFPN, ZeroHead, AlignedOTA, and distillation enhancement.
Apache License 2.0

[Bug]: Exported end2end ONNX model produces poor COCO validation results #146

Open Fredrik00 opened 5 months ago

Fredrik00 commented 5 months ago

OS

Ubuntu 24.04

Device

RTX 4090

CUDA version

12.5

TensorRT version

No response

Python version

3.10

PyTorch version

1.13.1

torchvision version

0.14.1

Describe the bug

I have been trying to train and export a tinynasL25_S model on the COCO dataset, but the exported model gives terrible results. I exported the model with --end2end, which, if I am reading the code correctly, should yield at most 100 detections after NMS, yet in many cases I still get 1000+ detections. The detections do, however, appear to be filtered by a minimum confidence score of 0.05.
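A quick way to confirm both observations (the 100-detection cap being exceeded, the 0.05 score floor being applied) is to summarize the scores returned for a single image. This is only a minimal sketch: the function name is hypothetical, and it assumes the scores for one image are available as a flat sequence of floats, which may not match the actual output layout of the exported model.

```python
def summarize_detections(scores, max_dets=100, score_thr=0.05):
    """Summarize one image's detection scores.

    max_dets and score_thr default to the values mentioned in this
    report (100 detections after NMS, 0.05 minimum confidence).
    """
    scores = [float(s) for s in scores]
    return {
        # Total number of detections returned for the image.
        "num_dets": len(scores),
        # True if the model returned more detections than the expected cap.
        "exceeds_cap": len(scores) > max_dets,
        # Number of detections that slipped under the score threshold.
        "below_thr": sum(s < score_thr for s in scores),
    }


if __name__ == "__main__":
    # Illustrative scores only, not real model output.
    print(summarize_detections([0.9, 0.5, 0.01], max_dets=2))
```

If `exceeds_cap` is true while `below_thr` is zero, that would match the behavior described above: score filtering is applied, but the NMS and top-k stage is not limiting the output.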

After 60 epochs of training on the COCO dataset I get an evaluation score of:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.365
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.514
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.394
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.208
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.488
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.320
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.549
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.613
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.424
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.771

I export this model using: python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c workdirs/damoyolo_tinynasL25_S/epoch_60_ckpt.pth --batch_size 1 --img_size 640 --end2end --ort

But when evaluating the exported model on COCO I get the following results:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.008
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.017
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.047

I have debugged the pre-processing and post-processing steps I added for running the ONNX model, and they look consistent with the demo script. I use the same COCO validation scripts for YOLOX and RT-DETR models and have no issues with those, so it looks to me like something is wrong with the export script.

To Reproduce

  1. Train model using configs/damoyolo_tinynasL25_S.py
  2. Export model to ONNX using: python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c workdirs/damoyolo_tinynasL25_S/epoch_60_ckpt.pth --batch_size 1 --img_size 640 --end2end --ort
  3. Evaluate ONNX model on COCO validation set
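For step 3, each detection has to be written in the COCO results format before scoring: boxes as [x, y, width, height] in original-image pixels, and the COCO category id rather than the model's contiguous class index. The sketch below shows that conversion under stated assumptions: the helper name, the `scale` argument (the resize ratio applied in pre-processing), and the xyxy input layout are all hypothetical and may not match the exported model's actual outputs.

```python
def to_coco_result(image_id, box_xyxy, score, class_idx, cat_ids, scale=1.0):
    """Convert one detection to a COCO results-format dict.

    box_xyxy  : [x1, y1, x2, y2] in the resized input image (assumed layout).
    class_idx : contiguous class index predicted by the model.
    cat_ids   : list mapping contiguous index -> COCO category id.
    scale     : resize ratio used in pre-processing (assumed uniform).
    """
    # Undo the pre-processing resize so the box is in original-image pixels.
    x1, y1, x2, y2 = (v / scale for v in box_xyxy)
    return {
        "image_id": image_id,
        "category_id": cat_ids[class_idx],
        # COCO expects [x, y, width, height], not corner coordinates.
        "bbox": [x1, y1, x2 - x1, y2 - y1],
        "score": float(score),
    }


if __name__ == "__main__":
    # Illustrative values only.
    print(to_coco_result(42, [10, 20, 110, 220], 0.9, 0, [1, 2, 3], scale=2.0))
```

A list of such dicts can be saved as JSON and passed to the usual COCO evaluation tooling.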

Hyper-parameters/Configs

No response

Logs

No response

Screenshots

No response

Additional

No response

ksv87 commented 4 months ago

Do not use the --end2end flag.
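Exporting without --end2end presumably means that NMS has to be applied in your own post-processing. As a rough illustration, here is a minimal pure-Python sketch of greedy, class-agnostic NMS. The function names, the xyxy box layout, and the IoU threshold default are assumptions, not DAMO-YOLO's own implementation; the 0.05 score threshold and the cap of 100 detections are the values mentioned in the report above.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.7, score_thr=0.05, max_dets=100):
    """Return indices of kept boxes, highest score first.

    iou_thr is an assumed default; tune it to match the training config.
    """
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
            if len(keep) == max_dets:
                break
    return keep


if __name__ == "__main__":
    # Illustrative boxes only.
    boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
    print(nms(boxes, [0.9, 0.8, 0.7], iou_thr=0.5))
```

For per-class NMS, the same function can be run separately on the boxes of each predicted class.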