pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

CocoDetection dataset incompatible with Faster R-CNN model Training and mAP calculation #8270

Open anirudh6415 opened 9 months ago

anirudh6415 commented 9 months ago

🐛 Describe the bug

First, thank you for the Object Detection Finetuning tutorial. The CocoDetection dataset appears to be incompatible with the Faster R-CNN model; I have been following the transforms-v2-end-to-end-object-detection-segmentation-example for COCO detection. The TorchVision Object Detection Finetuning tutorial specifies the dataset format required by the Mask R-CNN model: the dataset's __getitem__ method should return an image and a target dict with fields boxes, labels, area, etc. CocoDetection instead returns the raw COCO annotations as the target.
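
For reference, a minimal sketch of the two per-sample formats (the numeric values are illustrative, not taken from the report):

import torch

# Per-sample format the finetuning tutorial expects __getitem__ to return:
image = torch.rand(3, 512, 512)  # FloatTensor[C, H, W], values in [0, 1]
target = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0]]),  # FloatTensor[N, 4], XYXY
    "labels": torch.tensor([1], dtype=torch.int64),        # Int64Tensor[N]
}

# What a plain CocoDetection returns as the target instead: a list of raw
# COCO annotation dicts, with "bbox" in XYWH format, e.g.
# [{"bbox": [10.0, 20.0, 100.0, 200.0], "category_id": 1, "iscrowd": 0,
#   "area": 20000.0, "image_id": 0, ...}]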

After training and evaluation (engine.evaluate), the mAP scores are always 0 for every epoch.

Dataset:

import torch
from torchvision import tv_tensors
from torchvision.transforms import v2
from torchvision.datasets import CocoDetection, wrap_dataset_for_transforms_v2

transforms = v2.Compose(
    [
        v2.ToPILImage(),
        v2.Resize(512),
        v2.RandomPhotometricDistort(p=1),
        v2.RandomZoomOut(fill={tv_tensors.Image: (123, 117, 104), "others": 0}),
        v2.RandomIoUCrop(),
        v2.RandomHorizontalFlip(p=1),
        v2.ToTensor(),
        v2.ToDtype(torch.float32, scale=True),
    ]
)
# root_dir and annotation_file point at the local COCO-format data.
dataset = CocoDetection(root_dir, annotation_file, transforms=transforms)
dataset = wrap_dataset_for_transforms_v2(dataset, target_keys=("boxes", "labels"))
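
For context, a minimal sketch of how the wrapped dataset is typically consumed; the lambda collate mirrors the tutorial's utils.collate_fn, since detection models take lists of images and targets rather than stacked batches. Note also that the tutorial's engine.evaluate reads target["image_id"], which target_keys=("boxes", "labels") would exclude.

from torch.utils.data import DataLoader

# Detection models take a list of images and a list of target dicts, so the
# default (stacking) collate must be replaced.
data_loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    collate_fn=lambda batch: tuple(zip(*batch)),
)

images, targets = next(iter(data_loader))
print(targets[0].keys())  # sanity check: should contain at least 'boxes' and 'labels'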

Expected behavior

Faster R-CNN model output from evaluation:

45514: {'boxes': tensor([[ 40.1482, 48.6490, 46.0760, 50.4945], [ 56.4980, 98.9506, 59.9611, 99.8110], [ 16.5955, 50.3514, 20.7766, 51.7256], [ 7.7093, 49.7779, 9.9628, 51.1177], [ 23.2416, 115.2277, 27.9833, 116.0603], [ 6.2100, 43.7826, 12.1565, 44.4718], [ 84.3244, 92.1326, 89.6173, 92.8679], [ 27.8029, 111.4202, 33.1342, 112.3421], [ 6.4772, 83.6187, 11.6571, 85.1347], [ 12.0571, 57.7298, 17.3467, 58.6374], [ 52.0026, 100.2936, 55.6111, 101.0397], [ 32.9334, 95.2229, 36.7513, 96.0473], [ 11.9714, 50.6148, 16.0876, 51.3073], [ 36.5298, 99.2084, 40.1270, 100.3034], [ 56.8915, 95.4639, 59.8486, 96.2372], [ 29.7059, 95.3435, 34.5201, 96.1218], [ 83.4291, 96.4723, 89.3993, 97.1828], [ 80.5682, 114.7297, 86.4728, 115.2060], [ 55.3361, 96.1351, 57.5648, 97.3579], [ 87.8969, 120.9048, 91.9940, 122.8857], [ 79.1790, 95.7387, 83.8181, 96.0961], [ 5.2113, 81.8440, 12.3168, 82.5170], [ 11.9503, 9.1723, 15.8027, 10.5436], [ 43.5947, 115.0965, 46.6917, 116.0323], [ 36.3678, 44.5311, 45.5149, 45.0957], [ 64.0280, 91.6801, 70.0944, 92.6666], [ 34.9408, 48.2833, 39.4942, 48.7989], [ 44.6860, 34.5384, 48.7593, 35.4988], [ 8.5666, 52.0507, 9.7412, 53.2962], [ 59.0582, 114.7045, 62.3767, 115.6113], [ 42.6140, 95.4168, 47.2140, 95.9096], [ 51.6593, 116.3869, 54.5132, 117.3841], [ 10.2391, 8.1375, 15.3591, 9.5619], [ 79.1855, 103.1416, 83.1228, 104.2892], [ 11.6779, 115.0183, 15.2959, 115.6937], [ 92.2911, 64.1361, 97.0701, 65.2341], [ 77.5316, 94.2480, 88.9283, 103.3851], [ 20.0655, 29.8961, 25.1227, 31.4927], [ 41.8090, 91.6322, 66.6452, 114.4692], [ 1.7047, 99.8426, 5.1214, 100.9323], [ 21.0609, 30.4280, 25.2658, 31.7394], [ 77.2151, 111.6185, 83.4417, 112.1318], [ 21.3434, 105.4950, 25.2204, 106.5172], [ 0.0000, 100.4384, 44.0517, 127.3968], [ 37.8296, 43.6599, 41.4970, 44.9153], [101.9358, 28.7851, 107.8658, 29.7577], [ 84.6480, 112.4950, 89.6441, 113.4983], [ 32.0187, 48.2305, 34.6299, 49.9225], [ 21.0508, 96.1484, 49.0644, 115.7840], [ 78.7586, 91.0307, 83.5991, 92.0456], [ 7.4563, 99.3216, 11.9333, 100.4296], [ 41.8862, 39.9427, 48.0095, 40.6046], [ 64.2320, 110.5276, 69.4041, 111.1726], [ 48.5087, 35.6968, 51.0966, 36.3254], [ 69.9470, 80.5252, 77.2012, 81.4588], [ 64.5411, 5.3913, 69.1044, 6.1687], [ 14.9313, 118.7279, 18.5300, 119.9807], [ 67.1189, 75.9650, 74.1020, 76.6732], [104.7447, 31.4134, 109.5322, 32.3514], [ 68.2009, 112.8180, 71.6182, 113.7542], [ 77.4721, 33.7746, 80.8393, 34.7606], [ 9.3352, 80.4828, 12.7890, 82.0704], [ 65.1386, 107.5109, 71.5020, 108.5092], [ 0.0000, 95.9446, 24.0685, 127.6092], [ 43.3848, 100.4263, 46.2075, 101.2998], [ 14.2563, 116.5107, 17.7641, 117.3038], [ 75.3176, 28.1107, 79.5012, 29.0837], [ 21.1844, 99.2131, 24.2272, 100.0415], [ 59.2131, 7.5139, 63.0848, 8.2906], [ 0.0000, 48.6410, 43.9125, 57.1178], [ 92.4588, 61.7449, 97.9917, 62.5649], [ 0.0000, 40.3103, 21.4952, 74.7319], [ 26.3296, 18.2271, 30.6895, 19.6554], [ 24.1920, 8.1525, 29.7535, 9.5155], [ 0.0000, 82.6814, 1.6050, 86.1069], [ 69.8429, 91.4722, 74.5880, 92.6022], [ 40.0346, 106.1212, 43.5103, 107.1371], [ 77.5447, 109.2828, 81.9013, 110.5845], [ 68.1803, 44.8517, 73.7433, 45.6382], [ 0.0000, 84.0534, 3.4056, 85.0159], [ 38.7503, 35.8580, 56.7992, 47.4848], [ 50.1666, 31.5740, 53.6334, 32.6867], [ 31.0113, 101.1863, 33.5362, 101.8402], [ 53.7563, 9.7722, 55.8778, 10.9179], [ 51.0325, 13.5929, 54.7412, 14.6228], [ 18.1654, 104.8237, 21.8600, 105.6227], [ 19.5623, 35.9696, 24.1356, 37.0714], [ 69.2776, 28.0173, 88.3530, 39.2125], [ 75.1365, 115.5374, 77.8445, 116.9997], [ 31.0881, 58.5975, 
34.6037, 59.4643], [ 1.6351, 80.1350, 6.1082, 81.3295], [ 22.8064, 117.3966, 62.7837, 127.8345], [ 63.6129, 70.9242, 69.0646, 71.9814], [ 3.4624, 87.2172, 8.6216, 88.5247], [ 55.8403, 28.8055, 59.7083, 30.4423], [ 26.2743, 18.7395, 31.5674, 19.9733], [ 26.8567, 117.3554, 32.5164, 117.9245], [ 55.5966, 104.9360, 58.4963, 105.7098], [ 88.1490, 100.0630, 91.5376, 101.2722], [ 61.8169, 10.4709, 64.8416, 11.5875]], device='cuda:0'), 'labels': tensor([13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13], device='cuda:0'), 'scores': tensor([0.3700, 0.3590, 0.3542, 0.3493, 0.3464, 0.3439, 0.3411, 0.3396, 0.3373, 0.3338, 0.3312, 0.3304, 0.3297, 0.3272, 0.3269, 0.3263, 0.3233, 0.3213, 0.3156, 0.3152, 0.3146, 0.3145, 0.3119, 0.3090, 0.3085, 0.3070, 0.3041, 0.3034, 0.3016, 0.3015, 0.3014, 0.3011, 0.3009, 0.2997, 0.2996, 0.2984, 0.2978, 0.2973, 0.2968, 0.2955, 0.2954, 0.2952, 0.2939, 0.2928, 0.2921, 0.2914, 0.2912, 0.2906, 0.2905, 0.2902, 0.2898, 0.2891, 0.2878, 0.2870, 0.2870, 0.2867, 0.2864, 0.2861, 0.2856, 0.2854, 0.2844, 0.2830, 0.2829, 0.2821, 0.2808, 0.2808, 0.2805, 0.2804, 0.2799, 0.2796, 0.2796, 0.2791, 0.2791, 0.2789, 0.2788, 0.2780, 0.2775, 0.2774, 0.2767, 0.2766, 0.2765, 0.2762, 0.2761, 0.2757, 0.2751, 0.2748, 0.2744, 0.2743, 0.2740, 0.2737, 0.2737, 0.2736, 0.2735, 0.2733, 0.2733, 0.2731, 0.2728, 0.2726, 0.2725, 0.2721], device='cuda:0')}}

But the computed mAP is always zero:

Test:  [  0/245]  eta: 0:00:23  model_time: 0.0417 (0.0417)  evaluator_time: 0.0033 (0.0033)  time: 0.0973  data: 0.0519  max mem: 3903
Test:  [100/245]  eta: 0:00:12  model_time: 0.0388 (0.0390)  evaluator_time: 0.0016 (0.0018)  time: 0.0882  data: 0.0472  max mem: 3903
Test:  [200/245]  eta: 0:00:03  model_time: 0.0388 (0.0389)  evaluator_time: 0.0015 (0.0018)  time: 0.0874  data: 0.0466  max mem: 3903
Test:  [244/245]  eta: 0:00:00  model_time: 0.0388 (0.0388)  evaluator_time: 0.0019 (0.0018)  time: 0.0880  data: 0.0478  max mem: 3903
Test: Total time: 0:00:21 (0.0879 s / it)
Averaged stats: model_time: 0.0388 (0.0388)  evaluator_time: 0.0019 (0.0018)
Accumulating evaluation results...
DONE (t=0.06s).
Accumulating evaluation results...
DONE (t=0.06s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
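
A quick sanity check on a single wrapped sample can help narrow this down, since an all-zero mAP often comes from an XYWH-vs-XYXY box mixup or label ids that fall outside the model's num_classes; a minimal sketch:

img, target = dataset[0]
print(type(target["boxes"]))      # expected: torchvision.tv_tensors.BoundingBoxes
print(target["boxes"].format)     # expected: XYXY after wrap_dataset_for_transforms_v2
print(target["labels"].unique())  # category ids; Faster R-CNN reserves 0 for
                                  # background, so num_classes must be >= max id + 1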

To Reproduce

Steps to reproduce the behavior: follow the Object Detection Finetuning tutorial, substituting the dataset with torchvision.datasets.CocoDetection. Within coco_eval I get a predicted mAP of 0:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
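
For completeness, a minimal sketch of such a run, assuming the tutorial's references/detection helpers (engine.train_one_epoch, engine.evaluate, utils.collate_fn) are importable; root_dir and annotation_file are the placeholders from above, and num_classes is dataset-specific:

import torch
from torch.utils.data import DataLoader
from torchvision.datasets import CocoDetection, wrap_dataset_for_transforms_v2
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from engine import train_one_epoch, evaluate  # tutorial's references/detection helpers
import utils                                  # likewise

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dataset = CocoDetection(root_dir, annotation_file)  # placeholders as above
# "image_id" is included because the tutorial's evaluate() reads it.
dataset = wrap_dataset_for_transforms_v2(dataset, target_keys=("image_id", "boxes", "labels"))
data_loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=utils.collate_fn)

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 91  # hypothetical; background + the dataset's object categories
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
for epoch in range(2):
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=50)
    evaluate(model, data_loader, device=device)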

Versions

Collecting environment information...
PyTorch version: 2.1.2
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 522.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2300
DeviceID=CPU0
Family=198
L2CacheSize=11776
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2300
Name=12th Gen Intel(R) Core(TM) i7-12700H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] pytorch-ignite==0.4.13
[pip3] pytorch-lightning==1.9.5
[pip3] torch==2.1.2
[pip3] torchmetrics==0.10.3
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.16.2
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6b88ed4_46358
[conda] mkl-service 2.4.0 py311h2bbff1b_1
[conda] mkl_fft 1.3.8 py311h2bbff1b_0
[conda] mkl_random 1.2.4 py311h59b6b97_0
[conda] numpy 1.26.2 py311hdab7c0b_0
[conda] numpy-base 1.26.2 py311hd01c5d8_0
[conda] pytorch 2.1.2 py3.11_cuda11.8_cudnn8_0 pytorch
[conda] pytorch-cuda 11.8 h24eeafa_5 pytorch
[conda] pytorch-ignite 0.4.13 pypi_0 pypi
[conda] pytorch-lightning 1.9.5 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchmetrics 0.10.3 pypi_0 pypi
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchvision 0.16.2 pypi_0 pypi

NicolasHug commented 8 months ago

Sorry @anirudh6415 it's difficult to help without having a minimal reproducible example. Would you mind sharing exactly the steps that you've been using (but as minimal as possible)?

anirudh6415 commented 8 months ago

I have a dataset in COCO format that I want to load with CocoDetection from torchvision.datasets and wrap in a data loader. However, when I start training Faster R-CNN, the model does not seem to get fine-tuned, and my evaluation mAP is zero.

Note: with the same training code, a custom dataset class I wrote myself started working.
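
For comparison, a minimal sketch of the conversion such a custom dataset typically has to perform (the class name is hypothetical): raw COCO annotations store boxes as XYWH, while Faster R-CNN expects XYXY tensors.

import torch
from torchvision.datasets import CocoDetection

class CocoForFasterRCNN(CocoDetection):
    """Hypothetical wrapper turning raw COCO annotations into the
    boxes/labels target dict that Faster R-CNN expects."""

    def __getitem__(self, idx):
        img, anns = super().__getitem__(idx)
        boxes, labels = [], []
        for ann in anns:
            x, y, w, h = ann["bbox"]            # COCO stores XYWH...
            boxes.append([x, y, x + w, y + h])  # ...the model wants XYXY
            labels.append(ann["category_id"])
        target = {
            "boxes": torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4),
            "labels": torch.as_tensor(labels, dtype=torch.int64),
            "image_id": self.ids[idx],          # needed by the tutorial's evaluate()
        }
        return img, target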