sovit-123 / fasterrcnn-pytorch-training-pipeline

PyTorch Faster R-CNN Object Detection on Custom Dataset
MIT License

Training fails with Roboflow dataset #84

Closed: aymuos15 closed this issue 1 year ago

aymuos15 commented 1 year ago

Link to data: https://universe.roboflow.com/jacob-solawetz/aerial-maritime/dataset/22#

Training runs, but the COCO evaluation reports -1.000 for every metric:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

To reproduce:

!curl -L "https://universe.roboflow.com/ds/vTV0PieE66?key=FwNU4om5rO" > roboflow.zip; unzip roboflow.zip -d custom_data; rm roboflow.zip

%%writefile data_configs/custom_data.yaml
# Images and labels directories should be relative to train.py
TRAIN_DIR_IMAGES: 'custom_data/train'
TRAIN_DIR_LABELS: 'custom_data/train'
VALID_DIR_IMAGES: 'custom_data/valid'
VALID_DIR_LABELS: 'custom_data/valid'

# Class names.
CLASSES: [
    'background',
    'docks', 'boats', 'lifts', 'jetskis', 'cars'
]

# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 6

# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True

!python train.py --data data_configs/custom_data.yaml --epochs 5 --model fasterrcnn_resnet50_fpn_v2 --name custom_training --batch 8
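A sketch of a pre-flight check for a config like the one above. It assumes Pascal VOC XML annotations sit in the label directories (here the same folders as the images) and that the YAML has already been loaded into Python, for example with PyYAML's `yaml.safe_load`. The helper names `find_unknown_class_names` and `check_config` are hypothetical and not part of the repo. The check collects every `<object><name>` value, reports any that `CLASSES` does not contain, and confirms that `NC` equals the length of `CLASSES` (object classes plus background, as the config comment says).

```python
import glob
import os
import xml.etree.ElementTree as ET


def find_unknown_class_names(label_dir, classes):
    """Count <object><name> values in label_dir/*.xml that are not in classes."""
    unknown = {}
    for xml_path in sorted(glob.glob(os.path.join(label_dir, "*.xml"))):
        root = ET.parse(xml_path).getroot()
        for obj in root.findall("object"):
            name = obj.findtext("name")
            if name not in classes:
                unknown[name] = unknown.get(name, 0) + 1
    return unknown


def check_config(classes, nc, label_dirs):
    """Raise ValueError if the class list, NC, or the annotations disagree."""
    # Assumption: background is listed first, as in the config above.
    if not classes or classes[0] != "background":
        raise ValueError("CLASSES should start with 'background'")
    # NC counts the object classes plus background, i.e. the whole list.
    if nc != len(classes):
        raise ValueError(f"NC={nc}, but CLASSES has {len(classes)} entries")
    for label_dir in label_dirs:
        unknown = find_unknown_class_names(label_dir, classes)
        if unknown:
            raise ValueError(f"{label_dir}: names missing from CLASSES: {unknown}")


if __name__ == "__main__":
    # Hypothetical usage with the values from this issue's YAML.
    CLASSES = ['background', 'docks', 'boats', 'lifts', 'jetskis', 'cars']
    check_config(CLASSES, 6, ["custom_data/train", "custom_data/valid"])
```

Run against this issue's dataset and config, the check would be expected to flag 'dock', the same name that appears in the ValueError later in the thread.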

This seems to be the case with multiple datasets, which I also downloaded with the 'voc' export setting. Any idea why they fail?

sovit-123 commented 1 year ago

Can you please let me know what error you are facing? If it is just the mAP issue, please try passing --mosaic 0 to the training command. Mosaic augmentation can be a bit too much for some datasets, and this flag turns it off.

sovit-123 commented 1 year ago

I have pushed an update that turns mosaic off by default. Please pull the latest code; this will most probably give much better results.

aymuos15 commented 1 year ago

It is just the mAP; there is no error.

I re-cloned the repo (I am using Colab anyway), and it still shows the same issue.

sovit-123 commented 1 year ago

I will check. I have never faced this issue before.

sovit-123 commented 1 year ago

Can you please check the dataset paths in the YAML file once more?

aymuos15 commented 1 year ago

Checking Labels and images... 100% 102/102 [00:00<00:00, 241705.65it/s]
Checking Labels and images... 100% 10/10 [00:00<00:00, 97997.76it/s]
Creating data loaders
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Number of training samples: 102
Number of validation samples: 10

The paths are correct; the data is being read.

If you have the time, could you please try running it with the same data? You can reproduce it using the details I posted in the original question.

sovit-123 commented 1 year ago

I pushed another update. The actual problem is that the class names are wrong: the dataset's class names are not plural (they do not end in 's'). datasets.py was not showing the error because a try-except block swallowed it; I have fixed that. If you run with your current YAML file now, it will raise an error for the wrong class names. Correct the class names in the YAML file, and you should start getting results.

Let me know if you face more issues.

Please pull the update.
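A minimal sketch of the failure mode described above, using a simplified, hypothetical VOC parser. The names `parse_voc`, `parse_voc_swallowing_errors`, and the sample annotation are illustrative only and are not the repo's datasets.py; where exactly its try-except sat is not shown in this thread. Only the `classes.index(member.find('name').text)` lookup mirrors the traceback quoted later.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: the dataset names this object "dock" (singular).
VOC_XML = """<annotation>
  <object>
    <name>dock</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

# Class list from the YAML in this issue (plural names).
CLASSES = ['background', 'docks', 'boats', 'lifts', 'jetskis', 'cars']


def parse_voc(xml_text, classes):
    """Return (boxes, labels); raises ValueError for an unknown <name>."""
    boxes, labels = [], []
    for member in ET.fromstring(xml_text).findall("object"):
        # Same lookup as in the traceback: fails if the name is not in classes.
        labels.append(classes.index(member.find("name").text))
        bndbox = member.find("bndbox")
        boxes.append([float(bndbox.find(k).text)
                      for k in ("xmin", "ymin", "xmax", "ymax")])
    return boxes, labels


def parse_voc_swallowing_errors(xml_text, classes):
    """Hypothetical broad try/except: the ValueError is hidden and the
    image silently ends up with no ground-truth boxes."""
    try:
        return parse_voc(xml_text, classes)
    except Exception:
        return [], []


if __name__ == "__main__":
    print(parse_voc_swallowing_errors(VOC_XML, CLASSES))  # ([], [])
    print(parse_voc(VOC_XML, ['background', 'dock']))     # ([[10.0, 20.0, 50.0, 60.0]], [1])
    # parse_voc(VOC_XML, CLASSES) raises ValueError: 'dock' is not in list
```

When every validation image ends up with no ground-truth boxes, pycocotools' COCOeval has nothing to average for any metric and prints -1.000, which is consistent with the output at the top of this issue.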

aymuos15 commented 1 year ago

File "/content/fastercnn-pytorch-training-pipeline/datasets.py", line 107, in load_image_and_labels
    labels.append(self.classes.index(member.find('name').text))
ValueError: 'dock' is not in list

It shows the error now, thanks! After that, it is working and training properly.

Thank you very much for the fast response always :)