sovit-123 / fasterrcnn-pytorch-training-pipeline

PyTorch Faster R-CNN Object Detection on Custom Dataset
MIT License

Training on a custom dataset - starting from COCO pre-trained weights? #88

Status: Open. utility-aagrawal opened this issue 1 year ago.

utility-aagrawal commented 1 year ago

Hi,

I want to train a fasterrcnn_resnet50_fpn_v2 model on a custom dataset, starting from COCO pre-trained weights. Is that the default behavior, or do I need to supply a weights file through the --weights argument? If so, where can I find that file?

Thank you for your help!

sovit-123 commented 1 year ago

Hello @utility-aagrawal, it loads the COCO-pretrained weights by default. You only need to provide --weights if you want to continue training from one of your own checkpoints.
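The default described here amounts to a simple fallback on the `--weights` argument. The following is a minimal sketch with `argparse`, not the repository's actual `train.py`; the helper `choose_initial_weights`, its return strings, and the checkpoint path in the usage note are hypothetical.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical training CLI")
    # Mirrors the --weights option discussed above; None means "not given"
    parser.add_argument("--weights", default=None,
                        help="checkpoint to continue training from")
    return parser


def choose_initial_weights(weights: Optional[str]) -> str:
    # Hypothetical helper: no checkpoint given -> start from COCO-pretrained weights
    if weights is None:
        return "coco-pretrained"
    # Checkpoint given -> continue from the user's own weights file
    return f"checkpoint:{weights}"
```

For example, `choose_initial_weights(build_parser().parse_args([]).weights)` returns `"coco-pretrained"`, while parsing `["--weights", "outputs/last_model.pth"]` (a hypothetical path) selects that checkpoint instead.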

utility-aagrawal commented 1 year ago

Thanks for the quick response, @sovit-123! I have another question about GPU memory usage. I get a CUDA out-of-memory error even with a small batch size of 8. I have a Tesla T4 GPU with ~16 GB of memory. On the same machine, I can train a YOLOv8 model (43M parameters) with a batch size of 16, but with this repository I can only use a batch size of 4. I am using the same dataset and the same input image size for both models. Do you know where this difference comes from?

Do you have any recommendations to speed up training? My dataset has 15k images, and with the default image size of 640, one epoch takes almost an hour.

Batch size 8: (screenshot not preserved)

Batch size 4: (screenshot not preserved)

I really appreciate your help with this!
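The figures in this report (about 15k images, roughly one hour per epoch, batch size 4) convert directly into iterations per epoch and average throughput, which makes later comparisons easier. A small sketch using only those reported values; the helper names are illustrative, and the iteration count assumes the final partial batch is kept (the PyTorch `DataLoader` default, `drop_last=False`).

```python
import math


def iterations_per_epoch(num_images: int, batch_size: int) -> int:
    # A final partial batch still counts as one iteration
    return math.ceil(num_images / batch_size)


def images_per_second(num_images: int, epoch_minutes: float) -> float:
    # Average end-to-end training throughput over one epoch
    return num_images / (epoch_minutes * 60)


print(iterations_per_epoch(15_000, 4))          # 3750
print(round(images_per_second(15_000, 60), 2))  # 4.17
```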

sovit-123 commented 1 year ago

Hi. Try using --imgsz 640 along with square resizing and AMP (Automatic Mixed Precision). Add these arguments to your command; with AMP you can usually double the batch size:

    python train.py --imgsz 640 --square-training --amp --batch 8
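The advice above is about finding a batch size that fits in GPU memory. One generic way to find the largest one is to try a training step and halve the batch size whenever it runs out of memory. Below is a framework-agnostic sketch: `try_step` is a hypothetical callable that runs one forward/backward pass, and `FakeOOM` stands in for PyTorch's `torch.cuda.OutOfMemoryError` so the example runs without a GPU.

```python
from typing import Callable, Type


class FakeOOM(Exception):
    """Stand-in for torch.cuda.OutOfMemoryError so the sketch runs anywhere."""


def largest_fitting_batch(try_step: Callable[[int], None],
                          start: int,
                          oom_error: Type[BaseException] = FakeOOM) -> int:
    batch = start
    while batch >= 1:
        try:
            try_step(batch)  # one forward/backward pass at this batch size
            return batch     # it fit: use this batch size
        except oom_error:
            # Halve and retry; with a real GPU, also free cached memory
            # (e.g. torch.cuda.empty_cache()) before the next attempt
            batch //= 2
    raise RuntimeError("even batch size 1 does not fit in memory")


def simulated_step(batch: int) -> None:
    # Illustrative only: pretend the GPU fits at most 6 images per step
    if batch > 6:
        raise FakeOOM(f"batch {batch} does not fit")


print(largest_fitting_batch(simulated_step, start=16))  # 4
```

Halving keeps the search to a few attempts but can skip usable sizes between powers of two (the simulated limit of 6 yields 4). Repeating such a search with and without mixed precision is one way to check the "double the batch size" rule of thumb on a particular GPU.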

Another reason for the longer training time may be the default fasterrcnn_resnet50_fpn_v2 model. V2 is a better model than V1 but has a heavier FPN network, and it works very well with small objects. If you are okay with slightly worse results in exchange for faster training, try --model fasterrcnn_resnet50_fpn.

Can you please let me know how long one epoch takes with YOLOv8? That will help me optimize the repository further.

utility-aagrawal commented 1 year ago

With YOLOv8-large, one epoch took around 20 minutes with a batch size of 16. I want to compare my YOLO model with Faster R-CNN. I didn't use square training for my YOLO model, so I don't want to use it for Faster R-CNN either, but with --amp I was able to start training with a batch size of 8. It's still in progress; I'll let you know how it goes. Thanks for your help!

sovit-123 commented 1 year ago

Sure. Thanks.

utility-aagrawal commented 1 year ago

Hi, just wanted to update you on the training time: it still seems pretty slow. One epoch takes ~52 minutes.
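Putting the two reported epoch times side by side (about 20 minutes for YOLOv8-large at batch 16, about 52 minutes for fasterrcnn_resnet50_fpn_v2 with --amp at batch 8, same ~15k-image dataset) gives a rough throughput comparison. The sketch below uses only those reported numbers.

```python
NUM_IMAGES = 15_000  # approximate dataset size reported in the thread

# Reported minutes per epoch for each setup
epoch_minutes = {
    "yolov8-large": 20,
    "fasterrcnn_resnet50_fpn_v2": 52,
}

# Images processed per second, averaged over one epoch
throughput = {name: NUM_IMAGES / (m * 60) for name, m in epoch_minutes.items()}

for name, ips in throughput.items():
    print(f"{name}: {ips:.1f} images/s")  # 12.5 and 4.8 images/s

# Same dataset, so the slowdown equals the ratio of epoch times
slowdown = throughput["yolov8-large"] / throughput["fasterrcnn_resnet50_fpn_v2"]
print(f"slowdown: {slowdown:.1f}x")  # 2.6x
```

Because the batch sizes and precision settings differ, this compares the two setups as they were run, not the architectures in isolation.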

sovit-123 commented 1 year ago

Hmm... that may be because of the fasterrcnn_resnet50_fpn_v2 model. Did you try the fasterrcnn_resnet50_fpn model?

utility-aagrawal commented 1 year ago

I haven't tried the fasterrcnn_resnet50_fpn model yet because I wanted to compare the best Faster R-CNN model with my YOLOv8 model. Unfortunately, training is too slow: it took a week to train the V2 model on ~15k images with --amp, --batch 8 and --imgsz 640 (without --square-training) on a 16 GB Tesla T4 GPU. On the same machine, with the same dataset and image size but a batch size of 16, I was able to train a YOLOv8 model (43M parameters) in less than 48 hours. Let me know if you find a way to reduce the training time; for now, I'll be using the YOLO implementation. As for performance, the Faster R-CNN model produces far more false positives than my YOLO model. Thanks for your help!

emanuelevivoli commented 6 months ago

Could YOLO be faster thanks to its data loader? It probably pre-loads images and annotations. Are you monitoring I/O in the two training setups (YOLOv8 vs. Faster R-CNN)?
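One way to answer this question is to time the two parts of each training iteration separately: waiting for the next batch (data loading and I/O) versus running the training step. A minimal, framework-agnostic sketch; `profile_loop` is a hypothetical helper, and on a GPU the step should end with a synchronization (e.g. `torch.cuda.synchronize()`), because CUDA kernels run asynchronously and their time would otherwise be attributed to the wrong phase.

```python
import time
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def profile_loop(batches: Iterable[T], step: Callable[[T], None]) -> Tuple[float, float]:
    """Return (seconds waiting for data, seconds spent in the training step)."""
    wait = compute = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading / I/O
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)           # forward + backward + optimizer step (synchronize on GPU)
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute
```

If the waiting time dominates, the loader is the bottleneck (more `DataLoader` workers, caching, or faster storage may help); if the step dominates, the cost lies in the model itself, which is the maintainer's explanation for Faster R-CNN.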

sovit-123 commented 6 months ago

Faster R-CNN is certainly slower to train than YOLO. However, that is not because of the data loader; it is because of Faster R-CNN's two-stage nature.
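To make the two-stage cost concrete: in torchvision's Faster R-CNN, the RPN keeps up to `rpn_post_nms_top_n_train` proposals per image (2000 by default), and during training the RoI heads then sample up to `box_batch_size_per_image` regions per image (512 by default) for RoIAlign and the box head. Assuming this repository keeps those torchvision defaults (not verified here), the per-step workload of the second stage is simple arithmetic:

```python
# torchvision FasterRCNN training default (assumed unchanged in this repository)
BOX_BATCH_SIZE_PER_IMAGE = 512  # regions sampled per image for the RoI heads


def max_rois_per_step(batch_size: int) -> int:
    # Upper bound on regions cropped with RoIAlign and passed through the
    # box head in one training step; fewer are used if an image yields
    # fewer candidate regions
    return batch_size * BOX_BATCH_SIZE_PER_IMAGE


print(max_rois_per_step(8))  # 4096
```

A one-stage detector such as YOLO predicts boxes directly from its dense feature maps and has no equivalent per-region step.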