ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

TRANSFER LEARNING EXAMPLE #106

Closed glenn-jocher closed 4 years ago

glenn-jocher commented 5 years ago

This guide explains how to train YOLOv3 on your own data using transfer learning. Transfer learning can be a useful way to quickly retrain YOLOv3 on new data without retraining the entire network. We accomplish this by starting from the official YOLOv3 weights and setting .requires_grad to False on every layer whose gradients we do not want to calculate and optimize.

Before You Start

  1. Update (Python >= 3.7, PyTorch >= 1.3, etc.) and install requirements.txt dependencies.
  2. Clone repo: git clone https://github.com/ultralytics/yolov3
  3. Download COCO: bash yolov3/data/get_coco2017.sh

Transfer Learning

1. Download pretrained weights from our Google Drive folder that you want to use to transfer learn, and place them in yolov3/weights/.

2. Update the *.cfg file (optional). Each YOLO layer has 255 outputs: 85 outputs per anchor [4 box coordinates + 1 object confidence + 80 class confidences], times 3 anchors. If you use fewer classes, reduce filters to filters=[4 + 1 + n] * 3, where n is your class count. This modification should be made to the layer preceding each of the 3 YOLO layers. Also modify classes=80 to classes=n in each YOLO layer, where n is your class count.

screenshot 2019-02-21 at 19 40 01
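The filter arithmetic in step 2 can be checked with a few lines of Python. This is only a sketch: `yolo_filters` is a hypothetical helper name, not a function from the repo.

```python
def yolo_filters(num_classes: int, anchors_per_layer: int = 3) -> int:
    """Output channels of the convolution that feeds each YOLO layer."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    # 4 box coordinates + 1 object confidence + one confidence per class, for every anchor
    return (4 + 1 + num_classes) * anchors_per_layer


for n in (80, 20, 1):
    print(f"classes={n} -> filters={yolo_filters(n)}")
```

With the default 80 COCO classes this gives the 255 outputs mentioned above; a single-class model needs filters=18.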

3. Train.

python3 train.py --data coco1cls.data --cfg yolov3-spp-1cls.cfg --weights weights/yolov3-spp.pt --transfer

Run the above code to transfer learn on COCO, or specify your own data as --data data/custom.data (See https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data).

If you created a custom *.cfg file, specify it as --cfg custom.cfg.
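If you prefer to script the cfg changes from step 2 rather than edit the file by hand, something along these lines might work. It is a sketch under assumptions: `set_num_classes` is a hypothetical helper (not part of the repo), and it assumes Darknet-style `[section]` headers where the `[convolutional]` section directly before each `[yolo]` section is the one whose filters must change.

```python
import re


def set_num_classes(cfg_text: str, num_classes: int, anchors_per_layer: int = 3) -> str:
    """Return cfg_text with classes/filters updated for every [yolo] layer."""
    filters = (4 + 1 + num_classes) * anchors_per_layer
    # Split at every line that starts a "[section]", keeping each header with its body.
    sections = re.split(r"(?m)^(?=\[)", cfg_text)
    for i, section in enumerate(sections):
        if not section.startswith("[yolo]"):
            continue
        sections[i] = re.sub(r"(?m)^classes\s*=\s*\d+", f"classes={num_classes}", section)
        # The section right before a [yolo] section is the convolution feeding it.
        if i > 0 and sections[i - 1].startswith("[convolutional]"):
            sections[i - 1] = re.sub(
                r"(?m)^filters\s*=\s*\d+", f"filters={filters}", sections[i - 1]
            )
    return "".join(sections)


# Tiny made-up cfg fragment, used only to exercise the helper.
example = """[convolutional]
filters=256
activation=leaky

[convolutional]
filters=255
activation=linear

[yolo]
mask = 6,7,8
classes=80
num=9
"""

patched = set_num_classes(example, num_classes=1)
print(patched)
```

Only the convolution immediately before the `[yolo]` section is touched; earlier layers keep their filter counts.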

You can observe in the Model Summary (using model_info(model, report='full') in train.py) that only the 3 YOLO layers have their gradients activated now (all other layers are frozen for duration of training):

Screenshot 2019-09-12 at 12 25 22
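The freezing described in this guide can be illustrated on a toy model. This is a minimal sketch, not the repo's implementation: the two-convolution `nn.Sequential` stands in for the real network, and `freeze_all_but` is a hypothetical helper.

```python
import torch
import torch.nn as nn

# Toy stand-in for the detector (not the real YOLOv3 model).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # "backbone" layer, index 0
    nn.ReLU(),
    nn.Conv2d(8, 18, kernel_size=1),            # "output" layer, index 2
)


def freeze_all_but(model: nn.Module, trainable_prefixes) -> None:
    """Leave gradients on only for parameters whose name starts with a given prefix."""
    prefixes = tuple(trainable_prefixes)
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)


freeze_all_but(model, ["2."])

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

for name, param in model.named_parameters():
    print(f"{name:10s} requires_grad={param.requires_grad}")
```

Frozen parameters receive no gradient updates, which is what makes training faster but also limits how far the model can adapt to new data.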

Reproduce Our Environment

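To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider a:

  * GCP Deep Learning VM with $300 free credit offer: see our GCP Quickstart Guide
  * Google Colab Notebook with 12 hours of free GPU time
  * Docker Image from https://hub.docker.com/r/ultralytics/yolov3: see our Docker Quickstart Guide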

glenn-jocher commented 4 years ago

@SHikumo ah very cool, I like your video! I think if you use the pretrained ultralytics68.pt model you will get even better results though, since it was trained on COCO, which has 100k images with a few hundred thousand instances of people.

This will output all of the objects however... but let me see, if we add an argument to the NMS function to only output certain classes then you could use ultralytics68.pt for people only. I'll try to add something today for this.

glenn-jocher commented 4 years ago

@SHikumo ok I added a new optional --classes argparser argument for detect.py in https://github.com/ultralytics/yolov3/commit/d92b75aec819a45680fe40e133d8e3c29d0b6a40 so you can use the pretrained weights for inference on select classes only (i.e. people 0 and bus 5 here):

$ python3 detect.py --weights ultralytics68.pt --cfg yolov3-spp.cfg --classes 0 5
Namespace(cfg='cfg/yolov3-spp.cfg', classes=[0, 5], conf_thres=0.3, device='', fourcc='mp4v', half=False, img_size=416, iou_thres=0.5, names='data/coco.names', output='output', source='data/samples', view_img=False, weights='weights/yolov3-spp.weights')
Using CPU

image 1/2 data/samples/bus.jpg: 416x320 4 persons, 1 buss, Done. (0.777s)
image 2/2 data/samples/zidane.jpg: 256x416 2 persons, Done. (0.623s)
Results saved to /Users/glennjocher/PycharmProjects/yolov3/output
Done. (1.570s)
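Conceptually, restricting output to selected classes is a filter on the class id of each detection. The sketch below shows the idea only; the `(x1, y1, x2, y2, confidence, class_id)` tuple layout and the `filter_by_class` name are assumptions for illustration, not the code from the commit above.

```python
from typing import Iterable, List, Optional, Tuple

# One detection: (x1, y1, x2, y2, confidence, class_id) -- assumed layout
Detection = Tuple[float, float, float, float, float, int]


def filter_by_class(
    detections: Iterable[Detection], classes: Optional[Iterable[int]] = None
) -> List[Detection]:
    """Keep only detections whose class id is in `classes`; keep everything if None."""
    if classes is None:
        return list(detections)
    wanted = set(classes)
    return [d for d in detections if d[5] in wanted]


detections = [
    (10, 20, 110, 220, 0.91, 0),   # person
    (50, 60, 300, 200, 0.88, 5),   # bus
    (15, 30, 40, 60, 0.75, 27),    # some other class
]

kept = filter_by_class(detections, classes=[0, 5])
print(kept)
```

Leaving `classes` as None mirrors running detect.py without the --classes argument: nothing is filtered out.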

skprot commented 4 years ago

I'm trying to reproduce the transfer learning method from a tutorial, but there is a problem with the argument "--transfer". As I see there is no "--transfer" argument in the code. Should I run it without this argument?

Thanks in advance!

glenn-jocher commented 4 years ago

@skprot yes just run it without --transfer. To train/transfer from an existing model or backbone simply use --weights model, for example:

python3 train.py --weights darknet53.conv.74

or for better results:

python3 train.py --weights ultralytics68.pt

We've eliminated the transfer flag and do not recommend freezing any layers, as this nearly always results in worse performance. We will update the tutorial soon to reflect this.

kairavpatel commented 4 years ago

This guide explains how to train YOLOv3 on your own data using transfer learning. Transfer learning can be a useful way to quickly retrain YOLOv3 on new data without retraining the entire network. We accomplish this by starting from the official YOLOv3 weights and setting .requires_grad to False on every layer whose gradients we do not want to calculate and optimize.

Before You Start

1. Update (Python >= 3.7, PyTorch >= 1.3, etc.) and install [requirements.txt](https://github.com/ultralytics/yolov3/blob/master/requirements.txt) dependencies.

2. Clone repo: `git clone https://github.com/ultralytics/yolov3`

3. Download [COCO](http://cocodataset.org/#home): `bash yolov3/data/get_coco2017.sh`

Transfer Learning

1. Download pretrained weights from our Google Drive folder that you want to use to transfer learn, and place them in yolov3/weights/.

2. Update `*.cfg` file (optional). Each YOLO layer has 255 outputs: 85 outputs per anchor [4 box coordinates + 1 object confidence + 80 class confidences], times 3 anchors. If you use fewer classes, reduce filters to `filters=[4 + 1 + n] * 3`, where `n` is your class count. This modification should be made to the layer preceding each of the 3 YOLO layers. Also modify `classes=80` to `classes=n` in each YOLO layer, where `n` is your class count.

screenshot 2019-02-21 at 19 40 01

3. Train.

python3 train.py --data coco1cls.data --cfg yolov3-spp-1cls.cfg --weights weights/yolov3-spp.pt --transfer

Run the above code to transfer learn on COCO, or specify your own data as --data data/custom.data (See https://docs.ultralytics.com/yolov5/tutorials/train_custom_data).

If you created a custom *.cfg file, specify it as --cfg custom.cfg.

You can observe in the Model Summary (using model_info(model, report='full') in train.py) that only the 3 YOLO layers have their gradients activated now (all other layers are frozen for duration of training):

Screenshot 2019-09-12 at 12 25 22

Reproduce Our Environment

To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider a:

* **GCP** Deep Learning VM with $300 free credit offer: See our [GCP Quickstart Guide](https://docs.ultralytics.com/yolov5/environments/google_cloud_quickstart_tutorial/)

* **Google Colab Notebook** with 12 hours of free GPU time: [Google Colab Notebook](https://colab.research.google.com/drive/1G8T-VFxQkjDe4idzN8F-hbIBqkkkQnxw)

* **Docker Image** from https://hub.docker.com/r/ultralytics/yolov3. See [Docker Quickstart Guide](https://docs.ultralytics.com/yolov5/environments/docker_image_quickstart_tutorial/)

@glenn-jocher Hi, I want to check whether the layers are frozen using model_info(model, report='full'), but I cannot see this in the output. Could you please show me how to use this function? Furthermore, I am new to the ML field, so I am confused about the difference between weights files for the same model, for example yolov3_tiny.weights and yolov3_tiny.conv.15.

glenn-jocher commented 4 years ago

@kairavpatel the transfer learning tutorial is not recommended. We do not recommend freezing any layers, as this will nearly always result in poorer performance. To train from pretrained weights (recommended) use python3 train.py --weights ...

glenn-jocher commented 4 years ago

@joel5638 sure. You should use utils.utils.create_backbone() on last.pt first to reset it to epoch 0.

glenn-jocher commented 4 years ago

@joel5638 sure, you can do whatever you want! You should experiment with different ways to see what works best for your situation.

glenn-jocher commented 4 years ago

@joel5638 no, you call it once before training to convert your last.pt into a backbone.pt file ready to be used as pretrained weights for future trainings: https://github.com/ultralytics/yolov3/blob/5d42cc1b9a90e26b0b9bffba61fae93f5d1691b9/utils/utils.py#L607-L619

joel5638 commented 4 years ago

@glenn-jocher perfect will do that. Thank u so much

glenn-jocher commented 4 years ago

@joel5638 you should update your repo, image plotting has been updated to show predictions and ground truth jpgs. See https://github.com/ultralytics/yolov3/pull/1114

If objects are not labelled correctly in your ground truth jpg then you have a labelling problem.

glenn-jocher commented 4 years ago

@joel5638 can you paste your test_gt.jpg and test_pred.jpg here?

glenn-jocher commented 4 years ago

@joel5638 ah, it looks like it's working well! Remember there is NMS, so if the person and the face are largely occupying the same region one or the other may be suppressed. You could try it on zidane.jpg to compare, as in that photo the faces and the persons do not occupy similar areas, the way Bush does above.
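To see why strongly overlapping boxes can suppress each other, here is a small pure-Python sketch of greedy NMS. It assumes class-agnostic suppression and made-up box coordinates; the repo's actual NMS may treat classes separately, so take it as an illustration of the overlap effect only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.5):
    """Greedy, class-agnostic NMS; returns indices of the boxes that survive."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


scores = [0.9, 0.8]                                   # person, face
close_up = [(0, 0, 100, 100), (5, 5, 95, 90)]         # face fills most of the person box
full_body = [(0, 0, 100, 300), (30, 10, 70, 60)]      # face is a small part of the person box

print(nms(close_up, scores))
print(nms(full_body, scores))
```

In the close-up case the lower-scoring box is dropped because its IoU with the other box exceeds the threshold; in the full-body case both boxes survive.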

joel5638 commented 4 years ago

@glenn-jocher perfect. Will try it on zidane.jpg once the training is complete.

glenn-jocher commented 4 years ago

@joel5638 --transfer flag is deprecated, you may have been using an older version of the repo before. Basically you no longer need it. Simply train normally, specifying the --weights you want to start from (but making a backbone from them first!).

Your command will technically work, but it is not recommended, as hyperparameters with schedules, like the learning rate, will assume 273/373 epochs are complete, which is not the case. So just create a backbone first, and then use a normal training command:

python train.py --data ... --weights weights/backbone.pt --cfg ...
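The scheduling problem can be illustrated with a simple step schedule. The schedule below (a drop by a factor of 10 at 80% and 90% of training, base learning rate 0.01) is an assumption chosen for illustration and may not match the repo's actual hyperparameters.

```python
def lr_at_epoch(epoch: int, total_epochs: int, base_lr: float = 0.01, gamma: float = 0.1) -> float:
    """Step schedule: multiply the learning rate by gamma at 80% and 90% of training."""
    milestones = [round(total_epochs * f) for f in (0.8, 0.9)]
    drops = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** drops


total = 273
print("fresh start, epoch 0:      ", lr_at_epoch(0, total))
print("checkpoint stuck at epoch 272:", lr_at_epoch(272, total))
```

A checkpoint that still records a late epoch would start the new run at the fully decayed learning rate, which is why resetting the training state first matters.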

glenn-jocher commented 4 years ago

@joel5638 just the same as before. For example to create weights/backbone.pt from weights/last.pt https://github.com/ultralytics/yolov3/issues/106#issuecomment-623173572

from utils.utils import *; create_backbone(f='weights/last.pt')
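As a rough picture of what converting a checkpoint into a backbone involves, the sketch below clears the training state from a plain dictionary. The key names (`epoch`, `optimizer`, `training_results`, `model`) and the convention that -1 means "no epoch completed yet" are assumptions for illustration; see the linked utils.py for what create_backbone() really does.

```python
import copy


def reset_checkpoint(ckpt: dict) -> dict:
    """Return a copy of a training checkpoint with its training state cleared."""
    backbone = copy.copy(ckpt)            # shallow copy: the weights themselves are shared
    backbone["epoch"] = -1                # assumed convention: training restarts at epoch 0
    backbone["optimizer"] = None          # drop optimizer state (momentum, current lr, ...)
    backbone["training_results"] = None   # drop the old results log
    return backbone


# Made-up checkpoint contents, for demonstration only.
last = {
    "epoch": 272,
    "optimizer": {"lr": 0.0001},
    "training_results": "old results",
    "model": {"layer.weight": [0.1, 0.2]},
}

backbone = reset_checkpoint(last)
print(backbone)
```

The weights are untouched; only the bookkeeping that would make a new run think training is almost finished is removed.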

glenn-jocher commented 4 years ago

Works fine for me. Remember this is a Python command, so you run it from a Python console. If you are trying to run it from a bash prompt, you need to wrap the command in quotes appropriately.

Screen Shot 2020-05-05 at 1 13 53 PM

glenn-jocher commented 4 years ago

@joel5638 from the ubuntu terminal you run the same command, but as python -c "", so this should work:

python -c "from utils.utils import *; create_backbone(f='weights/last.pt')"

glenn-jocher commented 4 years ago

@joel5638 I'm not sure. The command works fine for me. You could try omitting the argument, as it's the default argument anyways. Maybe the single quotes are causing problems.

Screen Shot 2020-05-05 at 1 46 12 PM

EDIT: updated image

joel5638 commented 4 years ago

@glenn-jocher got it. It was the quotes. Thank you.

joel5638 commented 4 years ago

@emmbertelen

try this in the command. this works

python3 -c "from utils.utils import *; create_backbone(f='weights/last.pt')"
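If shell quoting keeps getting in the way, one alternative is to launch the interpreter from Python with the snippet passed as a single list element, so no shell parses the quotes at all. This is a generic sketch using the standard library; the snippet being run is a harmless stand-in, not the create_backbone call, and `run_python_snippet` is a hypothetical helper.

```python
import subprocess
import sys


def run_python_snippet(code: str) -> str:
    """Run `code` with the current interpreter via -c and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],   # list form: no shell involved, so no shell quoting
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in snippet; it contains single quotes just like f='weights/last.pt'.
output = run_python_snippet("print('weights/last.pt')")
print(output)
```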

glenn-jocher commented 4 years ago

@joel5638 that's odd. I scanned my screen with iDetection and all people are picked up fine. Maybe your dataset is too small, or if you are using tiny you should switch to the default yolov3-spp.cfg with the default pretrained weights. Are you training with all default settings? You should also post your results.png.

IMG_A52038F68DA2-1

glenn-jocher commented 4 years ago

@joel5638 looks fine. If you want to add a class, like face, be aware you need to train all the existing classes plus your new class. If you're already doing this, then you may just need a larger dataset or longer training.
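In terms of class lists, adding a class means appending to the existing names rather than replacing them. A minimal sketch, with a shortened stand-in for the 80 COCO names and a hypothetical `add_classes` helper:

```python
def add_classes(existing_names, new_names):
    """Append new class names; return the full list and the index of each new class."""
    names = list(existing_names)
    new_ids = {}
    for name in new_names:
        if name in names:
            raise ValueError(f"class {name!r} already exists")
        new_ids[name] = len(names)
        names.append(name)
    return names, new_ids


coco_like = ["person", "bicycle", "car"]   # shortened stand-in for the 80 COCO names
names, new_ids = add_classes(coco_like, ["face"])
print(names, new_ids)
```

With the full 80-name list, a new face class would get index 80 and the model would be configured for 81 classes, with labels for all 81 present in the training data.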

renosatyaadrian commented 4 years ago

Hi @glenn-jocher, I want to ask you about transfer learning. Is it possible to train on a new dataset to improve a model's predictions? The model I am using here is YOLOv3. For example, I would like to improve predictions for the motorbike class by adding a dataset that consists of a few motorbike images.

I tried training with 44 new images (34 training, 10 validation), but the total number of detections decreased. Is there anything wrong with my training steps or my dataset? I'm using this command: !python train.py --data data/coco16.data --cfg cfg/yolov3.cfg --weight weights/yolov3.pt --epoch 300 --batch-size 8

This is the result using yolov3.weights, with 47 detections in total: Pict1

And this is the result using yolov3-transf.weights (after training). The total decreased slightly to 37: Pict2

glenn-jocher commented 4 years ago

@renosatyaadrian I very seriously doubt that you can improve upon a model trained on perhaps thousands of images of motorbikes by training it on 34 more images and expecting it to generalize better.

If you come back with a dataset of 3400 images then perhaps.

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

AymenKermiche commented 4 years ago

@glenn-jocher I want to apply transfer learning and fine-tuning (add 2 or 3 layers to the YOLOv3 architecture and train it on my custom dataset) in darknet. Any help would be appreciated, thanks a lot.

RadoslawDebosz commented 4 years ago

Hi, I have a problem with test results. I've trained yolov3-spp with default settings from weights pretrained on the COCO dataset (transfer learning). My dataset: 1000 images of city traffic from a camera view, with about 12k objects (cars, persons, trucks, buses), train/valid split 0.8/0.2,

and I froze only the last layers.

The results of training after 100 epochs: [image]

Then I ran test.py on the validation set and got these test results: [image]

And on the training set: [image]

I have 2 questions:

  1. Do you know why the test results differ from (are worse than) the training results on the charts (Precision, Recall, F1, mAP)?
  2. I don't know why, for example, the recall on the training chart starts at ~0.7, drops to near 0 after the first epochs, and then rises again during training to ~0.7. It looks like the model ignores the pretrained data and learns from scratch.

dbkalaria commented 3 years ago

@glenn-jocher I'm a bit confused about the difference between transfer learning and training from pretrained weights. Aren't they the same? In transfer learning we take pretrained model weights, freeze some layers' weights, and fine-tune the model. So when training from a pretrained model, are you just using the weights without freezing any layers, and updating all the weights as training progresses?

glenn-jocher commented 3 years ago

@dhruv312 it's just wording.

Basically freezing will always lead to worse results on large datasets.

Utsabab commented 7 months ago

@kairavpatel I get how you are confused by this because I was in the same position. Let me explain:

When we say using pre-trained weights, the assumption is that our new custom data has images with labels that are also available in COCO. There are no new class labels in our new dataset, therefore we can use pre-trained weights from COCO for our desired class.

Transfer learning is done when we have a new class in our custom dataset. The COCO pre-trained model will not be able to detect it. Therefore, we have to train the model on COCO plus the custom dataset with updated class labeling (0-79 COCO labels, 80-n new labels from custom data). This is transfer learning, where pre-trained weights are used to make predictions on a custom dataset with new class labels. Layer freezing is sometimes used to shorten training time; however, it reduces mAP almost every time.

Hope this explanation helps!
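The relabeling mentioned above (new classes numbered from 80 upward) could be scripted roughly as follows. It is a sketch that assumes the usual one-object-per-line `class x_center y_center width height` label format, and `shift_class_ids` is a hypothetical helper.

```python
def shift_class_ids(label_lines, offset=80):
    """Add `offset` to the class id that starts each 'class x y w h' label line."""
    shifted = []
    for line in label_lines:
        if not line.strip():
            continue                       # skip blank lines
        class_id, *box = line.split()
        shifted.append(" ".join([str(int(class_id) + offset), *box]))
    return shifted


# Made-up labels from a custom dataset whose own classes are numbered 0 and 1.
custom = ["0 0.50 0.50 0.20 0.30", "1 0.25 0.40 0.10 0.10"]
merged = shift_class_ids(custom)
print(merged)
```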

glenn-jocher commented 7 months ago

@Utsabab Great question! In the context of using pretrained weights, the idea is to leverage what the model has already learned without specifically freezing any layers. This means we update the weights across all layers based on the new data, which usually gives a better performance because the model can adapt more flexibly to the new task.

Transfer learning, as often discussed, involves modifying or extending the existing model architecture to better fit new data, which can include freezing certain layers to not update during training. Essentially, using pretrained weights and not freezing layers allows the whole model to adjust and learn from the new data, while freezing layers in transfer learning is more about fine-tuning or adapting the model to new, possibly related tasks.

So, when training with pretrained weights the command is simple:

python3 train.py --weights yolov3.pt --data yourdata.yaml

And there's no need to explicitly freeze layers unless you have a very specific case where you believe it's necessary. 🤓 Hope that clarifies things!