TRANSFER LEARNING EXAMPLE

glenn-jocher commented 5 years ago

This guide explains how to train on your own data with YOLOv3 using transfer learning. Transfer learning can be a useful way to quickly retrain YOLOv3 on new data without needing to retrain the entire network. We accomplish this by starting from the official YOLOv3 weights and setting the .requires_grad field to False for every layer whose gradients we do not want to calculate and optimize.
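Here is a minimal sketch of the freezing idea in plain PyTorch. It uses a toy two-layer model and a hypothetical `freeze_all_but` helper, not this repo's train.py code. Parameters with `requires_grad = False` receive no gradient and are left out of the optimizer. The trainable/total counts at the end do the job of a model summary.

```python
import torch
import torch.nn as nn


def freeze_all_but(model: nn.Module, trainable_prefixes):
    """Hypothetical helper: enable gradients only for parameters whose name
    starts with one of `trainable_prefixes`; freeze everything else."""
    for name, p in model.named_parameters():
        p.requires_grad = any(name.startswith(prefix) for prefix in trainable_prefixes)


# Toy stand-in for a detector: a "backbone" conv followed by a 1x1 output conv
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),  # parameters "0.*" -> frozen backbone
    nn.LeakyReLU(0.1),
    nn.Conv2d(16, 18, 1),            # parameters "2.*" -> trainable output, (4 + 1 + 1) * 3 = 18 filters
)
freeze_all_but(model, ["2."])

# Only hand the trainable parameters to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)

loss = model(torch.randn(1, 3, 32, 32)).sum()
loss.backward()   # frozen parameters get no .grad at all
optimizer.step()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"{trainable} of {total} parameters trainable")  # 306 of 754 parameters trainable
```

In a real YOLOv3 model, the prefixes would name the output convolutions in front of each YOLO layer. Which names those are depends on the model definition and is not shown here.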

Before You Start

Update (Python >= 3.7, PyTorch >= 1.3, etc.) and install requirements.txt dependencies.
Clone repo: git clone https://github.com/ultralytics/yolov3
Download COCO: bash yolov3/data/get_coco2017.sh

Transfer Learning

1. Download the pretrained weights you want to start transfer learning from (available in our Google Drive folder), and place them in yolov3/weights/.

2. Update the *.cfg file (optional). Each YOLO layer has 255 outputs: 85 outputs per anchor [4 box coordinates + 1 object confidence + 80 class confidences], times 3 anchors. If you use fewer classes, reduce filters to filters=[4 + 1 + n] * 3, where n is your class count. This modification should be made to the layer preceding each of the 3 YOLO layers. Also modify classes=80 to classes=n in each YOLO layer, where n is your class count.

3. Train.

python3 train.py --data coco1cls.data --cfg yolov3-spp-1cls.cfg --weights weights/yolov3-spp.pt --transfer

Run the command above to transfer-learn on COCO, or specify your own data as --data data/custom.data (see https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data).

If you created a custom *.cfg file, specify it as --cfg custom.cfg.

You can observe in the Model Summary (using model_info(model, report='full') in train.py) that only the 3 YOLO layers now have their gradients activated (all other layers are frozen for the duration of training).
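The step 2 edit can be scripted. This is a sketch under several assumptions. `yolo_filters` and `patch_cfg` are hypothetical helpers. Every [yolo] section is assumed to use 3 anchors. The [convolutional] section immediately before each [yolo] section is taken to be the layer whose filters must change, as in the stock cfg files.

```python
def yolo_filters(num_classes: int, num_anchors: int = 3) -> int:
    """Outputs per YOLO layer: (4 box coords + 1 objectness + num_classes) per anchor."""
    return (4 + 1 + num_classes) * num_anchors


def patch_cfg(cfg_text: str, num_classes: int) -> str:
    """Hypothetical helper: set classes=n in every [yolo] section and
    filters=(5 + n) * 3 in the [convolutional] section right before it."""
    lines = cfg_text.splitlines()
    section = None
    last_conv_filters = None  # line index of the most recent [convolutional] filters= entry
    for i, raw in enumerate(lines):
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line
            if section == "[yolo]" and last_conv_filters is not None:
                # The conv layer preceding this YOLO layer produces its outputs
                lines[last_conv_filters] = f"filters={yolo_filters(num_classes)}"
            continue
        key = line.split("=", 1)[0].strip()
        if section == "[convolutional]" and key == "filters":
            last_conv_filters = i
        elif section == "[yolo]" and key == "classes":
            lines[i] = f"classes={num_classes}"
    return "\n".join(lines) + "\n"


print(yolo_filters(80), yolo_filters(1), yolo_filters(7))  # 255 18 36
```

For example, `open('cfg/yolov3-spp-7cls.cfg', 'w').write(patch_cfg(open('cfg/yolov3-spp.cfg').read(), 7))` would write a 7-class cfg with filters=36 in front of each YOLO layer. Both paths are illustrative.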

Reproduce Our Environment

To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider a:

GCP Deep Learning VM with $300 free credit offer: See our GCP Quickstart Guide
Google Colab Notebook with 12 hours of free GPU time: Google Colab Notebook
Docker Image from https://hub.docker.com/r/ultralytics/yolov3. See Docker Quickstart Guide

jw-pyo commented 5 years ago

Hi @glenn-jocher, I have a question about this. I want to change the configuration of the YOLO layers (remove some layers, change the number of filters, etc.) and apply transfer learning. In this case, is it possible to use transfer learning with the official weights? If so, could you point me to a method, or just a keyword to search for?

glenn-jocher commented 5 years ago

@jw-pyo you can do anything you want, but you have to do it yourself; we can't "give you a way". We recommend visiting our tutorials to get started, and the PyTorch tutorials for more general customization questions.

https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
https://github.com/ultralytics/yolov3/wiki/Example:-Transfer-Learning
https://pytorch.org/tutorials/

hac135 commented 5 years ago

I have a problem. I want to train on some new classes and pictures using transfer learning, but my number of classes is 7, so if I use darknet53.conv.74 as the pretrained model, it doesn't work! What should I do?

jw-pyo commented 5 years ago

@hac135 If you want to use a pretrained model for transfer learning but your own model has a different shape, the only approach I know is to copy the weights whose shapes match the pretrained model and manually initialize the layers whose shapes differ.
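Here is a minimal sketch of that approach in PyTorch. It assumes both models share parameter names. `load_matching` is a hypothetical helper that copies every pretrained tensor whose name and shape match, and leaves the rest at their default initialization. (The KeyError traceback later in this thread shows train.py applying a similar filter based on `numel()`.)

```python
import torch.nn as nn


def load_matching(model: nn.Module, pretrained_state: dict) -> list:
    """Copy every pretrained tensor whose name and shape match the new model.
    Layers that differ (e.g. a resized output conv) keep their fresh initialization.
    Returns the names of the entries that were NOT copied."""
    own = model.state_dict()
    matched = {k: v for k, v in pretrained_state.items() if k in own and own[k].shape == v.shape}
    model.load_state_dict(matched, strict=False)  # strict=False tolerates the missing keys
    return [k for k in own if k not in matched]


# Hypothetical example: 80-class head (255 filters) -> 7-class head (36 filters)
old = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 255, 1))
new = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 36, 1))
skipped = load_matching(new, old.state_dict())
print(skipped)  # ['1.weight', '1.bias']
```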

glenn-jocher commented 5 years ago

@hac135 most people don't realize this, and it's not the recommended method to go about things, but you can technically use the existing YOLOv3 architecture (and hence the pretrained yolov3.pt) to train any model with n<=80 classes with no changes. The unused conf outputs will learn to simply default to zero, and the rest of the unused outputs (the box and class conf associated with those unused classes) will no longer matter.

For example, our single class tutorial operates just as well with no modifications to the cfg file: https://github.com/ultralytics/yolov3/wiki/Example:-Train-Single-Class

It's not clean and it's not optimal, but it works.

hac135 commented 5 years ago


Thank you! It worked!

hac135 commented 5 years ago


That's a good suggestion, thanks.

glenn-jocher commented 5 years ago

@shahidammer try training from scratch, and observe your training results in results.txt.

glenn-jocher commented 5 years ago

@shahidammer please note that most technical problems are due to:

Your changes to the default repository. If your issue is not reproducible in a fresh git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:

sudo rm -rf yolov3  # remove existing repo
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # git clone latest
python3 detect.py  # verify detection
python3 train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE

Your custom data. If your issue is not reproducible with COCO data we can not debug it. Visit our Custom Training Tutorial for exact details on how to format your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
Your environment. If your issue is not reproducible in a GCP Quickstart Guide VM we can not debug it. Ensure you meet the requirements specified in the README: Unix, MacOS, or Windows with Python >= 3.7, PyTorch >= 1.0, etc.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

parul19 commented 5 years ago


I want to retain the existing classes and add a new class, i.e. a total of 80 + 1 = 81 classes on the COCO dataset. Please tell me how to do it using transfer learning.

glenn-jocher commented 5 years ago

@parul19 you create a new 81 class cfg. Follow the directions in the example above.

sooonism commented 5 years ago

Do we still need the COCO dataset if we only do transfer learning?

glenn-jocher commented 5 years ago

@sooonism you need whatever dataset you want to train on.

Santhosh1509 commented 5 years ago

@glenn-jocher I am interested in extracting the vehicles on the road, so my classes of interest are motorbike, bicycle, bus, car and truck.

I have a vehicle that is not a truck but is being detected as a truck. I have collected new data for this vehicle in COCO format. I want to add it as a new class to the existing pretrained network.

I am planning to:

load the final-layer weights of the truck class into this new class
alter the config file accordingly and start training

My question is: how do I do it?
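One way to sketch the plan above in PyTorch rests on three assumptions. First, the output conv is a 1x1 `nn.Conv2d`. Second, its channels are grouped per anchor as [x, y, w, h, obj, cls_0 ... cls_79]. Third, index 7 is truck in the 0-based 80-class COCO list; verify this against your *.names file. `expand_head` is a hypothetical helper. It appends one class channel per anchor and initializes it from the truck channel. The cfg would also need classes=81 and filters=(4 + 1 + 81) * 3 = 258 to match.

```python
import torch
import torch.nn as nn


def expand_head(old: nn.Conv2d, old_nc: int, src_cls: int, num_anchors: int = 3) -> nn.Conv2d:
    """Hypothetical helper: build a 1x1 output conv for old_nc + 1 classes.
    Assumes channels are laid out anchor by anchor as [x, y, w, h, obj, cls_0 ...];
    the new last class in every anchor block starts as a copy of class `src_cls`."""
    old_no, new_no = old_nc + 5, old_nc + 6
    new = nn.Conv2d(old.in_channels, new_no * num_anchors, kernel_size=1, bias=old.bias is not None)
    with torch.no_grad():
        for a in range(num_anchors):
            o, n = a * old_no, a * new_no  # start of this anchor's block in the old / new conv
            new.weight[n:n + old_no] = old.weight[o:o + old_no]    # copy box, obj and old classes
            new.weight[n + old_no] = old.weight[o + 5 + src_cls]   # new class <- source class
            if old.bias is not None:
                new.bias[n:n + old_no] = old.bias[o:o + old_no]
                new.bias[n + old_no] = old.bias[o + 5 + src_cls]
    return new


old = nn.Conv2d(256, 255, 1)                   # stock 80-class output conv: (4 + 1 + 80) * 3 = 255
new = expand_head(old, old_nc=80, src_cls=7)   # 7 = truck (assumed 0-based COCO-80 index)
print(new.weight.shape)  # torch.Size([258, 256, 1, 1])
```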

glenn-jocher commented 5 years ago

@Santhosh1509 well I would start by reviewing the examples in the wiki, such as the custom training tutorial: https://github.com/ultralytics/yolov3/wiki

Santhosh1509 commented 5 years ago

@glenn-jocher Need your opinion on this. I just saw a post called transfer learning tutorial for SSD using Keras.

It's mentioned there:

Option 1: Just ignore the fact that we need only 8 classes

This would work, and it wouldn't even be a terrible option. Since only 8 out of the 80 classes would get trained, the model might get gradually worse at predicting the other 72 classes in the second paragraph.

So I feel that, even if I could somehow train a particular new class as I mentioned above, the predictions for the other classes might be affected.

Is my approach right? Is there an alternative way to preserve the predictions of the other classes while introducing this new class in the same neural network? I feel it needs to be trained from scratch then. What do you think?

glenn-jocher commented 5 years ago

@Santhosh1509 training normally will produce the best results. Transfer learning produces mediocre results quickly.

Santhosh1509 commented 5 years ago

@glenn-jocher How do I get the training loss, training accuracy, validation loss and validation accuracy?

All I get during training is this: (screenshot)

Please advise: how do I tune my hyperparameters with the data being displayed here?

I could increase the batch size, since I have more memory on the GPU: (screenshot)

I do not understand the comments on these lines:

parser.add_argument('--epochs', type=int, default=273) # 500200 batches at bs 16, 117263 images = 273 epochs

parser.add_argument('--batch-size', type=int, default=32)  # effective bs = batch_size * accumulate = 16 * 4 = 64

parser.add_argument('--accumulate', type=int, default=2, help='batches to accumulate before optimizing')

PS: latest training image

The obj and cls values are decreasing; is that good for this training?

(screenshot)
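The two argparse comments quoted above are plain arithmetic. This reading treats "bs 16" as the per-step batch before accumulation, which is an assumption. Under it, 273 epochs is what 500200 optimizer steps at an effective batch of 16 * 4 = 64 images work out to over 117263 images. The current defaults, 32 * 2, give the same effective batch of 64.

```python
darknet_batches = 500200  # optimizer steps in the schedule (from the comment)
coco_images = 117263      # training images (from the comment)
batch_size, accumulate = 16, 4

effective_bs = batch_size * accumulate  # images per optimizer step
epochs = darknet_batches * effective_bs / coco_images
print(effective_bs, round(epochs))  # 64 273
```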

glenn-jocher commented 5 years ago

@Santhosh1509 all of the information you mention is recorded in results.txt. You can plot this with from utils.utils import *; plot_results(). You should use batch_size 64 accumulate 1 if possible, if not compensate with smaller batch sizes and larger accumulation counts, i.e. batch_size 32 accumulate 2.

obj and cls are training losses, they are supposed to decrease during training. See https://github.com/ultralytics/yolov3/issues/392 for hyperparameter evolution, and explore the open issues for answers to your questions.
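The batch_size/accumulate trade-off relies on gradient accumulation: gradients from several mini-batches are summed before a single optimizer step. Here is a minimal sketch with a placeholder linear model and random data, not the repo's training loop. Dividing each loss by `accumulate` makes the summed gradient an average over the 64 images; whether the repo scales its loss this way is not stated here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
batch_size, accumulate = 32, 2  # effective batch size = 32 * 2 = 64

# Placeholder data: 8 mini-batches of random inputs and targets
batches = [(torch.randn(batch_size, 10), torch.randn(batch_size, 1)) for _ in range(8)]

steps = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accumulate).backward()  # gradients add up across `accumulate` mini-batches
    if (i + 1) % accumulate == 0:
        optimizer.step()            # one update per 64 images
        optimizer.zero_grad()
        steps += 1
print(steps)  # 4
```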

Santhosh1509 commented 5 years ago

@glenn-jocher This is what is stored in results.txt:

obj, cls, total, targets. I am confused as to how these relate to training loss, training accuracy, validation loss and validation accuracy.

Don't we have a graph that is easy to visualize, rather than just numbers?

Something like this:

We can now even use TensorBoard support inside PyTorch to visualize the values.

As the name suggests, HYPERPARAMETER EVOLUTION is about plotting the hyperparameters, not how these quantities (training loss, training accuracy, validation loss and validation accuracy) change per epoch.

glenn-jocher commented 5 years ago

@Santhosh1509 Tensorboard logs automatically in this repo if you have it installed. See https://github.com/ultralytics/yolov3/pull/435

Santhosh1509 commented 5 years ago

@glenn-jocher Please explain how obj, cls, total and targets, as displayed here, relate to training loss, training accuracy, validation loss and validation accuracy.

I can only relate these terms: P -> Precision, R -> Recall, mAP -> mean Average Precision, F1 -> F1 score.

glenn-jocher commented 5 years ago

@Santhosh1509 accuracy is a classification metric; it is not used here. The metrics displayed during training are training losses and the number of targets per batch.

Santhosh1509 commented 5 years ago

@glenn-jocher Which of obj and cls is the training loss, and what do the other terms mean? Both of them decrease during training.

glenn-jocher commented 5 years ago

obj is the object loss and cls is the class loss. The training loss is the total of all training losses.

pasin-k commented 5 years ago

Hi @glenn-jocher, I followed the instructions above and tried transfer learning on the original COCO dataset. However, I found that some elements of the loss from the bbox_iou function are sometimes infinite. Apparently the variable 'pbox' has an extremely high value (3.438e+35), which makes it overflow to infinity when calculating c_area.

From what I checked, the variable 'ps' has values in the range [-1895, 80.24], and when I checked pbox = torch.cat((pxy, torch.exp(ps[:, 2:4]) * anchor_vec[i]), 1), 'pbox' has values ranging from 2.54e-21 to 3.44e+35.

So I guess this is where the problem comes from, but I don't know how to fix it. Any ideas? Thanks.
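The reported numbers are consistent with float32 overflow. exp(80.24) is about 7e34, so after multiplying by an anchor the width is about 3.5e35. Multiplying two such values, as an area term does, exceeds the float32 maximum of about 3.4e38. The sketch below uses a hypothetical anchor of 5 grid units. The final `clamp` lines show a common guard; this is an assumption, not a fix suggested in this thread, and a clamp only masks a diverging run rather than curing it.

```python
import math
import torch

ps_wh = torch.tensor([80.24, 1.0])  # raw w, h predictions; 80.24 is the max reported above
anchor = torch.tensor([5.0, 5.0])   # hypothetical anchor size in grid units
wh = torch.exp(ps_wh) * anchor      # same form as torch.exp(ps[:, 2:4]) * anchor_vec[i]
area = wh[0] * wh[0]                # product of two such values, as in an area term
print(f"{wh[0].item():.2e}", torch.isinf(area).item())  # 3.52e+35 True

# exp() alone overflows float32 once its input exceeds ln(float32 max), about 88.72
print(round(math.log(torch.finfo(torch.float32).max), 2))  # 88.72

# Hypothetical guard: cap the exponent before exp(); 10.0 is an arbitrary example value
wh_safe = torch.exp(ps_wh.clamp(max=10.0)) * anchor
print(torch.isfinite(wh_safe[0] * wh_safe[0]).item())  # True
```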

glenn-jocher commented 5 years ago

@jobpasin if transfer learning does not converge, simply train normally (which will produce better results anyway).

Santhosh1509 commented 5 years ago

@jobpasin Hope this helps. A few points to note:

New data to be learned through transfer learning should be similar to the data the model was trained on (in my case, my objects of interest were being detected as trucks, so I collected new data with the same index as the COCO index for trucks, which is 8).
If you think point one is satisfied and the losses are still high, play around with the learning rate, or collect more of the new data and try again.

I have to collect more data, since my obj loss doesn't go below 0.86 even after 273 epochs.

These videos might be of some use, though they are about improving neural networks in general.

pasin-k commented 5 years ago

@glenn-jocher Unfortunately, I am going to train with a much smaller dataset afterward so I need to use transfer learning. On the other hand, with smaller batch size, the model sometimes converges.

@Santhosh1509 Thanks for the tips. My case is detecting features like a circle, a star or a letter of the alphabet in a photo, so I think it is kind of similar? I'm currently adjusting the learning rate as you suggested, hoping to get some good results.

glenn-jocher commented 5 years ago

@jobpasin you could try Adam as well with an lr0 of about 1.5E-4.

shahidammer commented 5 years ago

You should use batch_size 64 accumulate 1 if possible, if not compensate with smaller batch sizes and larger accumulation counts, i.e. batch_size 32 accumulate 2.

I tried with batch_size 64 and accumulate 1, but I am getting a warning, "WARNING: non-finite loss, ending training tensor([ nan, 1.89257e+00, 3.75394e+04, nan], device='cuda:0')", and it crashed.

I have two 1080 Tis and I want to increase the batch size from 32 to 128 or more, but it crashes for all values except bs=16 and accumulate=2. Any suggestions?

santhoshnumberone commented 4 years ago


The learning rate is too high.

Hope this image helps you understand.

(Figure: Setting the learning rate of your neural network.)

glenn-jocher commented 4 years ago

@Santhosh1509 yes, that's a good example. High LRs may be an advantage at the beginning of training, but later on they will bounce around local minima without descending into them properly, just as in the charts you show, though ironically they may also prevent overtraining as a positive side effect. In general, though, best practice is to start with an LR of 1E-3 for SGD or 1E-4 for Adam, and to reduce it by a gain of around 0.1 to 0.01 after 80% of epochs have been completed.
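That schedule can be sketched with PyTorch's built-in `MultiStepLR`, using a placeholder model and a hypothetical 100-epoch run. SGD starts at lr 1E-3 and is multiplied by gamma=0.1 once 80% of the epochs are done; use gamma=0.01 for the stronger drop. This is a generic illustration, not the repo's own scheduler.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(10, 1)  # placeholder model
epochs = 100
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# Adam alternative: optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[int(0.8 * epochs)], gamma=0.1)

lrs = []
for epoch in range(epochs):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... train one epoch here (forward, backward, optimizer.step() per batch) ...
    optimizer.step()   # placeholder so the scheduler sees an optimizer step first
    scheduler.step()   # advance the schedule once per epoch

print(lrs[79], lrs[80])  # 1e-3 up to epoch 79, about 1e-4 from epoch 80 on
```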

shahidammer commented 4 years ago

Thank you for the prompt response. I am using 'lr0': 0.00025 for --batch-size 192 --accumulate 2 --transfer --weights weights/yolov3.pt are there any other settings which i need to alter?

glenn-jocher commented 4 years ago

@shahidammer train with default settings, and then look at your results.png for guidance on tuning your training settings.

shahidammer commented 4 years ago

@glenn-jocher the default settings do not work: with batch_size 64 and accumulate 1, I get the warning "WARNING: non-finite loss, ending training tensor([ nan, 1.89257e+00, 3.75394e+04, nan], device='cuda:0')".

Thanks to @Santhosh1509's response, I decreased the LR from 0.001 to 0.00025, but after 20 epochs the mAP is still zero.

santhoshnumberone commented 4 years ago


This is one way of decaying the learning rate after a specific number of epochs; you can try it out:

torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)

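import torch
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 60
# lr = 0.0005   if 60 <= epoch < 90
# ...
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100):
    # train(...) and validate(...) go here; train() calls optimizer.step() per batch
    scheduler.step()  # decay the LR once per epoch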

Source: torch.optim.lr_scheduler.StepLR

glenn-jocher commented 4 years ago

@Santhosh1509 ah, this could be caused by the aggressive LR gain we use for transfer learning. Sorry, we haven't been making transfer learning a priority. Yes, it makes sense, then, that you ended up with such a tiny lr0.

aquiire commented 4 years ago

@Santhosh1509 training normally will produce the best results. Transfer learning produces mediocre results quickly.

Are you sure? @glenn-jocher

glenn-jocher commented 4 years ago

@aquiire this shows the coco_16img.data tutorial starting from a few different options, including transfer learning. Transfer learning, as shown below, typically freezes the main pretrained weights, which constrains its performance. You can replicate these results by running the commands below and looking at the resulting results.png file.

python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights weights/ultralytics49.pt --name ultralytics49_start
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights weights/darknet53.conv.74 --name darknet53.conv.74_start
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights weights/yolov3-spp.weights --name yolov3-spp_start
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights weights/yolov3-spp.weights --transfer --name yolov3-spp_transfer
python3 train.py --data data/coco_64img.data --batch-size 16 --accumulate 1 --nosave --weights '' --name from_scratch

(Figure: results.png comparing the five runs above)

aquiire commented 4 years ago

@glenn-jocher Thanks for the explanation. By training normally, did you mean training from scratch? If so, then we have to compare orange with red, right?

glenn-jocher commented 4 years ago

@aquiire orange is with randomly initialized weights. Blue starts from the darknet53.conv.74 backbone; red and green both start from yolov3-spp.weights (red freezes all layers except the outputs, which is typically called "transfer learning").

SurionAndrew commented 4 years ago

I am doing person detection on CCTV footage; however, there are instances where people are not detected, probably due to lighting, camera angle or warping (and more reasons).

My question is: how would I increase my network's accuracy? Is it okay to take these false detections and just train on the new images via transfer learning?

Or must I download the COCO dataset for people, merge in my new images and retrain completely?

How many images are recommended for transfer learning?

Regards Andrew

glenn-jocher commented 4 years ago

@SurionAndrew of course there are instances of FPs and missed detections; otherwise your mAP would be 100%. Transfer learning is a waste of time. Train from scratch with no backbone until validation losses begin to increase, then set your --epochs to that epoch and retrain to lock in the LR drops. See https://github.com/ultralytics/yolov3/issues/310
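Picking "that epoch" from a log of per-epoch validation losses can be sketched as taking the epoch with the lowest value. `best_epoch` and the loss values below are hypothetical, and a noisy curve may need smoothing first.

```python
def best_epoch(val_losses):
    """Return the 0-based index of the epoch with the lowest validation loss,
    i.e. the point after which validation loss stops improving."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


val_losses = [5.1, 3.9, 3.2, 2.8, 2.7, 2.75, 2.9, 3.1]  # hypothetical per-epoch values
e = best_epoch(val_losses)
print(e, e + 1)  # 4 5 -> the 5th epoch is best, so retrain with --epochs 5
```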

coolmarat commented 4 years ago

I tried to start transfer learning with yolov3.pt downloaded from Google Drive, but I immediately get this error:

File "D:/AI/yolov3/train.py", line 113, in <dictcomp> chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()} KeyError: module_list.78.Conv2d.weight

If I try the yolov3.weights file, then I get another error:

File "D:\ai\yolov3\models.py", line 342, in load_darknet_weights conv_w = torch.from_numpy(weights[ptr:ptr + num_w]).view_as(conv_layer.weight) RuntimeError: shape '[256, 128, 3, 3]' is invalid for input of size 282007

When I train from scratch with my cfg and data files, no errors occur.

Can anybody help me to resolve it?

FranciscoReveriano commented 4 years ago

What is the full command you used to initialize the training?

glenn-jocher commented 4 years ago

@coolmarat your repo is out of date, git pull and try again.

SHikumo commented 4 years ago

Here are my results after 300 epochs (results_transfer_2412), and this is a video of the result: https://www.youtube.com/watch?v=8r4BNEMv_2Y

You can see it has detected a car door. How can I solve this problem?

glenn-jocher commented 4 years ago

@SHikumo I don't understand your question. Your GIoU loss looks strange, you should ensure your boxes are labelled correctly.

SHikumo commented 4 years ago


Thanks for reviewing my transfer learning result. I'm sure we have labeled the "one_class" objects correctly, and I have cleaned out bad data too. I have trained on about 1718 images of people (different sizes, different angles), but the result is still acceptable. If you don't mind, please watch my result video: https://youtu.be/8r4BNEMv_2Y

ultralytics / yolov3

TRANSFER LEARNING EXAMPLE #106
