ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Classification only #2001

Closed Zohiet closed 3 years ago

Zohiet commented 3 years ago

❔Question

Thanks for this great work! I am planning to perform a classification-only task with YOLOv5. Is it wise to do this by setting the bbox to the whole image, i.e. writing (class 0.5 0.5 1 1) in each label.txt? Or is there a better workaround?
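A YOLO label line is "class x_center y_center width height", with all four box values normalized to [0, 1], so a whole-image box is always "class 0.5 0.5 1 1". A minimal sketch of generating such labels, assuming a hypothetical layout root/<class_name>/<image> and writing each label next to its image (write_whole_image_labels is an illustrative helper, not part of YOLOv5; adapt the output location to your dataset layout):

```python
from pathlib import Path
from typing import Sequence

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp'}


def write_whole_image_labels(root: str, class_names: Sequence[str]) -> int:
    """Write one label file per image containing a single whole-image box.

    Hypothetical helper: assumes images live in root/<class_name>/ and writes
    <image_stem>.txt beside each image. Returns the number of files written.
    """
    written = 0
    for class_id, name in enumerate(class_names):
        # sorted() materializes the listing first, so new .txt files are not revisited
        for image_path in sorted((Path(root) / name).iterdir()):
            if image_path.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # skip label files and anything that is not an image
            # class x_center y_center width height: a box covering the full image
            image_path.with_suffix('.txt').write_text(f'{class_id} 0.5 0.5 1 1\n')
            written += 1
    return written
```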

github-actions[bot] commented 3 years ago

👋 Hello @Zohiet, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 3 years ago

@Zohiet good question. There's a classification branch under development here: https://github.com/ultralytics/yolov5/tree/classifier

You can use classifier.py (https://github.com/ultralytics/yolov5/blob/classifier/classifier.py). It allows you to train EfficientNet models from Ross Wightman (@rwightman) as well as classification versions of YOLOv5. It uses the directory dataloader typical in classification, so you can train on standard classification datasets like CIFAR, ImageNet, etc. directly with it.

We've provided some prepackaged, correctly formatted classification datasets here:
https://github.com/ultralytics/yolov5/releases/download/v1.0/cifar10.zip
https://github.com/ultralytics/yolov5/releases/download/v1.0/cifar100.zip
https://github.com/ultralytics/yolov5/releases/download/v1.0/mnist.zip
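A directory dataloader expects one sub-folder per class (the inference example later in this thread reads ../mnist/test/0/10.png, i.e. split/class/image). A minimal plain-Python sketch of indexing such a split, assuming the branch follows the same convention as torchvision's ImageFolder (class names are the sorted sub-folder names, and each image gets its folder's index); scan_class_folders is a hypothetical helper:

```python
from pathlib import Path
from typing import List, Tuple

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp'}


def scan_class_folders(split_dir: str) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Index a directory-per-class split such as mnist/train/<class>/<image>.

    Returns (class_names, samples) where samples are (image_path, class_index).
    """
    root = Path(split_dir)
    # Sorted folder names define the class order (note: '10' sorts before '2')
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = [
        (str(f), idx)
        for idx, name in enumerate(classes)
        for f in sorted((root / name).iterdir())
        if f.suffix.lower() in IMAGE_SUFFIXES
    ]
    return classes, samples
```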

We want to do additional work down this path, including adapting the mosaic dataloader for classification when we find some time.

glenn-jocher commented 3 years ago

@AyushExel BTW see the message above for a brief update on classification efforts. I've created a new branch with a single new file that incorporates everything needed for classification training of EfficientNet and YOLOv5 classifier models. It needs additional work, but it's started there.

glenn-jocher commented 3 years ago

@Zohiet the code for transforming a YOLOv5 model into a classifier is here BTW. This cuts off most of the head and replaces it with a single Classify() module. All of this is immature and needs additional development and experimentation but it works in principle.

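        # YOLOv5 Classifier (run from the yolov5 repo root on the classifier branch)
        import torch
        from models.common import Classify

        model_name, nc = 'yolov5s', 10  # i.e. opt.model, and the number of classes in your dataset
        # must return the raw Model (not an AutoShape wrapper); hub arguments vary between YOLOv5 versions
        model = torch.hub.load('ultralytics/yolov5', model_name, pretrained=True)
        model.model = model.model[:8]  # keep layers 0-7, dropping the detection head
        m = model.model[-1]  # last layer
        ch = m.conv.in_channels if hasattr(m, 'conv') else sum([x.in_channels for x in m.m])  # ch into module
        c = Classify(ch, nc)  # Classify()
        c.i, c.f, c.type = m.i, m.f, 'models.common.Classify'  # index, from, type
        model.model[-1] = c  # replace layer 7 with the Classify() head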
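A minimal sketch of what a Classify() head of this kind computes, assuming it is global average pooling followed by a 1x1 convolution to nc outputs and a flatten (ClassifySketch is a hypothetical name; the real models.common.Classify may differ between YOLOv5 versions):

```python
import torch
import torch.nn as nn


class ClassifySketch(nn.Module):
    """Hypothetical Classify()-style head: pool -> 1x1 conv -> flatten."""

    def __init__(self, c1: int, c2: int):
        super().__init__()
        self.aap = nn.AdaptiveAvgPool2d(1)  # (b, c1, h, w) -> (b, c1, 1, 1)
        self.conv = nn.Conv2d(c1, c2, 1)    # (b, c1, 1, 1) -> (b, c2, 1, 1), with bias
        self.flat = nn.Flatten()            # (b, c2, 1, 1) -> (b, c2) logits

    def forward(self, x):
        return self.flat(self.conv(self.aap(x)))


head = ClassifySketch(256, 10)               # 256 channels in, 10 classes out
logits = head(torch.zeros(4, 256, 16, 16))   # any spatial size works after pooling
n_params = sum(p.numel() for p in head.parameters())  # 256 * 10 weights + 10 biases
print(logits.shape, n_params)  # torch.Size([4, 10]) 2570
```

Because the pooling is adaptive, the head accepts whatever feature-map size the truncated backbone produces, which is why the same cut works for different training image sizes.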
AyushExel commented 3 years ago

@glenn-jocher that is great. I'll try it out as soon as I get back from vacation. This is a great step for making this library a one-stop solution for CV problems :)

Zohiet commented 3 years ago

@glenn-jocher I'm very glad to hear you're working on a new classification branch; I appreciate it. I'll try transforming YOLOv5 into a classifier as you described. But I'm still curious: what if I keep the model and only change the labels (as I said, setting the bbox to the whole image: class 0.5 0.5 1 1)? Will the model perform well?

glenn-jocher commented 3 years ago

@Zohiet sure you can do that as well, though it may be a questionable design decision to include uninformative components in the loss (i.e. boxes that always encompass the entire image).

5starkarma commented 3 years ago

Questionable but will work :P

One of the first object detection datasets I trained on consisted of road-sign images and nothing else (maybe 5 px of padding around the edge of each sign), so every bbox sat near the outer edge of the image. Thinking back, this dataset was obviously meant for an augmentation pipeline, but that is beside the point. I trained on it and then ran inference on a dash-cam video of a city. Needless to say, every detection had its bbox around the edge of the image XD

Zohiet commented 3 years ago

@5starkarma Thanks for your info! In my inference dataset, I don't really care about the bbox as long as the classification is correct.

rsomani95 commented 3 years ago

@glenn-jocher I'm curious whether you've considered a YOLOv5 model that can do classification and bounding box detection at the same time.

I can see this happening two ways:

  1. The classification head could be a concat & avg pool (from the bbox head / FPN) => final linear layer. This may be slower, but would allow one to use pretrained yolo models, and add a classification head on top that could be trained on a different dataset

  2. As (I think) is implemented currently in the classification branch, the linear layer could be a branch from the backbone, whereas the bounding box head would be another branch.

Either approach could facilitate training with a custom dataset where you've got classification labels as well as bounding box annotations for the same image.

Curious to hear your thoughts :)

glenn-jocher commented 3 years ago

@rsomani95 yes, we have a branch that trains classification models (with classifier.py) here that we are experimenting with: https://github.com/ultralytics/yolov5/tree/classifier

Generally, classification and detection tasks are not intermingled in the same network, or at least I've never seen them mixed together. The architectures are different (head differences), but the datasets are also different, and there's generally no way to convert, say, a COCO image into a single class, nor to annotate an ImageNet image with a single bounding box, particularly for some of the rarer classes.

That said, detection models already do classification, they classify every point in the output grid and every anchor within that point. There's an overlap there in theory but in practice they are always isolated tasks as far as I know.
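To make "every point in the output grid and every anchor" concrete: assuming the standard three output scales (strides 8, 16, 32) and three anchors per scale, as in the Detect layer of the yolov5s model table later in this thread, the number of per-image class predictions is just grid arithmetic. num_yolo_predictions is an illustrative helper:

```python
def num_yolo_predictions(img_size: int = 640, strides=(8, 16, 32), anchors_per_cell: int = 3) -> int:
    """Count the candidate boxes a 3-scale YOLOv5-style detector scores per image."""
    # Each output grid has (img_size / stride)^2 cells, and every cell predicts one
    # box (4 coords + objectness + one score per class) for each of its anchors.
    return sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)


print(num_yolo_predictions())     # 25200
print(num_yolo_predictions(320))  # 6300
```

So at 640 px a COCO-trained model makes 25200 separate 80-way classifications per image, versus one for a classifier.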

rsomani95 commented 3 years ago

You're right, they're not coupled together in any public dataset that I know of. But I can easily imagine a scenario where both would be visible. One may want to do scene recognition and also recognise objects within the scene. Take the following image for example:

[Image: assault_on_precinct_13_filmgrab_12]

The scene is a parking lot, and there are objects of interest in it, like cars. Here, one needs to look at the image in its entirety to deduce that it's a parking lot.


My original question stemmed from having looked at the classification branch. I was curious if that + detection could be done together. Maybe this is an eclectic scenario given the state of public datasets, but I think having a model that could do it may encourage folks to formulate such problems more meaningfully?

glenn-jocher commented 3 years ago

@rsomani95 yes, many organizations exploit a single backbone for multiple tasks, i.e. Karpathy at Tesla calls this a 'hydra' network after the multi-headed monster from Greek mythology.

rsomani95 commented 3 years ago

@glenn-jocher That's where I first learnt about it too. The Greek mythology tidbit is a nice touch.

What I meant to ask with my first question was if you guys considered extending yolov5 to be adapted into a 'hydra' like network? I'm currently trying to implement this with icevision and was curious if you guys had this in your roadmap.

glenn-jocher commented 3 years ago

@rsomani95 well, it's somewhat complicated to introduce additional tasks/heads onto a backbone as the losses begin competing with each other, and you have to balance them accordingly, which takes experimentation etc etc.
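The balancing problem can be written as a weighted sum of per-task losses. In this sketch the task weights are assumed hyperparameters found by experiment; YOLOv5's own detection loss already combines its box, objectness and class terms this way, using the box, obj and cls gains from its hyperparameter file:

```latex
\mathcal{L} = \lambda_{\text{det}}\,\mathcal{L}_{\text{det}} + \lambda_{\text{cls}}\,\mathcal{L}_{\text{cls}},
\qquad
\mathcal{L}_{\text{det}} = g_{\text{box}}\,\mathcal{L}_{\text{box}} + g_{\text{obj}}\,\mathcal{L}_{\text{obj}} + g_{\text{cls}}\,\mathcal{L}_{\text{cls}}^{\text{det}}
```

If one term dominates in magnitude, its gradients dominate the shared backbone, so the weights usually have to be tuned per dataset pair.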

So yes, it's an interesting idea for sure. I suppose one could even combine a detection head and a separate classification head, and even train them on semi-non-intersecting datasets like COCO + ImageNet. But in our current capacity we are already challenged simply maintaining and updating the basic YOLOv5 repository and ensuring export compatibility with the various pipelines. Our main priority is the largest use case/addressable market, which in vision AI appears to be object detection first and classification second, with segmentation in there somewhere as well; more exotic ideas like this belong more to the research and publication domain.

zhiqwang commented 3 years ago

Hi @rsomani95 and @glenn-jocher

Recently I've done some related experiments. My focus is on the structure of YOLOv5, in detail:

  1. The classification head could be a concat & avg pool (from the bbox head / FPN) => final linear layer. This may be slower, but would allow one to use pretrained yolo models, and add a classification head on top that could be trained on a different dataset

I saw the torchvision team abstract this part as a BackboneWithFPN module here. YOLOv5's neck is closer to a PAN module than an FPN, so I've refactored the implementation as a BackboneWithPAN module here. In fact, this part could be extracted from the yaml configuration file.

  2. As (I think) is implemented currently in the classification branch, the linear layer could be a branch from the backbone, whereas the bounding box head would be another branch.

I think the branch @glenn-jocher mentioned above partly answers this question (thoughts?). It could also be rewritten as a DarkNet/MobileNet-like module here; they are just two ways of writing the same module.

rsomani95 commented 3 years ago

@glenn-jocher That makes a ton of sense. Thank you for the detailed response.

@zhiqwang Thank you for sharing these resources. I'll take a look, they look quite interesting.

I think the branch as @glenn-jocher mentioned above could partly answer this question, (thoughts?)

It partly does, yes: it shows how to make a classification head branch out from the backbone.

glenn-jocher commented 3 years ago

Yes, the https://github.com/ultralytics/yolov5/tree/classifier branch adds a classifier.py file that does standalone classifier training. It does not attempt to merge tasks, though.

There is a C5_divergent branch (https://github.com/ultralytics/yolov5/tree/C5_divergent) that examines some updated architectures that are similar to hydra nets. For example this model has one backbone and two heads: https://github.com/ultralytics/yolov5/blob/C5_divergent/models/hub/yolov5l6d-640.yaml

Basically, the yaml files are very flexible and allow you to define a lot of interesting shapes, such as the dual-head network above, which I was using to see whether there was any benefit to having different heads compute different losses (i.e. a box regression head and an obj/cls head separately). Strangely enough, I didn't find any benefit from doing this, so it's possible that in detection the loss components complement each other.
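What makes these shapes possible is the "from" column of each yaml row (visible in the model table later in this thread, e.g. [-1, 6] at layer 12): a layer can read the output of any earlier layer, so a second head can branch off the same backbone features. A minimal sketch of how those indices resolve, mirroring the save-list logic in YOLOv5's parse_model (layers_to_save is a hypothetical helper):

```python
from typing import List, Union

From = Union[int, List[int]]


def layers_to_save(from_indices: List[From]) -> List[int]:
    """Return absolute indices of layers whose outputs later layers reuse.

    -1 means "the previous layer" and never needs saving; other negative values
    count back from the current layer; non-negative values are absolute indices.
    """
    saved = set()
    for i, f in enumerate(from_indices):
        for x in ([f] if isinstance(f, int) else f):
            if x != -1:
                saved.add(x % i)  # e.g. -2 at layer 5 -> 3; 6 stays 6
    return sorted(saved)


# "from" column of the yolov5s table (layers 0-24)
yolov5s_from = ([-1] * 12 + [[-1, 6]] + [-1] * 3 + [[-1, 4]] + [-1] * 2 +
                [[-1, 14]] + [-1] * 2 + [[-1, 10]] + [-1] + [[17, 20, 23]])
print(layers_to_save(yolov5s_from))  # [4, 6, 10, 14, 17, 20, 23]
```

A dual-head yaml simply appends a second group of rows whose "from" entries point back at backbone layers, which adds those layers to the saved set.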

BartvanMarrewijk commented 3 years ago

I am not so familiar with changing the networks as described above, but I have a question about this classifier. My approach is much simpler (and probably less accurate) than the networks mentioned above: I want to train the classifier on a publicly available plant dataset (PlantVillage), then retrain the network on my custom dataset (with bounding boxes) starting from the classifier's weights. The problem is that classifier.py is somehow not working well for me. I first tried it on the MNIST and CIFAR datasets, but the accuracy is only about 0.44. Does anybody have experience with this classifier, and if so, did anyone obtain higher accuracies?

glenn-jocher commented 3 years ago

@studentWUR oh interesting. MNIST should be pretty high, i.e. around 0.98-0.99 maybe. I'll check it out.

glenn-jocher commented 3 years ago

@studentWUR I made a small bug fix https://github.com/ultralytics/yolov5/commit/04fddf507fcecace0ef1842f841517a4c0fdbdb3 to the classifier branch and now everything works correctly. MNIST is at 99% in 5 epochs.

INPUT:

python classifier.py --data mnist

OUTPUT:

YOLOv5 v4.0-42-gb34e21b torch 1.7.0+cu101 CUDA:0 (Tesla V100-SXM2-16GB, 16130.5MB)

Training yolov5s on mnist dataset with 10 classes...
Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 283 layers, 7276605 parameters, 7276605 gradients, 17.1 GFLOPS

Model Summary: 128 layers, 1194954 parameters, 1194954 gradients, 8.5 GFLOPS

epoch     gpu_mem   train_loss  val_loss    accuracy    
1/20      0.596G    0.443       0.0617      0.983       : 100% 469/469 [00:45<00:00, 10.32it/s]
2/20      0.598G    0.0927      0.0528      0.981       : 100% 469/469 [00:45<00:00, 10.22it/s]
3/20      0.598G    0.0742      0.0448      0.986       : 100% 469/469 [00:45<00:00, 10.27it/s]
4/20      0.598G    0.0635      0.0375      0.989       : 100% 469/469 [00:45<00:00, 10.32it/s]
5/20      0.598G    0.0546      0.031       0.991       : 100% 469/469 [00:46<00:00, 10.19it/s]
6/20      0.598G    0.0531      0.0271      0.991       : 100% 469/469 [00:45<00:00, 10.24it/s]
7/20      0.598G    0.0482      0.0253      0.992       : 100% 469/469 [00:45<00:00, 10.31it/s]
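The two model summaries above can be cross-checked. The classifier keeps layers 0-6 of yolov5s and replaces layer 7 with a Classify() head fed by layer 6's 256 channels. Assuming that head is a 1x1 convolution with bias from 256 channels to the 10 MNIST classes, the parameter counts add up exactly:

```python
# Parameter counts of layers 0-6, copied from the yolov5s table above
backbone_params = [3520, 18560, 18816, 73984, 156928, 295424, 625152]

nc, c_in = 10, 256                 # MNIST classes; output channels of layer 6 (C3 [256, 256, 3])
classify_params = c_in * nc + nc   # assumed 1x1 conv weights + biases

total = sum(backbone_params) + classify_params
print(total)  # 1194954, matching "Model Summary: 128 layers, 1194954 parameters"
```

Almost all of the classifier's capacity is therefore the truncated backbone; the head itself adds only 2570 parameters.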
github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

abhigoku10 commented 3 years ago

@rsomani95 if I understand correctly, you want scene classification along with object detection, right? I think adding a multi-task network on top of the backbone would achieve that. @glenn-jocher @zhiqwang extending this thought: if I have two models trained for different classes, e.g. Model 1 for car and person and Model 2 for logos and text, both with the same backbone such as ResNet or Darknet, is it possible to merge them into a single network so that inference time and memory consumption are reduced? Please share your thoughts on this; it would be helpful.

glenn-jocher commented 3 years ago

@abhigoku10 anything is possible if you have enough time and resources to customize solutions, i.e. you are in academia and just want to research or in an enterprise with unlimited funds. In that case you can try adding additional heads with their own tasks and losses (and data labels), i.e. classify, detect, segment, keypoint heads separately.

If you are trying to bring a product to market with the minimum risk and lead time (and cost) then you should stick to the standard use-cases, i.e. YOLO for detection, EfficientNet for classification etc. and develop your product around those rather than the other way around.

abhigoku10 commented 3 years ago

@glenn-jocher thanks for the response. But logically, can the backbone be shared, since both are detection modules and the only difference is the dataset and classes? Then there would be no need for separate training, right?

glenn-jocher commented 3 years ago

@abhigoku10 yes it should be possible. Karpathy calls these hydra networks, from Greek mythology.
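A minimal PyTorch sketch of the hydra idea: one shared backbone computed once, with two task heads and a weighted sum of their losses. Everything here is an illustrative assumption (HydraNet, the toy backbone, the placeholder detection loss and the 1.0/0.5 weights); for two detection models, the classification head would simply be replaced by a second detection head:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HydraNet(nn.Module):
    """Hypothetical shared-backbone network with two task heads."""

    def __init__(self, num_anchors: int = 3, num_det_outputs: int = 85, num_scene_classes: int = 10):
        super().__init__()
        # Toy stand-in for a YOLOv5 backbone: three stride-2 convs (overall stride 8)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, 2, 1), nn.SiLU(),
            nn.Conv2d(16, 32, 3, 2, 1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.SiLU(),
        )
        # Head 1: per-grid-cell outputs like a single-scale YOLO Detect layer
        # (num_anchors * (4 box + 1 obj + 80 classes) channels)
        self.det_head = nn.Conv2d(64, num_anchors * num_det_outputs, 1)
        # Head 2: whole-image (e.g. scene) classification
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_scene_classes))

    def forward(self, x):
        features = self.backbone(x)  # computed once and shared by both heads
        return self.det_head(features), self.cls_head(features)


model = HydraNet()
det_out, cls_out = model(torch.zeros(2, 3, 64, 64))

# Weighted multi-task loss; the weights are placeholders to be tuned experimentally
det_loss = det_out.pow(2).mean()  # placeholder: a real detector would use a YOLO loss here
cls_loss = F.cross_entropy(cls_out, torch.tensor([1, 3]))
loss = 1.0 * det_loss + 0.5 * cls_loss
loss.backward()  # gradients from both heads flow into the shared backbone
print(det_out.shape, cls_out.shape)  # torch.Size([2, 255, 8, 8]) torch.Size([2, 10])
```

At inference the backbone cost is paid once for both tasks, which is where the time and memory savings come from; the price is that both heads' losses now compete for the same backbone weights.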

glenn-jocher commented 3 years ago

@abhigoku10 I looked at the classifier branch and saw it had a few issues that had arisen due to divergence with master. I've merged master, verified correct operation, and added an inference usage example:

YOLOv5 Classifier Training

git clone https://github.com/ultralytics/yolov5 -b classifier
cd yolov5
pip install -r requirements.txt

python classifier.py --model yolov5s --data mnist --epochs 5 --img 128
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v5.0-527-g76259b1 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Training yolov5s on mnist dataset with 10 classes...
Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 v5.0-527-g76259b1 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Fusing layers... 
Model Summary: 224 layers, 7266973 parameters, 0 gradients
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
Model Summary: 101 layers, 1192362 parameters, 1192362 gradients
Image sizes 128 train, 128 test
Using 2 dataloader workers
Logging results to runs/train/exp3
Starting training for 5 epochs...

epoch     gpu_mem   train_loss  val_loss    accuracy    
1/5       0.958G    0.436       0.152       0.953       : 100% 469/469 [00:35<00:00, 13.38it/s]
2/5       0.958G    0.128       0.0674      0.979       : 100% 469/469 [00:34<00:00, 13.43it/s]
3/5       0.958G    0.0908      0.0609      0.98        : 100% 469/469 [00:34<00:00, 13.52it/s]
4/5       0.958G    0.0689      0.0379      0.986       : 100% 469/469 [00:34<00:00, 13.52it/s]
5/5       0.958G    0.0499      0.0279      0.99        : 100% 469/469 [00:34<00:00, 13.44it/s]
Training complete. Results saved to runs/train/exp.

YOLOv5 Classifier Inference

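import cv2
import numpy as np
import torch
import torch.nn.functional as F

# Functions
resize = torch.nn.Upsample(size=(128, 128), mode='bilinear', align_corners=False)  # match the training --img size
normalize = lambda x, mean=0.5, std=0.25: (x - mean) / std

# Model (run from the yolov5 repo root so the pickled model classes can be imported;
# PyTorch 2.6+ may additionally need torch.load(..., weights_only=False))
model = torch.load('runs/train/exp/weights/best.pt')['model'].cpu().float()
model.eval()  # put BatchNorm/Dropout layers in inference mode

# Image
im = cv2.imread('../mnist/test/0/10.png')[..., ::-1]  # HWC, BGR to RGB (flip the channel axis, not the rows)
im = np.ascontiguousarray(im.transpose((2, 0, 1)))  # HWC to CHW
im = torch.tensor(im).unsqueeze(0) / 255.0  # to Tensor, add batch dim (BCHW), rescale to 0-1
im = resize(normalize(im))

# Inference
results = model(im)  # raw logits, shape (batch, num_classes)
p = F.softmax(results, dim=1)  # probabilities
print(p)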

tensor([[9.99685e-01, 2.24096e-11, 6.25672e-06, 6.37622e-07, 2.57255e-09, 6.91358e-07, 3.36181e-05, 3.68758e-08, 6.94699e-06, 2.66681e-04]], grad_fn=<SoftmaxBackward>)
ajonand commented 3 years ago

@glenn-jocher Is it possible to do multi-label classification only? If I understand the data structure in classifier.py correctly, the images should be placed in a folder corresponding to the class they belong to. So how would a multi-label case work?

glenn-jocher commented 3 years ago

@ajonand on the labelling side that's a good question, I think the dataset structure impedes multilabel as you mentioned.

On the inference side classification is always multilabel, as every label receives an output, and a softmax normalizes all of these to sum to 1, so yes you can view say the top 3 or top 10 classification labels for an image rather than just the highest likelihood class.
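Reading the top-k classes from a softmax output is a sort over the probability vector. A minimal sketch using the probabilities printed by the inference example above (top_k is an illustrative helper; note that genuine multi-label training usually replaces the softmax with an independent sigmoid per class, so that several labels can score high at once):

```python
import heapq

# Probabilities printed by the inference example above (one MNIST image, 10 classes)
probs = [9.99685e-01, 2.24096e-11, 6.25672e-06, 6.37622e-07, 2.57255e-09,
         6.91358e-07, 3.36181e-05, 3.68758e-08, 6.94699e-06, 2.66681e-04]


def top_k(p, k):
    """Return (class_index, probability) pairs for the k most likely classes."""
    return heapq.nlargest(k, enumerate(p), key=lambda item: item[1])


print([i for i, _ in top_k(probs, 3)])  # [0, 9, 6]
```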

matpy1 commented 3 years ago

@glenn-jocher Hi Glenn, I tried to use the classifier inference that you provided, the result is a big tensor (the results variable in your code) which I don't understand. Also the results.print() method doesn't work unlike in the YoloV5 inference tutorial. Could you please explain what the results are? My goal was to get the class of a cropped image. Thanks a lot!

glenn-jocher commented 3 years ago

@matpy1 YOLOv5 classification models output in the same format as every other classification model, i.e. EfficientNet, ResNet, etc. These are confidence vectors of shape(batch,class), i.e. (16,100) for 16 images and 100 classes.
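To get one class per image from such a (batch, class) output, take the arg-max of each row; in PyTorch that is results.argmax(1). A plain-Python sketch of the same idea (predicted_classes is an illustrative helper). Softmax is monotonic, so applying it first does not change the arg-max; it is only needed when you want probabilities:

```python
from typing import List


def predicted_classes(logits: List[List[float]]) -> List[int]:
    """Map a (batch, num_classes) score matrix to one class index per image."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


batch = [[0.1, 2.0, -1.0],   # image 0: class 1 scores highest
         [3.0, 0.0, 0.5]]    # image 1: class 0 scores highest
print(predicted_classes(batch))  # [1, 0]
```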

vijishmadhavan commented 3 years ago

Will a classification-only trained model work on video using detect.py?

glenn-jocher commented 3 years ago

@vijishmadhavan no, detect.py only works for normal YOLOv5 detection models. You can see classification inference example in classifier.py Usage section: https://github.com/ultralytics/yolov5/blob/136640eee86b529b6419e6e9b4c7c008aea9b6a8/classifier.py#L8-L29

mzhadigerov commented 3 years ago

The prediction is always the same (wrong) class, even when I test with an image from the training set, although my best model reached 100% accuracy.

xellDart commented 2 years ago

Hi @glenn-jocher, thanks for your amazing work. I have one question: I trained on my dataset using classifier.py and reached an accuracy of 0.981. Later I used this code with ncnn for the classification task:

#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

#include <opencv2/core.hpp>
#include "net.h"  // ncnn

// Numerically stable softmax; also reports the index of the largest probability.
std::vector<float> softmax(const float* logits, unsigned int _size, unsigned int& max_id) {
  if (_size == 0 || logits == nullptr) return {};
  const float max_logit = *std::max_element(logits, logits + _size);
  float max_prob = 0.f, total_exp = 0.f;
  max_id = 0;
  std::vector<float> softmax_probs(_size);
  for (unsigned int i = 0; i < _size; ++i) {
    softmax_probs[i] = std::exp(logits[i] - max_logit);  // subtract the max to avoid overflow
    total_exp += softmax_probs[i];
  }
  for (unsigned int i = 0; i < _size; ++i) {
    softmax_probs[i] /= total_exp;
    if (softmax_probs[i] > max_prob) {
      max_id = i;
      max_prob = softmax_probs[i];
    }
  }
  return softmax_probs;
}

// Indices of arr sorted by descending value.
std::vector<unsigned int> argsort(const std::vector<float>& arr) {
  std::vector<unsigned int> indices(arr.size());
  std::iota(indices.begin(), indices.end(), 0u);
  std::sort(indices.begin(), indices.end(),
            [&arr](unsigned int a, unsigned int b) { return arr[a] > arr[b]; });
  return indices;
}

// Returns the index of the highest-scoring class, or -1 on failure.
static int detect_yolo_classifier(const cv::Mat& image) {
  ncnn::Net yolo_classifier;
  yolo_classifier.opt.use_vulkan_compute = false;
  yolo_classifier.load_param("model.param");
  yolo_classifier.load_model("model.bin");

  // Should match the image size used for training/export (e.g. --img 320)
  const int target_size = 352;

  ncnn::Mat in = ncnn::Mat::from_pixels_resize(image.data, ncnn::Mat::PIXEL_BGR2RGB,
                                               image.cols, image.rows, target_size, target_size);

  // ncnn computes (pixel - mean) * norm; these values only rescale pixels to [0, 1]
  const float mean_vals[3] = {0.f, 0.f, 0.f};
  const float norm_vals[3] = {1.f / 255.f, 1.f / 255.f, 1.f / 255.f};
  in.substract_mean_normalize(mean_vals, norm_vals);

  ncnn::Extractor ex = yolo_classifier.create_extractor();
  ex.input("in0", in);
  ncnn::Mat out;
  ex.extract("out0", out);

  const unsigned int num_classes = out.w;
  const float* logits = (const float*)out.data;
  unsigned int max_id = 0;
  std::vector<float> scores = softmax(logits, num_classes, max_id);
  if (scores.empty()) return -1;
  std::vector<unsigned int> sorted_indices = argsort(scores);
  return static_cast<int>(sorted_indices[0]);
}

With the original Python method in the yolov5 repo everything works, but with ncnn in C++ only one class is detected correctly. Could you please check my code? Do I need to pad the image like the YOLO detector does?

Like this?

// yolov5/models/common.py DetectMultiBackend
    const int max_stride = 64;
    const int img_w = bgr.cols;
    const int img_h = bgr.rows;

    // letterbox: resize so the longer side equals target_size, keeping aspect ratio
    int w = img_w;
    int h = img_h;
    float scale = 1.f;
    if (w > h)
    {
        scale = (float)target_size / w;
        w = target_size;
        h = (int)(h * scale);
    }
    else
    {
        scale = (float)target_size / h;
        h = target_size;
        w = (int)(w * scale);
    }

    ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);

    // pad each side up to the next multiple of max_stride
    // yolov5/utils/datasets.py letterbox
    int wpad = (w + max_stride - 1) / max_stride * max_stride - w;
    int hpad = (h + max_stride - 1) / max_stride * max_stride - h;
    ncnn::Mat in_pad;
    ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);
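The letterbox arithmetic in the fragment above can be checked in Python. This sketch (letterbox_size is a hypothetical helper) mirrors the C++ integer math and returns the resized and padded sizes. Note that with max_stride = 64 a 352 target is not a stride multiple, so a 640x480 image ends up 384x320 rather than 352-wide. Also note that the classifier inference example earlier in this thread uses a plain resize with no letterbox, so padding should only be added if training preprocessing did the same:

```python
def letterbox_size(img_w: int, img_h: int, target_size: int, max_stride: int = 64):
    """Return (resized_w, resized_h, padded_w, padded_h) like the C++ fragment."""
    if img_w > img_h:
        scale = target_size / img_w          # longer side (width) -> target_size
        w, h = target_size, int(img_h * scale)
    else:
        scale = target_size / img_h          # longer side (height) -> target_size
        w, h = int(img_w * scale), target_size
    # Round each side up to the next multiple of max_stride
    pad_w = (w + max_stride - 1) // max_stride * max_stride
    pad_h = (h + max_stride - 1) // max_stride * max_stride
    return w, h, pad_w, pad_h


print(letterbox_size(640, 480, 352))      # (352, 264, 384, 320)
print(letterbox_size(640, 480, 352, 32))  # (352, 264, 352, 288)
```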

I'm using this command to export to TorchScript:

python3 export.py --weights '/home/miguel/Documentos/yolov5/runs/train/exp/weights/best.pt' --include torchscript --img 320

Thanks

glenn-jocher commented 2 years ago

@xellDart I'm not great at C and since we're so busy I can't individually comment on users code much, but in general you need to make sure the image is preprocessed exactly the same way (i.e. same image size, same transforms, same RGB order) in your custom script as in the official inference script here: https://github.com/ultralytics/yolov5/blob/9794f63ddfdc7599a1ed368395115f9cd3c7d50f/classifier.py#L8-L14

Also I haven't verified export of classification models using export.py, but it's possible it may work out of the box already.
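One preprocessing difference worth checking in the C++ code above: the classifier inference example in this thread normalizes with (x / 255 - 0.5) / 0.25, while the ncnn code uses mean 0 and norm 1/255, which only rescales to [0, 1]. Since ncnn's substract_mean_normalize computes (pixel - mean_val) * norm_val, the equivalent constants can be derived as below, assuming the model was trained with that mean=0.5, std=0.25 normalize (if your training used different values, substitute them):

```python
# Python side (from the classifier inference example): x = (pixel / 255 - mean) / std
MEAN, STD = 0.5, 0.25

# ncnn side: x = (pixel - mean_val) * norm_val, with pixels on the 0-255 scale
mean_val = 255.0 * MEAN          # 127.5
norm_val = 1.0 / (255.0 * STD)   # about 0.0156863


def python_preprocess(pixel: float) -> float:
    return (pixel / 255.0 - MEAN) / STD


def ncnn_preprocess(pixel: float) -> float:
    return (pixel - mean_val) * norm_val


print(mean_val, round(norm_val, 7))  # 127.5 0.0156863
```

The same value would then go into all three entries of mean_vals and norm_vals, since this normalize uses one mean and std for every channel.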

mic2112 commented 2 years ago

Hey @glenn-jocher, the Hydra branch doesn't exist anymore. Can you please share an example yaml for a hydra net implementation of yolov5?

Thanks

mic2112 commented 2 years ago

Hey @abhigoku10,

Were you successful in making a hydra net implementation of Yolov5, I am working on something similar and need help with that.

Thanks

abhigoku10 commented 2 years ago

@mic2112 nope, I couldn't pursue the HydraNet implementation due to other priorities. If you get it working, please share it; it would be helpful.

LeonNerd commented 2 years ago

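@glenn-jocher Thanks for the response, but logically, can the backbone be shared, since both are detection modules and the only constraint is the different datasets and classes? Then there would be no need for separate training, right?

Hello, I want to do the same thing: I need to train in a multi-task setup where both heads share the same backbone. What should I do? Thanks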

guptasaumya commented 1 year ago

Hi @glenn-jocher

Have TTA and test-evaluation logging been implemented for classification? Running classify/val.py gives me empty exp folders.

glenn-jocher commented 1 year ago

@guptasaumya no, TTA and test eval are not implemented for classification models. Currently classify/val.py does not create any assets to save, but if you'd like to help by submitting a PR that would be great!

jain-abhay commented 5 months ago

Hi @LeonNerd, were you able to pursue the task you mentioned (same backbone and two different heads)? If you managed to work it out, could you kindly share some guidance? Thanks

jain-abhay commented 5 months ago

Hi @glenn-jocher, could you please share updated links for the C5_divergent branch and for the model with one backbone and two heads? The links posted earlier in this thread no longer open. It would be really helpful. Thanks in anticipation.

jain-abhay commented 5 months ago

Hi @mic2112, were you able to pursue the task you mentioned (same backbone and two different heads)? If you managed to work it out, could you kindly share some guidance? Thanks