@Zohiet good question. There's a classification branch under development here: https://github.com/ultralytics/yolov5/tree/classifier
You can use classifier.py (https://github.com/ultralytics/yolov5/blob/classifier/classifier.py). It allows you to train EfficientNet models from Ross Wightman (@rwightman) as well as classification versions of YOLOv5. It uses a directory dataloader typical of classification, so you can train standard classification datasets like CIFAR, ImageNet etc. directly with it.
We've provided some prepackaged classification datasets, correctly formatted, here:
https://github.com/ultralytics/yolov5/releases/download/v1.0/cifar10.zip
https://github.com/ultralytics/yolov5/releases/download/v1.0/cifar100.zip
https://github.com/ultralytics/yolov5/releases/download/v1.0/mnist.zip
We want to do additional work down this path, including adapting the mosaic dataloader for classification when we find some time.
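For reference, the directory layout such a dataloader expects is one subfolder per class. A minimal sketch of loading it with torchvision's ImageFolder (paths are illustrative, not repo code):
# Folder-per-class layout, e.g. ../cifar10/train/<class_name>/<image>.png
# (matches the prepackaged datasets above after unzipping; paths illustrative)
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
trainset = datasets.ImageFolder('../cifar10/train', transform=transform)
loader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
print(trainset.classes)  # class names are inferred from the subfolder names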
@AyushExel BTW see above message for a brief update on classification efforts. I've created a new branch with a single new file that incorporates everything needed for classification training of efficientnet and YOLOv5 classifier models. It needs additional work, but it's started there.
@Zohiet the code for transforming a YOLOv5 model into a classifier is here BTW. This cuts off most of the head and replaces it with a single Classify() module. All of this is immature and needs additional development and experimentation but it works in principle.
# YOLOv5 Classifier: truncate the detection model and attach a Classify() head
# (opt.model and nc come from the surrounding classifier.py script)
import torch
from models.common import Classify

model = torch.hub.load('ultralytics/yolov5', opt.model, pretrained=True)  # e.g. 'yolov5s'
model.model = model.model[:8]  # keep only the first backbone modules
m = model.model[-1]  # last remaining layer
ch = m.conv.in_channels if hasattr(m, 'conv') else sum([x.in_channels for x in m.m])  # ch into module
c = Classify(ch, nc)  # Classify() head, nc = number of classes
c.i, c.f, c.type = m.i, m.f, 'models.common.Classify'  # index, from, type
model.model[-1] = c  # replace the last layer with the Classify() head
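For illustration, a hedged sketch (not from the branch) of running the converted model on a dummy batch; the class count, input size, and direct callability of the truncated model are assumptions here:
# Forward a dummy batch through the converted classifier (assumes nc = 10
# classes and 128x128 inputs; 'model' comes from the snippet above).
import torch
import torch.nn.functional as F

x = torch.zeros(1, 3, 128, 128)   # dummy BCHW batch
logits = model(x)                 # (1, nc) output from the Classify() head
probs = F.softmax(logits, dim=1)  # class probabilities summing to 1
print(probs.argmax(1).item())     # predicted class index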
@glenn-jocher that is great. I'll try it out as soon as I get back from vacation. This is a great step for making this library a one-stop solution for CV problems :)
@glenn-jocher Very glad to know that you guys are making an effort to create a new classification branch; I appreciate it. I'll try transforming YOLOv5 into a classifier as you described. But I am still curious: what if I keep the model and change the labels (like I said, setting the bbox to the whole image: class 0.5 0.5 1 1)? Will the model perform well?
@Zohiet sure you can do that as well, though it may be a questionable design decision to include uninformative components in the loss (i.e. boxes that always encompass the entire image).
Questionable but will work :P
One of the first object detection datasets I trained on was images of road signs and nothing else (maybe 5 px padding around the edge of each sign). The bbox was placed near the outer edge. Thinking back, this dataset was obviously for an augmentation pipeline, but that is beside the point. I trained on the dataset and then ran inference on a video from a dash cam in a city. Needless to say, every detection had the bbox around the edge of the image XD
@5starkarma Thanks for your info! In my inference dataset, I don't really care about the bbox as long as the classification is correct.
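If you do take the whole-image-box route, here is a minimal hypothetical sketch (directory names invented) that writes one 'class 0.5 0.5 1 1' label file per image from a folder-per-class dataset:
# Hypothetical: convert a folder-per-class dataset into YOLO detection labels
# where every box spans the whole image (class 0.5 0.5 1 1).
from pathlib import Path

root = Path('datasets/plants')  # invented dataset root
classes = sorted(p.name for p in (root / 'images').iterdir() if p.is_dir())
for ci, cname in enumerate(classes):
    for img in (root / 'images' / cname).glob('*.jpg'):
        label = root / 'labels' / cname / (img.stem + '.txt')
        label.parent.mkdir(parents=True, exist_ok=True)
        label.write_text(f'{ci} 0.5 0.5 1 1\n')  # whole-image box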
@glenn-jocher I'm curious if you've considered having a YOLOv5 model that can do classification and bounding-box detection at the same time.
I can see this happening two ways:
- The classification head could be a concat & avg pool (from the bbox head / FPN) followed by a final linear layer (see the sketch after this message). This may be slower, but would allow one to use pretrained YOLO models and add a classification head on top that could be trained on a different dataset.
- As (I think) is currently implemented in the classification branch, the linear layer could be one branch from the backbone, while the bounding-box head would be another branch.
Either approach could facilitate training with a custom dataset where you've got classification labels as well as bounding box annotations for the same image.
Curious to hear your thoughts :)
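As an aside, a hedged sketch of the first option above (not code from any branch): global-average-pool each FPN/PAN feature map, concatenate, and classify with a linear layer. The channel sizes match yolov5s P3/P4/P5 outputs but are assumptions here, as is the module name:
import torch
import torch.nn as nn

class FPNClassifyHead(nn.Module):  # hypothetical module
    def __init__(self, channels=(128, 256, 512), n_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # global average pool
        self.fc = nn.Linear(sum(channels), n_classes)  # final linear layer

    def forward(self, feats):  # feats: list of (B, C_i, H_i, W_i) maps
        pooled = [self.pool(f).flatten(1) for f in feats]  # (B, C_i) each
        return self.fc(torch.cat(pooled, dim=1))           # (B, n_classes)

# Usage with dummy P3/P4/P5 feature maps:
head = FPNClassifyHead()
feats = [torch.zeros(2, 128, 32, 32), torch.zeros(2, 256, 16, 16), torch.zeros(2, 512, 8, 8)]
print(head(feats).shape)  # torch.Size([2, 10])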
@rsomani95 yes, we have a branch that trains classification models (with classifier.py) here that we are experimenting with: https://github.com/ultralytics/yolov5/tree/classifier
Generally classification and detection tasks are not intermingled in the same network, or at least I've never seen them mixed together. The architectures differ (head differences), but the datasets also differ, and there's generally no way to convert, say, a COCO image into a single class, nor to annotate an ImageNet image with a single bounding box, particularly for some of the rarer classes.
That said, detection models already do classification: they classify every point in the output grid and every anchor within that point. There's an overlap there in theory, but in practice they are always isolated tasks as far as I know.
You're right, they're not coupled together in any public dataset that I know of. But I can easily imagine a scenario where both would be visible. One may want to do scene recognition and also recognise objects within the scene. Take an image of a parking lot, for example: the scene is a parking-lot, and there are objects of interest, like cars, in it. Here, one needs to look at the image in its entirety to deduce that it's a parking-lot.
My original question stemmed from having looked at the classification branch. I was curious if that + detection could be done together. Maybe this is an eclectic scenario given the state of public datasets, but I think having a model that could do it may encourage folks to formulate such problems more meaningfully?
@rsomani95 yes, many organizations exploit a single backbone for multiple tasks, i.e. Karpathy at Tesla calls this a 'hydra' network after the multi-headed monster from Greek mythology.
@glenn-jocher That's where I first learnt about it too. The Greek mythology tidbit is a nice touch.
What I meant to ask with my first question was whether you've considered extending YOLOv5 so it can be adapted into a 'hydra'-like network. I'm currently trying to implement this with icevision and was curious if you had it on your roadmap.
@rsomani95 well, it's somewhat complicated to introduce additional tasks/heads onto a backbone, as the losses begin competing with each other and you have to balance them accordingly, which takes experimentation.
So yes, it's an interesting idea for sure, and I suppose one might even be able to combine a detection head and a separate classification head and train them on semi-nonintersecting datasets like COCO+ImageNet. But in our current capacity we are already challenged simply maintaining and updating the basic YOLOv5 repository and ensuring export compatibility with the various pipelines. Our main priority is the largest use case/addressable market, which in vision AI appears to be object detection first and classification second, with segmentation in there somewhere as well; more exotic ideas like this live more in the research and publication domain.
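To make the balancing point concrete, a toy sketch (the loss values and weights are invented; in practice each head's criterion produces its loss):
import torch

# Hypothetical per-head losses, normally computed by each head's criterion
detection_loss = torch.tensor(2.3, requires_grad=True)
classification_loss = torch.tensor(0.7, requires_grad=True)

w_det, w_cls = 1.0, 0.5  # invented weights; finding good values takes experimentation
total = w_det * detection_loss + w_cls * classification_loss
total.backward()  # gradients from both tasks flow back into the shared backbone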
Hi @rsomani95 and @glenn-jocher
Recently I've done some related experiments. My focus is on the structure of YOLOv5, in detail:
- The classification head could be a concat & avg pool (from the bbox head / FPN) => final linear layer. This may be slower, but would allow one to use pretrained yolo models, and add a classification head on top that could be trained on a different dataset
Before this, I saw the torchvision team abstract this part as a BackboneWithFPN module here. YOLOv5's approach is more like a PAN module than an FPN, so I've refactored the implementation as a BackboneWithPAN module here. In fact, this part could be extracted from the yaml configuration file.
- As (I think) is implemented currently in the classification branch, the linear layer could be a branch from the backbone, whereas the bounding box head would be another branch.
I think the branch @glenn-jocher mentioned above could partly answer this question (thoughts?). It can also be modified into a DarkNet/MobileNet-like module here; they are just two ways of writing the same module.
@glenn-jocher That makes a ton of sense. Thank you for the detailed response.
@zhiqwang Thank you for sharing these resources. I'll take a look, they look quite interesting.
I think the branch as @glenn-jocher mentioned above could partly answer this question, (thoughts?)
It partly does, yes. It shows how to make a classification head branch out from the backbone.
Yes, the https://github.com/ultralytics/yolov5/tree/classifier branch adds a classifier.py file that does standalone classifier training. It does not attempt to merge tasks, though.
There is a C5_divergent branch (https://github.com/ultralytics/yolov5/tree/C5_divergent) that examines some updated architectures that are similar to hydra nets. For example this model has one backbone and two heads: https://github.com/ultralytics/yolov5/blob/C5_divergent/models/hub/yolov5l6d-640.yaml
Basically the yaml files are very flexible and let you define a lot of interesting shapes, such as the dual-head network above, which I was using to see if there was any benefit to having different heads compute different losses (i.e. a box-regression head and an obj/cls head separately). Strangely enough, I didn't find any benefit from doing this, so it's possible that in the detection case the loss components complement each other.
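For readers without access to that yaml, a hedged PyTorch sketch of the one-backbone-two-heads idea; every layer and size below is an illustrative stand-in, not the C5_divergent architecture:
import torch
import torch.nn as nn

class HydraNet(nn.Module):  # illustrative stand-in
    def __init__(self, n_det_out=85, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a real YOLOv5 backbone
            nn.Conv2d(3, 64, 3, 2, 1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, 2, 1), nn.SiLU())
        self.detect_head = nn.Conv2d(128, n_det_out, 1)  # per-cell box/obj/cls outputs
        self.classify_head = nn.Sequential(              # whole-image class output
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, n_classes))

    def forward(self, x):
        f = self.backbone(x)  # shared features
        return self.detect_head(f), self.classify_head(f)

det, cls = HydraNet()(torch.zeros(1, 3, 128, 128))
print(det.shape, cls.shape)  # torch.Size([1, 85, 32, 32]) torch.Size([1, 10])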
I am not so familiar with changing the networks as described above, but I have a question about this classifier. My approach is much simpler (and probably less accurate) than the networks mentioned above: I want to train the classifier on a publicly available plant dataset (PlantVillage), then retrain the network on my custom dataset (with bounding boxes) using the classifier's weights. The problem is that classifier.py is somehow not working well for me. I tried it first on the MNIST and CIFAR datasets, but the accuracy is only around 0.44. Has anybody had experience with this classifier, and if so, did you obtain higher accuracies?
@studentWUR oh interesting. MNIST should be pretty high, i.e. around 0.98-0.99 maybe. I'll check it out.
@studentWUR I made a small bug fix https://github.com/ultralytics/yolov5/commit/04fddf507fcecace0ef1842f841517a4c0fdbdb3 to the classifier branch and now everything works correctly. MNIST is at 99% in 5 epochs.
INPUT:
python classifier.py --data mnist
OUTPUT:
YOLOv5 v4.0-42-gb34e21b torch 1.7.0+cu101 CUDA:0 (Tesla V100-SXM2-16GB, 16130.5MB)
Training yolov5s on mnist dataset with 10 classes...
Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1182720 models.common.C3 [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 283 layers, 7276605 parameters, 7276605 gradients, 17.1 GFLOPS
Model Summary: 128 layers, 1194954 parameters, 1194954 gradients, 8.5 GFLOPS
epoch gpu_mem train_loss val_loss accuracy
1/20 0.596G 0.443 0.0617 0.983 : 100% 469/469 [00:45<00:00, 10.32it/s]
2/20 0.598G 0.0927 0.0528 0.981 : 100% 469/469 [00:45<00:00, 10.22it/s]
3/20 0.598G 0.0742 0.0448 0.986 : 100% 469/469 [00:45<00:00, 10.27it/s]
4/20 0.598G 0.0635 0.0375 0.989 : 100% 469/469 [00:45<00:00, 10.32it/s]
5/20 0.598G 0.0546 0.031 0.991 : 100% 469/469 [00:46<00:00, 10.19it/s]
6/20 0.598G 0.0531 0.0271 0.991 : 100% 469/469 [00:45<00:00, 10.24it/s]
7/20 0.598G 0.0482 0.0253 0.992 : 100% 469/469 [00:45<00:00, 10.31it/s]
@rsomani95 if I understand correctly, you want scene classification along with object detection, right? I think by adding a multi-task head to the backbone we can get it. @glenn-jocher @zhiqwang extending this thought: if I have two models trained for different classes, like Model 1 for car and person and Model 2 for logo and text, both with the same backbone (e.g. ResNet or DarkNet), is it possible to bring them into a single network so that inference time and memory consumption can be reduced? Could you please share your thoughts on this? It would be helpful.
@abhigoku10 anything is possible if you have enough time and resources to customize solutions, i.e. you are in academia and just want to research or in an enterprise with unlimited funds. In that case you can try adding additional heads with their own tasks and losses (and data labels), i.e. classify, detect, segment, keypoint heads separately.
If you are trying to bring a product to market with the minimum risk and lead time (and cost) then you should stick to the standard use-cases, i.e. YOLO for detection, EfficientNet for classification etc. and develop your product around those rather than the other way around.
@glenn-jocher thanks for the response, but logically can sharing of the backbone be done, since both are detection modules and the only constraint is different datasets and classes? So there would be no need for separate training, right?
@abhigoku10 yes it should be possible. Karpathy calls these hydra networks, from Greek mythology.
@abhigoku10 I looked at the classifier branch and saw it had a few issues that had arisen due to divergence with master. I've merged master, verified correct operation, and added an inference usage example:
git clone https://github.com/ultralytics/yolov5 -b classifier
cd yolov5
pip install -r requirements.txt
python classifier.py --model yolov5s --data mnist --epochs 5 --img 128
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v5.0-527-g76259b1 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Training yolov5s on mnist dataset with 10 classes...
Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 v5.0-527-g76259b1 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Fusing layers...
Model Summary: 224 layers, 7266973 parameters, 0 gradients
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
Model Summary: 101 layers, 1192362 parameters, 1192362 gradients
Image sizes 128 train, 128 test
Using 2 dataloader workers
Logging results to runs/train/exp3
Starting training for 5 epochs...
epoch gpu_mem train_loss val_loss accuracy
1/5 0.958G 0.436 0.152 0.953 : 100% 469/469 [00:35<00:00, 13.38it/s]
2/5 0.958G 0.128 0.0674 0.979 : 100% 469/469 [00:34<00:00, 13.43it/s]
3/5 0.958G 0.0908 0.0609 0.98 : 100% 469/469 [00:34<00:00, 13.52it/s]
4/5 0.958G 0.0689 0.0379 0.986 : 100% 469/469 [00:34<00:00, 13.52it/s]
5/5 0.958G 0.0499 0.0279 0.99 : 100% 469/469 [00:34<00:00, 13.44it/s]
Training complete. Results saved to runs/train/exp.
import cv2
import numpy as np
import torch
import torch.nn.functional as F
# Functions
resize = torch.nn.Upsample(size=(128, 128), mode='bilinear', align_corners=False)
normalize = lambda x, mean=0.5, std=0.25: (x - mean) / std
# Model
model = torch.load('runs/train/exp/weights/best.pt')['model'].cpu().float()
# Image
im = cv2.imread('../mnist/test/0/10.png')[..., ::-1] # HWC, BGR to RGB
im = np.ascontiguousarray(np.asarray(im).transpose((2, 0, 1))) # HWC to CHW
im = torch.tensor(im).unsqueeze(0) / 255.0 # to Tensor, to BCHW, rescale to 0-1
im = resize(normalize(im))
# Inference
results = model(im)
p = F.softmax(results, dim=1) # probabilities
print(p)
tensor([[9.99685e-01, 2.24096e-11, 6.25672e-06, 6.37622e-07, 2.57255e-09, 6.91358e-07, 3.36181e-05, 3.68758e-08, 6.94699e-06, 2.66681e-04]], grad_fn=<SoftmaxBackward>)
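To turn that probability vector into a prediction (here class 0, matching the test image's folder), one can add, for example:
conf, cls = p.max(1)            # highest probability and its class index
print(cls.item(), conf.item())  # 0, ~0.9997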
@glenn-jocher Is it possible to do multi-label classification only? If I understand the data structure in classifier.py correctly, the images should be placed in a folder corresponding to the class they belong to. So how would a multi-label case work?
@ajonand on the labelling side that's a good question; I think the dataset structure impedes multi-label, as you mentioned.
On the inference side, classification is always multi-label in a sense: every class receives an output, and a softmax normalizes these to sum to 1, so yes, you can view, say, the top 3 or top 10 classification labels for an image rather than just the highest-likelihood class.
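For example, a minimal sketch of viewing the top-3 classes from a (batch, class) output, with random logits as a stand-in for a real model:
import torch
import torch.nn.functional as F

logits = torch.randn(1, 100)     # stand-in (batch, class) model output
p = F.softmax(logits, dim=1)     # probabilities summing to 1 per image
top_p, top_i = p.topk(3, dim=1)  # top-3 probabilities and class indices
print(top_i.tolist(), top_p.tolist())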
@glenn-jocher Hi Glenn, I tried the classifier inference that you provided; the result is a big tensor (the results variable in your code) which I don't understand. Also, the results.print() method doesn't work, unlike in the YOLOv5 inference tutorial. Could you please explain what the results are? My goal is to get the class of a cropped image. Thanks a lot!
@matpy1 YOLOv5 classification models output in the same format as every other classification model, i.e. EfficientNet, ResNet, etc. These are confidence vectors of shape(batch,class), i.e. (16,100) for 16 images and 100 classes.
Will a classify-only trained model work on video using detect.py?
@vijishmadhavan no, detect.py only works for normal YOLOv5 detection models. You can see a classification inference example in the classifier.py Usage section: https://github.com/ultralytics/yolov5/blob/136640eee86b529b6419e6e9b4c7c008aea9b6a8/classifier.py#L8-L29
The prediction is always the same class (the wrong class, by the way), even when I test with an image from the training set, even though my best model got 100% accuracy.
Hi @glenn-jocher, thanks for your amazing work. I have one question: I trained my dataset using classifier.py with an accuracy of 0.981, and later used this code with ncnn for the classification task:
#include <algorithm>
#include <cmath>
#include <vector>
#include <opencv2/opencv.hpp>
#include "net.h" // ncnn

std::vector<float> softmax(const float* logits, unsigned int _size, unsigned int& max_id) {
    if (_size == 0 || logits == nullptr) return {};
    float max_prob = 0.f, total_exp = 0.f;
    std::vector<float> softmax_probs(_size);
    for (unsigned int i = 0; i < _size; ++i) {
        softmax_probs[i] = std::exp(logits[i]);
        total_exp += softmax_probs[i];
    }
    for (unsigned int i = 0; i < _size; ++i) {
        softmax_probs[i] /= total_exp;
        if (softmax_probs[i] > max_prob) {
            max_id = i;
            max_prob = softmax_probs[i];
        }
    }
    return softmax_probs;
}

std::vector<unsigned int> argsort(const std::vector<float>& arr) {
    if (arr.empty()) return {};
    std::vector<unsigned int> indices(arr.size());
    for (unsigned int i = 0; i < arr.size(); ++i) indices[i] = i;
    std::sort(indices.begin(), indices.end(),
              [&arr](unsigned int a, unsigned int b) { return arr[a] > arr[b]; });
    return indices;
}

static int detect_yolo_classifier(const cv::Mat& image) {
    ncnn::Net yolo_classifier;
    yolo_classifier.opt.use_vulkan_compute = false;
    yolo_classifier.load_param("model.param");
    yolo_classifier.load_model("model.bin");

    const int target_size = 352;
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(image.data, ncnn::Mat::PIXEL_BGR2RGB,
                                                 image.cols, image.rows, target_size, target_size);
    const float mean_vals[3] = {0.f, 0.f, 0.f};
    const float norm_vals[3] = {1.f / 255.f, 1.f / 255.f, 1.f / 255.f};
    in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Extractor ex = yolo_classifier.create_extractor();
    ex.input("in0", in);
    ncnn::Mat out;
    ex.extract("out0", out);

    const unsigned int num_classes = out.w;
    const float* logits = (const float*)out.data;
    unsigned int max_id = 0; // initialize before use
    std::vector<float> scores = softmax(logits, num_classes, max_id);
    std::vector<unsigned int> sorted_indices = argsort(scores);
    return (int)sorted_indices[0]; // top-1 class id
}
Using the original Python method from the yolov5 repo everything is OK, but using ncnn with C++ it only detects one class correctly. Can you check my code, please? Do I need to pad the image like the YOLO detector does, or not?
Like this?
// yolov5/models/common.py DetectMultiBackend
const int max_stride = 64;
// letterbox pad to multiple of max_stride
int w = img_w;
int h = img_h;
float scale = 1.f;
if (w > h)
{
scale = (float)target_size / w;
w = target_size;
h = h * scale;
}
else
{
scale = (float)target_size / h;
h = target_size;
w = w * scale;
}
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);
// pad to target_size rectangle
// yolov5/utils/datasets.py letterbox
int wpad = (w + max_stride - 1) / max_stride * max_stride - w;
int hpad = (h + max_stride - 1) / max_stride * max_stride - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);
I'm using this script to export to TorchScript:
python3 export.py --weights '/home/miguel/Documentos/yolov5/runs/train/exp/weights/best.pt' --include torchscript --img 320
Thanks
@xellDart I'm not great at C++, and since we're so busy I can't individually comment on users' code much, but in general you need to make sure the image is preprocessed exactly the same way (i.e. same image size, same transforms, same RGB order) in your custom script as in the official inference script here: https://github.com/ultralytics/yolov5/blob/9794f63ddfdc7599a1ed368395115f9cd3c7d50f/classifier.py#L8-L14
Also I haven't verified export of classification models using export.py, but it's possible it may work out of the box already.
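As a hedged aid for checking parity (not repo code), the same resize-and-pad logic as the C++ snippet above can be reproduced in Python and its output compared against the ncnn input tensor:
import cv2
import numpy as np

def letterbox(im, target_size=320, stride=64, pad_value=114):
    # Mirror of the C++ snippet above: scale the longest side to target_size,
    # then pad each dimension up to a multiple of stride with value 114.
    h, w = im.shape[:2]
    scale = target_size / max(h, w)
    w2, h2 = int(round(w * scale)), int(round(h * scale))
    im = cv2.resize(im, (w2, h2))
    wpad = (w2 + stride - 1) // stride * stride - w2
    hpad = (h2 + stride - 1) // stride * stride - h2
    top, left = hpad // 2, wpad // 2
    return cv2.copyMakeBorder(im, top, hpad - top, left, wpad - left,
                              cv2.BORDER_CONSTANT, value=(pad_value,) * 3)

print(letterbox(np.zeros((240, 320, 3), np.uint8)).shape)  # (256, 320, 3)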
Hey @glenn-jocher, the Hydra branch doesn't exist anymore. Can you please share an example yaml for a hydra net implementation of yolov5?
Thanks
Hey @abhigoku10,
Were you successful in making a hydra-net implementation of YOLOv5? I am working on something similar and need help with that.
Thanks
@mic2112 No, I could not pursue the hydranet implementation due to other priority activities. Once done, please share; it would be helpful.
@glenn-jocher Thanks for the response, but logically can the backbone be shared, since both are detection modules and the only constraint is different datasets and classes? So there would be no need for separate training, right?
Hello, I want to do the same thing: a model trained in a multi-task setup where both heads share the same backbone. What should I do? Thanks.
Hi @glenn-jocher
Have TTA and test-evaluation logging been implemented for classification? Executing classify/val.py gives me empty exp folders.
@guptasaumya no, TTA and test eval are not implemented for classification models. Currently no assets are created that would be saved by classify/val.py, but if you'd like to help by submitting a PR that would be great!
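For anyone prototyping it, classification TTA can be as simple as averaging probabilities over augmented views; a hedged sketch with a hypothetical helper, not repo code:
import torch
import torch.nn.functional as F

def classify_tta(model, im):
    # Hypothetical helper: average softmax probabilities over the original
    # and horizontally flipped views of a BCHW image tensor.
    views = [im, torch.flip(im, dims=[3])]  # flip along the width axis
    probs = [F.softmax(model(v), dim=1) for v in views]
    return torch.stack(probs).mean(0)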
Hi @LeonNerd, were you able to pursue the task that you had mentioned (same backbone and 2 different heads)? Could you kindly guide me in case you were able to work it out? Thanks.
Yes the https://github.com/ultralytics/yolov5/tree/classifier branch adds a classify.py file that does standalone classifier training. It does not attempt to merge tasks though.
There is a C5_divergent branch (https://github.com/ultralytics/yolov5/tree/C5_divergent) that examines some updated architectures that are similar to hydra nets. For example this model has one backbone and two heads: https://github.com/ultralytics/yolov5/blob/C5_divergent/models/hub/yolov5l6d-640.yaml
Basically the yaml files are very flexible and can allow you to define a lot of interesting shapes such as the dual-head network above, which I was using to see if there was any benefit to having different heads compute different losses (i.e. a box regression head and a obj/cls head seperately). I didn't find any benefit from doing this strangely enough, so it's possible that in the case of the detection the loss components might complement each other.
Hi @glenn-jocher, could you please share updated links for the C5_divergent branch and the model that has one backbone but two heads, since the above links no longer open? It would be really helpful. Thanks in anticipation.
Hi @mic2112, were you able to pursue the task that you had mentioned (same backbone and 2 different heads)? Could you kindly guide me in case you were able to work it out? Thanks.
❔Question
Thanks for this great work! I am planning to perform a classification-only task with YOLOv5. Is it a wise way to do it by setting the bbox to the whole image, like in the label .txt (class 0.5 0.5 1 1)? Or is there a better workaround?