Open zhiqwang opened 3 years ago
@zhiqwang Hi, the library seems good, so I was thinking of contributing to make it as flexible as possible by adding support for many other backbones, losses, and FPNs, and even adding our own architecture tweaked for performance. Let me know if you have a Slack channel or another platform to discuss the above. Thanks
Hi @kartik4949
The modular design does require more careful consideration, and we are eager for your help. You can join us on Slack here.
Could I train the model with yolov5-rt on a custom dataset? Or do I need to train the model with yolov5 v4.0 and then convert the weights with
import torch

from yolort.utils import update_module_state_from_ultralytics

# Update module state from ultralytics
model = update_module_state_from_ultralytics(
    arch='yolov5s',
    version='v4.0',
    custom_path_or_model=torch.load('path/to/model.pt'),
    num_classes=1,
)
# Save the updated module
torch.save(model.state_dict(), 'yolov5s_updated.pt')
Thanks
Hi @stereomatchingkiss , both of these are feasible, but I recommend the second approach for now.
Training with yolort
is now in the experimental phase; you can check the following for more details.
FYI, I aim to release a version that supports training before 7th May. I guess it will not train as well as ultralytics, but it will be more user-friendly 😄
Hi @zhiqwang , thanks for your awesome repo! Do you have any news on the training release? I started from your codebase to implement training myself. It is working fine now, i.e. I can run training steps, however I am running into one issue.
When I apply default_train_transforms in your data modules, it happens that after transforming there are no targets left, probably because they lie outside of the crop.
Can you give me some hints on how best to deal with empty targets in box_head.py? Particularly in these functions:
targets_cls, targets_box, indices, anchors = self.select_training_samples(head_outputs, targets)
losses = self.compute_loss(head_outputs, targets_cls, targets_box, indices, anchors)
Thanks a lot in advance!
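One pragmatic workaround until the transforms are fixed is to drop empty samples at collate time, so the loss functions never see a target-less image. Below is a minimal sketch (a hypothetical helper, not part of yolort's API), assuming torchvision-style `(image, target)` pairs where each target dict has a `"boxes"` entry:

```python
def collate_drop_empty(batch):
    """Collate function that drops samples whose target has no boxes.

    Hypothetical helper: each sample is assumed to be an (image, target)
    pair where target["boxes"] is a list or tensor (both support len()).
    """
    kept = [(img, tgt) for img, tgt in batch if len(tgt["boxes"]) > 0]
    if not kept:
        # Every sample in the batch lost its targets; signal an empty batch
        return [], []
    images, targets = zip(*kept)
    return list(images), list(targets)
```

This pushes the empty-target problem out of `select_training_samples` / `compute_loss` entirely, at the cost of occasionally shrinking the batch.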
Hi @Tomakko
Thanks for your careful debugging information. I guess it is due to the poor implementation of the data augmentation; as you mentioned, the default_train_transforms
will filter out most targets.
I think we should fix this augmentation to make sure there is at least one target
left when the losses are computed.
Do you have any news on the training release?
My next plan is to learn from the implementation of data augmentation in torchvision; they recently uploaded the augmentation methods they use when training SSD models, and we can borrow some of their code here to make the augmentation acceptable.
Your feedback is very important to me, and feel free to file new issues about the trainer here, and let's train a good model together. 🚀
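Another way to guarantee at least one surviving target is to re-sample the random transform until a box remains, which is roughly the strategy torchvision's detection reference transforms (e.g. `RandomIoUCrop`) use internally. A hypothetical wrapper sketch, not an existing yolort or torchvision class:

```python
class RetryCrop:
    """Re-sample a random transform until at least one box survives.

    Hypothetical wrapper around any (image, target) -> (image, target)
    transform; falls back to the untouched sample after max_tries attempts.
    """

    def __init__(self, transform, max_tries=10):
        self.transform = transform
        self.max_tries = max_tries

    def __call__(self, image, target):
        for _ in range(self.max_tries):
            out_img, out_tgt = self.transform(image, target)
            if len(out_tgt["boxes"]) > 0:
                return out_img, out_tgt
        # Give up and keep the original sample so the loss always sees a box
        return image, target
```

Wrapping the random crop in `default_train_transforms` this way would keep the pipeline unchanged while removing the empty-target failure mode.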
Thank you @zhiqwang! I currently need to realize an embedded yolo model in the short term and am therefore doing training with ultralytics, but afterwards I would be willing to contribute here. The training pipeline in ultralytics is just super cumbersome ;)
Hi @zhiqwang, thanks for the awesome work! I was wondering how to load a pretrained model when the number of classes differs from the default, something like this:
from yolort.models import yolov5s
model = yolov5s(pretrained=True, score_thresh=0.45, num_classes=5)
This piece of code throws the following error due to dimension mismatch:
RuntimeError: Error(s) in loading state_dict for YOLO:
    size mismatch for head.head.0.weight: copying a param with shape torch.Size([255, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 128, 1, 1]).
    size mismatch for head.head.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).
    size mismatch for head.head.1.weight: copying a param with shape torch.Size([255, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 256, 1, 1]).
    size mismatch for head.head.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).
    size mismatch for head.head.2.weight: copying a param with shape torch.Size([255, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 512, 1, 1]).
    size mismatch for head.head.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).
As we can see, only the weights & biases of head.head are mismatching, and I think the formula for that first dimension is (num_classes + 5) * 3.
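The arithmetic checks out against the error message above: each of the 3 anchors per scale predicts 4 box coordinates, 1 objectness score, and num_classes class scores, so the head's 1x1 conv outputs (num_classes + 5) * 3 channels. A quick sanity check (the helper name is made up for illustration):

```python
def head_out_channels(num_classes, num_anchors=3):
    # Each anchor predicts 4 box coords + 1 objectness + num_classes scores,
    # hence (num_classes + 5) channels per anchor at every output scale.
    return (num_classes + 5) * num_anchors

print(head_out_channels(80))  # 255, the COCO-pretrained head above
print(head_out_channels(5))   # 30, the custom 5-class head above
```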
Is there any function/method that I'm not aware of that would allow us to match these dimensions, i.e. some method that would work like this (if integrated in the YOLO class):
def load_state_dict(self, state_dict, num_classes):
    weights_to_skip = [f"head.head.{i}.weight" for i in range(3)]
    bias_to_skip = [f"head.head.{i}.bias" for i in range(3)]
    for weight in weights_to_skip + bias_to_skip:
        state_dict[weight] = state_dict[weight][:(num_classes + 5) * 3, ...]
    super().load_state_dict(state_dict)
Currently the only way I found to load a YOLO model with a different number of classes is to use the load_from_yolov5
method, which requires us to already have a checkpoint model.
Hi @denguir , thanks for asking these questions.
Is there any function/method that I'm not aware of that would allow us to match these dimensions.
We don't currently offer a solution to this problem, but I guess you can load only the backbone parts to partially solve it. (I adapted the snippet from https://discuss.pytorch.org/t/how-to-load-part-of-pre-trained-model/1113/3)
from yolort.models import yolov5s
from yolort.utils import load_state_dict_from_url
model = yolov5s(pretrained=False, score_thresh=0.45, num_classes=5)
checkpoint_path = "/home/user/.cache/torch/hub/checkpoints/yolov5_darknet_pan_s_r60_coco-9f44bf3f.pt"
pretrained_dict = load_state_dict_from_url(checkpoint_path)
# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if "backbone" in k}
# 2. load the filtered state dict
model.model.load_state_dict(pretrained_dict, strict=False)
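One useful detail for sanity-checking a partial load like this: `load_state_dict(..., strict=False)` returns the keys it could not match, so you can confirm that only the head was left uninitialized. A self-contained toy illustration (the `Toy` module below is made up for demonstration, mirroring the `backbone`/`head` naming above):

```python
import torch.nn as nn


class Toy(nn.Module):
    """Hypothetical stand-in for YOLO: a backbone plus a class-dependent head."""

    def __init__(self, head_out):
        super().__init__()
        self.backbone = nn.Linear(4, 8)
        self.head = nn.Linear(8, head_out)


# Pretend this is a COCO checkpoint (255 head channels)
pretrained = Toy(head_out=255).state_dict()
# Our custom model has a smaller head (30 channels)
model = Toy(head_out=30)

# Keep only backbone weights, then load non-strictly
backbone_only = {k: v for k, v in pretrained.items() if "backbone" in k}
result = model.load_state_dict(backbone_only, strict=False)
print(result.missing_keys)  # only the head.* parameters were skipped
```

The same `result.missing_keys` check on the yolort snippet above should list only `head` parameters; anything else in the list means the key filter dropped too much.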
BTW, the training mechanism of yolort is still not well developed, and any kind of contribution is welcome here.
Thanks @zhiqwang, I will definitely explore the training process of yolort further and will try to help there.
🚀 Feature
Support training models from scratch, this is a follow-up issue of #16.
Motivation
Test whether the trainer mechanism works.
Pitch