Closed: gazzadi closed this issue 1 year ago
Hello @gazzadi, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8!
Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.
Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
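As a quick illustration (the model name and image path below are placeholders), a minimal Python usage sketch with the ultralytics package might look like this:
from ultralytics import YOLO

# Load a pretrained YOLOv8 nano model; the weights are downloaded on first use
model = YOLO('yolov8n.pt')

# Run inference on an image of your choice (the path is illustrative)
results = model('path/to/image.jpg')
print(results[0].boxes)  # detected boxes, classes and confidences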
@gazzadi thank you for reaching out with your question.
The error you encountered occurs because the merge.pt file you saved does not contain the necessary information to load the model. When you save the state dictionary using torch.save(), it only saves the weights and not the entire model architecture.
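As a minimal illustration of that difference, with a toy module standing in for the YOLOv5 model and placeholder file names:
import torch
import torch.nn as nn

model = nn.Linear(2, 2)  # toy module used only to illustrate the two save modes

# Saves only the parameter tensors; loading them later requires rebuilding the same architecture first
torch.save(model.state_dict(), 'weights_only.pt')

# Pickles the entire nn.Module, architecture included, which is closer to what YOLOv5 checkpoints contain
torch.save(model, 'full_model.pt')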
To correctly merge the weights from two models, you need to create a new model and load the state dictionaries into it. Here's an updated code snippet:
import torch
from models.common import Detect
modelA = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/user.pt')
sdA = modelA.state_dict()
modelB = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/central.pt')
sdB = modelB.state_dict()
sdC = {}
for key in sdA:
    sdC[key] = (sdA[key] + sdB[key]) / 2
modelC = Detect()
modelC.load_state_dict(sdC)
torch.save(modelC.state_dict(), "./models/merge.pt")
merged_model = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/merge.pt')
In the updated code, we create a new model modelC with the same architecture as the original models but without loading any weights. We then load the merged state dictionary sdC into modelC. Finally, we load the merged model using torch.hub.load() by providing the path to the merge.pt file.
This should resolve the error and allow you to load the merged model successfully.
Let me know if you have any further questions!
Thanks for your answer, but I seem to have a problem with this solution as well.
The import doesn't find Detect in the common file.
The code I've run is the same you wrote me, but it returns this error:
Traceback (most recent call last):
File "E:\TirocinioVero\text_and_drive\Terza_Fase\Progetto\UserA\yolov5\prova.py", line 2, in <module>
from models.common import Detect
ImportError: cannot import name 'Detect' from 'models.common' (E:\TirocinioVero\text_and_drive\Terza_Fase\Progetto\UserA\yolov5\models\common.py)
I haven't worked on or modified the YOLO files, so I don't know what the problem is. I read the common.py file and haven't found a definition of Detect in it.
The only similar classes I've found are in the yolo.py file, but they don't work either.
@gazzadi the Detect class is not defined in models/common.py; as you found, the detection head lives in models/yolo.py, so the import in the previous snippet was incorrect. Apologies for the confusion in the earlier solution.
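For reference, if you ever do need that class directly, the import (run from inside a cloned yolov5 repository) would be:
from models.yolo import Detect  # Detect lives in models/yolo.py, not models/common.py
The snippet below avoids that import entirely.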
To correctly merge the weights from two YOLOv5 models, you can follow the steps below:
import torch
from torch import nn
# Load the model state dictionaries
modelA = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/user.pt')
sdA = modelA.state_dict()
modelB = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/central.pt')
sdB = modelB.state_dict()
# Merge the state dictionaries
sdC = {}
for key in sdA:
    sdC[key] = (sdA[key] + sdB[key]) / 2
# Create a new model with the merged weights
modelC = torch.hub.load('ultralytics/yolov5', 'custom')
modelC.load_state_dict(sdC)
# Save the merged model
torch.save(modelC.state_dict(), "./models/merge.pt")
# Load the merged model
merged_model = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/merge.pt')
In this updated code, we create a new model (modelC) using torch.hub.load('ultralytics/yolov5', 'custom') without loading any weights. We then merge the state dictionaries sdA and sdB by averaging their values. Finally, we load the merged model using torch.hub.load() and the path to the saved merge.pt file.
This should resolve the issue you're facing and allow you to merge the weights of the two YOLOv5 models successfully.
Let me know if you have any further questions!
I've tried the code you suggested multiple times, but it's not working as we hoped.
This command doesn't work when written in this format:
modelC = torch.hub.load('ultralytics/yolov5', 'custom')
To retrieve the default yolov5s model, I found that I can write
modelC = torch.hub.load('ultralytics/yolov5', 'yolov5s')
But there's another problem I'm facing: the model just retrieved and the ones I have trained differ in the number of layers. I specify that modelA and modelB were trained starting from the base weights yolov5s.pt.
I don't know if there is a way to train that doesn't alter the layers of the base model.
Next I show the code I have used for the merge:
import torch
from torch import nn
# Load the model state dictionaries
modelA = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/user.pt')
sdA = modelA.state_dict()
modelB = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/central.pt')
sdB = modelB.state_dict()
# Merge the state dictionaries
sdC = {}
for key in sdA:
    sdC[key] = (sdA[key] + sdB[key]) / 2
# Create a new model with the merged weights
modelC = torch.hub.load('ultralytics/yolov5', 'yolov5s')
modelC.load_state_dict(sdC)
# Save the merged model
torch.save(modelC.state_dict(), "./models/merge.pt")
# Load the merged model
merged_model = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/merge.pt')
And here is the traceback that came out:
Using cache found in C:\Users\Davide/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5 2023-8-16 Python-3.10.0 torch-2.0.1+cu117 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)
Fusing layers...
Model summary: 157 layers, 7020913 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Using cache found in C:\Users\Davide/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5 2023-8-16 Python-3.10.0 torch-2.0.1+cu117 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)
Fusing layers...
Model summary: 157 layers, 7020913 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Using cache found in C:\Users\Davide/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5 2023-8-16 Python-3.10.0 torch-2.0.1+cu117 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape...
Traceback (most recent call last):
File "E:\TirocinioVero\text_and_drive\Terza_Fase\Progetto\UserA\prova.py", line 18, in <module>
modelC.load_state_dict(sdC)
File "C:\Users\Davide\.virtualenvs\Progetto-ca1l6rVE\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AutoShape:
size mismatch for model.model.model.24.m.0.weight: copying a param with shape torch.Size([27, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([255, 128, 1, 1]).
size mismatch for model.model.model.24.m.0.bias: copying a param with shape torch.Size([27]) from checkpoint, the shape in current model is torch.Size([255]).
size mismatch for model.model.model.24.m.1.weight: copying a param with shape torch.Size([27, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([255, 256, 1, 1]).
size mismatch for model.model.model.24.m.1.bias: copying a param with shape torch.Size([27]) from checkpoint, the shape in current model is torch.Size([255]).
size mismatch for model.model.model.24.m.2.weight: copying a param with shape torch.Size([27, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([255, 512, 1, 1]).
size mismatch for model.model.model.24.m.2.bias: copying a param with shape torch.Size([27]) from checkpoint, the shape in current model is torch.Size([255]).
For reference, I also include here the command I used for training modelA and modelB:
python .\yolov5\train.py --epochs 150 --batch-size -1 --img 384 --data completo384.yaml --weights yolov5s.pt
@gazzadi I apologize for the confusion. It seems that the torch.hub.load('ultralytics/yolov5', 'custom') method might not work for loading a YOLOv5 model without weights. Thank you for finding an alternative solution using torch.hub.load('ultralytics/yolov5', 'yolov5s') to load the default yolov5s model.
Regarding the mismatched number of layers between the models, this can happen if the structure of the models (e.g., the number of layers) has been modified during training. The merged model should have the same structure as the base model you used for training (yolov5s.pt). Make sure that the models you are merging were trained with the same base model and haven't undergone any modifications.
Regarding the error message you received (size mismatch for ...), it suggests that the size of the layers in the merged state dictionary sdC does not match the size of the equivalent layers in the yolov5s model. This could happen if the models being merged have different architecture configurations.
To address this issue, you may need to modify the merging process to handle the differences in layer sizes between the models. This could involve resizing or reshaping the weights appropriately.
Please note that modifying the YOLOv5 codebase, such as altering layer sizes, can lead to unexpected behavior or loss of accuracy. It's recommended to use the same base model architecture and weights for training and merging to ensure compatibility.
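For context on the specific numbers in your traceback: each YOLOv5 detection-head convolution outputs 3 * (num_classes + 5) channels per scale, so 255 corresponds to the stock 80-class yolov5s, while 27 corresponds to a model trained with 4 classes. Before averaging, it can help to confirm that the two trained checkpoints themselves agree in shape; a minimal sketch, reusing the paths from your earlier snippet:
import torch

# Load both trained checkpoints through the YOLOv5 hub loader
modelA = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/user.pt')
modelB = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/central.pt')
sdA, sdB = modelA.state_dict(), modelB.state_dict()

# Report any parameter whose shape differs; averaging only makes sense if nothing is printed
for key in sdA:
    if key in sdB and sdA[key].shape != sdB[key].shape:
        print(f'shape mismatch at {key}: {tuple(sdA[key].shape)} vs {tuple(sdB[key].shape)}')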
If the issue persists, please provide more information about the exact steps and configurations you used for training modelA and modelB, including any modifications or differences from the base yolov5s model.
Thank you for the information.
I haven't made any changes to the base model when I created modelA and modelB. The two are, in fact, the same model created in two different directories, but with the same version of yolov5s as base weights. I haven't modified or altered the files when I downloaded YOLO, and I've only used the weights to train my model. The only change I can think of is the number of classes, but I thought that was not correlated with the layers of the model.
I've tried to replicate the setup in a new environment. Here are all the steps I followed:
Downloaded, from the latest version "v7", the yolov5s.pt model and the source code for the requirements
Created a virtual environment with pipenv in a new directory
Installed the requirements needed for the model to work correctly in the virtual environment, with the command pipenv install --requirements ./yolov5-7.0
Installed the correct version of PyTorch for working with CUDA, with the command pipenv install torch torchvision torchaudio --index https://download.pytorch.org/whl/cu117
Trained ModelA: python .\yolov5-7.0\train.py --epochs 150 --batch-size -1 --img 384 --data modelA.yaml --name modelA --weights yolov5s.pt --project train
Trained ModelB: python .\yolov5-7.0\train.py --epochs 150 --batch-size -1 --img 384 --data modelB.yaml --name modelB --weights yolov5s.pt --project train
At the start of the training, I received this warning:
C:\Users\Davide\.virtualenvs\EnvTest-85QvB3Ib\lib\site-packages\seaborn\_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
if pd.api.types.is_categorical_dtype(vector):
C:\Users\Davide\.virtualenvs\EnvTest-85QvB3Ib\lib\site-packages\seaborn\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
And this traceback at the end:
Traceback (most recent call last):
File "C:\Users\Davide\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1009, in _bootstrap_inner
Exception in thread Thread-18 (plot_images):
Traceback (most recent call last):
File "C:\Users\Davide\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1009, in _bootstrap_inner
self.run()
File "C:\Users\Davide\AppData\Local\Programs\Python\Python310\lib\threading.py", line 946, in run
self.run()
File "C:\Users\Davide\AppData\Local\Programs\Python\Python310\lib\threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "E:\TirocinioVero\text_and_drive\Terza_Fase\EnvTest\yolov5-7.0\utils\plots.py", line 305, in plot_images
File "E:\TirocinioVero\text_and_drive\Terza_Fase\EnvTest\yolov5-7.0\utils\plots.py", line 305, in plot_images
annotator.box_label(box, label, color=color)
File "E:\TirocinioVero\text_and_drive\Terza_Fase\EnvTest\yolov5-7.0\utils\plots.py", line 91, in box_label
annotator.box_label(box, label, color=color)
File "E:\TirocinioVero\text_and_drive\Terza_Fase\EnvTest\yolov5-7.0\utils\plots.py", line 91, in box_label
w, h = self.font.getsize(label) # text width, height
AttributeError: 'FreeTypeFont' object has no attribute 'getsize'
w, h = self.font.getsize(label) # text width, height
AttributeError: 'FreeTypeFont' object has no attribute 'getsize'
But it seems the training had gone well anyway.
I then started the merging program:
merge.py
import torch
from torch import nn
modelA = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/modelA.pt')
sdA = modelA.state_dict()

modelB = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/modelA.pt')
sdB = modelB.state_dict()

sdC = {}
for key in sdA:
    sdC[key] = (sdA[key] + sdB[key]) / 2

modelC = torch.hub.load('ultralytics/yolov5', 'custom', "./yolov5s.pt")
modelC.load_state_dict(sdC)
torch.save(modelC.state_dict(), "./models/merge.pt")
merged_model = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/merge.pt')
Traceback
Using cache found in C:\Users\Davide/.cache\torch\hub\ultralytics_yolov5_master
WARNING invalid check_version(5.9.5, ) requested, please check values.
YOLOv5 2023-8-16 Python-3.10.0 torch-2.0.1+cu117 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)
Fusing layers...
Model summary: 157 layers, 7020913 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Using cache found in C:\Users\Davide/.cache\torch\hub\ultralytics_yolov5_master
WARNING invalid check_version(5.9.5, ) requested, please check values.
YOLOv5 2023-8-16 Python-3.10.0 torch-2.0.1+cu117 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)
Fusing layers...
Model summary: 157 layers, 7020913 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Using cache found in C:\Users\Davide/.cache\torch\hub\ultralytics_yolov5_master
WARNING invalid check_version(5.9.5, ) requested, please check values.
YOLOv5 2023-8-16 Python-3.10.0 torch-2.0.1+cu117 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape...
Traceback (most recent call last):
File "E:\TirocinioVero\text_and_drive\Terza_Fase\EnvTest\merge.py", line 18, in
The error is the same, but I don't know why the layers of my trained models have changed.
Sorry for the long comment, but I wanted to explain the whole situation better, and thanks for the time @glenn-jocher has already put into this thread.
@gazzadi thank you for providing detailed information about your issue.
Based on the information you provided, it seems that you have trained two models, ModelA and ModelB, in two different directories, both starting from the same yolov5s base weights. You mentioned that the only change you made was the number of classes.
To replicate your environment, you followed these steps:
python .\yolov5-7.0\train.py --epochs 150 --batch-size -1 --img 384 --data modelA.yaml --name modelA --weights yolov5s.pt --project train
python .\yolov5-7.0\train.py --epochs 150 --batch-size -1 --img 384 --data modelB.yaml --name modelB --weights yolov5s.pt --project train
Based on the error message you received during the merging process, it seems that there are size mismatches between the layers of ModelA and ModelB and the layers of the base model. This can happen if the number of classes or the structure of the model has changed.
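One way to sidestep both the class-count mismatch and the state_dict-only save issue is to average into a deep copy of one of your trained models (so the target architecture matches by construction) and to save the result in the same dictionary layout that YOLOv5 training checkpoints use, which is what the hub 'custom' loader expects. This is only a sketch under those assumptions, not an officially supported workflow, and the paths are illustrative, following the naming of your training runs:
import copy
import torch

# Load both trained models through the YOLOv5 hub loader
modelA = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/modelA.pt')
modelB = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/modelB.pt')
sdA, sdB = modelA.state_dict(), modelB.state_dict()

# Average into a deep copy of modelA so every layer shape matches exactly
modelC = copy.deepcopy(modelA)
sdC = {key: (sdA[key] + sdB[key]) / 2 for key in sdA}
modelC.load_state_dict(sdC)

# Save in the checkpoint layout YOLOv5 expects: a dict whose 'model' entry holds the network itself,
# here the underlying DetectionModel unwrapped from the AutoShape/DetectMultiBackend wrappers
torch.save({'model': modelC.model.model}, './models/merge.pt')

# The merged file can then be reloaded like any custom checkpoint
merged_model = torch.hub.load('ultralytics/yolov5', 'custom', path='./models/merge.pt')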
To further investigate the issue and provide a solution, it would be helpful to have access to the specific files and code you used for training and merging the models. Additionally, it would be useful to know which version of YOLOv5 you are using.
Please provide these details, and I'll be happy to assist you further in resolving the issue.
Thank you, I hope I'm providing the correct information.
Version: I'm using yolov5s, from version 7 of YOLOv5.
I've done the training directly from the console, and I haven't modified the files in the source code.
The files I've worked with to create this example are those and no others.
@gazzadi thank you for providing the additional information.
Based on your explanation, it seems that you trained two models, ModelA and ModelB, using the yolov5s base model. The training was done directly from the console without modifying any files in the source code. The files you used for this training example are the ones you mentioned and no others.
To better understand the issue you are experiencing, it would be helpful to have access to the specific files and code you used for training and merging the models. Furthermore, please confirm that you are using version 7 of YOLOv5 and yolov5s as the base model.
With this additional information, I will be able to assist you further in troubleshooting and resolving the issue.
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI!
Question
Hi, I'm working on a project that trains two different models on two different clients (A and B). The models have the same classes and structure. Every n steps, A shares the weights from his model with B. When B receives A's weights, he wants to merge them into one new model with a simple average of the two.
This is the code that creates the merge, and it seems to work fine: it creates a new state_dict with the average of the previous two.
The code works fine until the last torch.hub.load(), when I try to load the model just created from the merge. The traceback returned is the following:
I've tried to use torch.load, but it doesn't work when loading A and B. I've analysed the weights files and noticed that the beginning of A and B differs from the one I saved with torch.save(). I think the problem is in what object I save, but online I've seen others using this method to create a new average weights file.
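One detail worth checking here (an observation based on how YOLOv5 stores weights, not a confirmed diagnosis of your setup): files produced by train.py are dictionaries whose 'model' entry holds the full pickled network, so torch.load only succeeds on them when the yolov5 source is importable, while torch.save(model.state_dict(), ...) writes a flat mapping of tensors that the hub 'custom' loader cannot rebuild a model from. A quick way to compare the two formats (run from inside the cloned yolov5 directory; paths are illustrative):
import torch

# A training checkpoint: a dict with entries such as 'model', 'ema', 'epoch', 'optimizer', ...
ckpt = torch.load('./models/user.pt', map_location='cpu')
print(type(ckpt), list(ckpt.keys()) if isinstance(ckpt, dict) else None)

# A file written with torch.save(model.state_dict(), ...): just an ordered mapping of parameter tensors
merged = torch.load('./models/merge.pt', map_location='cpu')
print(type(merged), len(merged))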