ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.93k stars · 16.4k forks

Save model weights with model.state_dict(), then inference again can't get the right results #5148

Closed · Nr-rN closed this issue 2 years ago

Nr-rN commented 3 years ago

❔Question

In train.py line 373:

```python
ckpt = {'epoch': epoch,
        'best_fitness': best_fitness,
        'model': deepcopy(de_parallel(model)).half(),
        'ema': deepcopy(ema.ema).half(),
        'updates': ema.updates,
        'optimizer': optimizer.state_dict(),
        'wandb_id': loggers.wandb.wandb_run.id if loggers.wandb else None}

# Save last, best and delete
torch.save(ckpt, last)
```

I changed it to:

```python
ckpt = {'epoch': epoch,
        'best_fitness': best_fitness,
        'model': model.state_dict(),
        'ema': deepcopy(ema.ema).half(),
        'updates': ema.updates,
        'optimizer': optimizer.state_dict(),
        'wandb_id': loggers.wandb.wandb_run.id if loggers.wandb else None}

# Save last, best and delete
torch.save(ckpt, last)
```

Then I use this code for inference (like detect.py):

```python
from models.yolo import Model
......

# loading model
model = Model().to(device).eval()
ckpt = torch.load('xxxxxx.pt')
model.load_state_dict(ckpt['model'], strict=False)
....
pred = model(img)
```

**I can't get the right results in `pred` that detect.py produces, and I don't know where the problem is. My guesses:**

  1. I am not saving all the parameters the model needs; a state_dict alone may not be enough for inference.
  2. I am loading the model with the wrong procedure.
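For what it's worth, `strict=False` in the loading snippet above can mask exactly this kind of failure: if the `Model()` you construct does not match the checkpoint (different cfg, `nc`, or anchors), the mismatched keys are silently skipped and the network runs with its random initialization. A minimal self-contained sketch of the failure mode, using a toy module rather than the actual YOLOv5 classes:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

net = Net()
# A state_dict whose keys don't match the model, as happens when the Model()
# used for loading was built differently from the one that was saved
wrong_state = {'head.weight': torch.zeros(2, 4)}

# strict=False does NOT raise; it only reports the mismatch, and callers often ignore it
result = net.load_state_dict(wrong_state, strict=False)
print(result.missing_keys)     # ['fc.weight', 'fc.bias'] -> these stay randomly initialized
print(result.unexpected_keys)  # ['head.weight']
```

Loading with `strict=True` (the default) raises immediately on any mismatch, which is usually what you want while debugging wrong predictions. Note also that detect.py normally runs the EMA weights (`ckpt['ema']`), not `ckpt['model']`, so loading only the raw model weights can legitimately score worse.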
github-actions[bot] commented 3 years ago

πŸ‘‹ Hello @Nr-rN, thank you for your interest in YOLOv5 πŸš€! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a πŸ› Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 3 years ago

You don't need to modify any code to use detect.py, you just train a model and then use it directly:

python train.py
python detect.py --weights runs/train/exp/weights/best.pt
Nr-rN commented 3 years ago

@glenn-jocher Thank you for answering my question, but I failed to express it clearly. I want to save the model with `ckpt = {..., 'model': model.state_dict(), ...}` and load it with:

```python
model = Model()
model.load_state_dict(torch.load(ckpt_path)['model'])
```

because detect.py line 82 calls `attempt_load(weights, map_location=device)`, and if the model is saved with `ckpt = {..., 'model': deepcopy(de_parallel(model)).half(), ...}`, the weights file contains serialized data that is bound to the specific classes and the exact directory structure used when saving.

I get the error:

```
  File "/opt/det.py", line 3, in <module>
    model = torch.load('./weights/best.pt', map_location='cpu')
  File "/home/tian/.conda/envs/jiqi/lib/python3.6/site-packages/torch/serialization.py", line 368, in load
    return _load(f, map_location, pickle_module)
  File "/home/tian/.conda/envs/jiqi/lib/python3.6/site-packages/torch/serialization.py", line 542, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'models'
```
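This `ModuleNotFoundError` is the expected behavior for pickled whole models: `torch.save` on a `Model` object stores only a reference to the class (its module path plus class name), so `torch.load` must be able to import `models.yolo` again at load time. A stdlib-only sketch of the mechanism; the `models_demo` module name is made up for illustration:

```python
import pickle
import sys
import types

# Build a throwaway module containing a class, mimicking yolov5's models.yolo
mod = types.ModuleType('models_demo')
exec("class Model:\n    pass", mod.__dict__)
sys.modules['models_demo'] = mod

# Pickling an instance stores only 'models_demo' + 'Model', not the class code
blob = pickle.dumps(mod.Model())

# Unpickling works while 'models_demo' is importable...
pickle.loads(blob)

# ...but fails once it is gone, which is exactly what happens when
# best.pt is loaded outside the yolov5 repo
del sys.modules['models_demo']
try:
    pickle.loads(blob)
except ModuleNotFoundError as e:
    print(e)  # No module named 'models_demo'
```

The usual fixes are either `sys.path.insert(0, '/path/to/yolov5')` (a hypothetical clone path) before `torch.load`, or saving only a state_dict plus whatever cfg is needed to rebuild the model, which is what this issue is attempting.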

When I found the same problem discussed at https://github.com/pytorch/pytorch/issues/18325, I tried `torch.jit.trace(model, torch.rand(1, 3, 640, 640))`, but it did not work, failing with:

```
TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
    graph(%self.1 : __torch__.models.yolo.Model, %x : Tensor):
      %2 : __torch__.torch.nn.modules.container.Sequential = prim::GetAttr[name="model"]
      %3 : __torch__.models.yolo.Detect = prim::GetAttr[name="24"]
      %4 : __torch__.torch.nn.modules.container.Sequential = prim::GetAttr[name="model"]
      %5 : __torch__.models.common.C3 = prim::GetAttr[name="23"]
      ...
Comparison exception: expected tensor shape torch.Size([1, 1, 40, 40, 2]) doesn't match with actual tensor shape torch.Size([])!
```
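One common cause of this `TracingCheckError` (not necessarily the only one here) is that YOLOv5's `Detect` head returns different things in train and eval mode, so tracing a model that is still in training mode can produce graphs that differ across invocations. A toy sketch of the pattern, with the usual fix of calling `.eval()` before `torch.jit.trace`:

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    """Mimics a head whose output depends on self.training, like yolov5's Detect."""
    def forward(self, x):
        if self.training:
            return x                   # raw feature map during training
        return x.sigmoid().flatten(1)  # decoded, flattened predictions for inference

head = Head().eval()  # freeze the branch BEFORE tracing
traced = torch.jit.trace(head, torch.rand(1, 3, 4, 4))
print(traced(torch.rand(1, 3, 4, 4)).shape)  # torch.Size([1, 48])
```

`torch.jit.trace(..., strict=False)` can also help when outputs are lists or dicts, and this repo's export.py handles these export details for you.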

Now I use this method instead:

```python
>>> m = MyModule()
>>> m.state_dict()
OrderedDict([('l0.weight', tensor([[ 0.1400,  0.4563, -0.0271, -0.4406],
                                   [-0.3289,  0.2827,  0.4588,  0.2031]])),
             ('l0.bias', tensor([ 0.0300, -0.1316])),
             ('l1.weight', tensor([[0.6533, 0.3413]])),
             ('l1.bias', tensor([-0.1112]))])
>>> torch.save(m.state_dict(), 'mymodule.pt')
>>> m_state_dict = torch.load('mymodule.pt')
>>> new_m = MyModule()
>>> new_m.load_state_dict(m_state_dict)
```

When I use this new loading method for inference, the results are worse than with the original loading method.
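As a sanity check, the state_dict mechanism itself is lossless: if the second model is constructed exactly like the first, a save/load round trip reproduces identical outputs. Worse results after reloading therefore point at a construction mismatch (cfg, `nc`, half vs. float, EMA vs. raw weights, a missing `.eval()`), not at state_dict itself. A self-contained check, using an in-memory buffer in place of a hypothetical `mymodule.pt`:

```python
import io
import torch
import torch.nn as nn

def build():
    # Must match the saved architecture exactly, layer for layer
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()

m = build()
buf = io.BytesIO()
torch.save(m.state_dict(), buf)  # stands in for torch.save(..., 'mymodule.pt')
buf.seek(0)

new_m = build()  # freshly (randomly) initialized copy of the same architecture
new_m.load_state_dict(torch.load(buf))

x = torch.rand(1, 4)
# Identical architecture + identical state_dict => identical outputs
print(torch.allclose(m(x), new_m(x)))  # True
```

If this kind of check fails for your real model, compare `model.state_dict().keys()` on both sides before blaming the weights.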

glenn-jocher commented 3 years ago

@Nr-rN we don't assist in debugging custom code. Inference examples are shown for detect.py and PyTorch Hub in the README.

If you find a reproducible bug in the unmodified repository code please raise a separate bug report.

github-actions[bot] commented 3 years ago

πŸ‘‹ Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 πŸš€ resources:

Access additional Ultralytics ⚑ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 πŸš€ and Vision AI ⭐!

zcunyi commented 3 years ago

> (quote of @Nr-rN's comment above)

Hello, how is your problem going? Did you solve it?

github-actions[bot] commented 2 years ago

πŸ‘‹ Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 πŸš€ resources:

Access additional Ultralytics ⚑ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 πŸš€ and Vision AI ⭐!

zhoujian-z commented 2 years ago

@zcunyi How is your problem going? Did you solve it?