ogoun commented 4 years ago

Hello, @phanxuanduc1996, I successfully converted my darknet weights to PyTorch format (YOLOv3), but when I try to convert to ONNX, I get an error: AttributeError: 'dict' object has no attribute 'state_dict'

In torch __init__.py, method:

def _unique_state_dict(module, keep_vars=False):
    # since Parameter.data always creates a new torch.Tensor instance,
    # id(v) doesn't work with it. So we always get the Parameter or Buffer
    # as values, and deduplicate the params using Parameters and Buffers
    state_dict = module.state_dict(keep_vars=True)

What version of torch are you using? I'm trying 1.4 and 1.5. OS: Windows 10 (build 2004). Python 3.8
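One possible reading of this error is that torch.load returned a checkpoint dict (with the weights stored under a key) rather than a module, and the export code then called .state_dict() on that dict. The sketch below uses plain Python stand-ins instead of torch so it runs anywhere. The helper name extract_state_dict, the FakeModule class, and the default key 'model' are assumptions for illustration; the key matches the one used in the workaround later in this thread.

```python
def extract_state_dict(checkpoint, key="model"):
    """Return a state dict from a module-like object or a checkpoint dict."""
    # Module-like objects expose a callable state_dict() method.
    if hasattr(checkpoint, "state_dict") and callable(checkpoint.state_dict):
        return checkpoint.state_dict()
    # A checkpoint saved as a dict keeps the weights under a key.
    if isinstance(checkpoint, dict):
        if key in checkpoint:
            return checkpoint[key]
        # Otherwise assume the dict already is the state dict.
        return checkpoint
    raise TypeError("unsupported checkpoint type: %s" % type(checkpoint).__name__)


class FakeModule:
    """Hypothetical stand-in for torch.nn.Module, used only in this sketch."""

    def state_dict(self):
        return {"conv.weight": [0.1, 0.2]}


# Hypothetical checkpoint layout: weights under 'model' plus extra metadata.
checkpoint = {"model": {"conv.weight": [0.1, 0.2]}, "epoch": -1}

# Calling .state_dict() directly on the dict reproduces the AttributeError.
error_message = None
try:
    checkpoint.state_dict()
except AttributeError as err:
    error_message = str(err)

weights_from_dict = extract_state_dict(checkpoint)
weights_from_module = extract_state_dict(FakeModule())
```

Both calls return the same mapping, so downstream code can pass the result to load_state_dict regardless of how the file was saved.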

phanxuanduc1996 commented 4 years ago

Hi @ogoun, I'm using torch==1.13.1 and onnx==1.15.0. In the file https://github.com/phanxuanduc1996/convert_yolo_weights/blob/master/onnx_convert/pytorch_to_onnx/models.py please try setting ONNX_EXPORT = True (line 7). Thanks.
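When comparing environments like this, it can help to print the installed versions rather than rely on memory. The following is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8); the function name installed_versions is hypothetical, and a package that is not installed is reported as None.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages=("torch", "onnx")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```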

ogoun commented 4 years ago

I installed a clean Ubuntu 18.04 with your package versions. I have trained darknet weights, and I checked that inference works. Then I tried the following conversion variants.

Preparation

Make folder 'weights' in directory /convert_yolo_weights/pytorch_convert
Copy file yolov3-spp_final.weights to /convert_yolo_weights/pytorch_convert/weights
Make folder 'cfg' in directory /convert_yolo_weights/pytorch_convert
Copy file yolov3-spp.cfg to /convert_yolo_weights/pytorch_convert/cfg
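The preparation steps above can be sketched as a small script. This is an illustration only: the function name prepare and its arguments are hypothetical, and the repository root and source file locations are whatever they are on your machine.

```python
import shutil
from pathlib import Path


def prepare(repo_root, weights_file, cfg_file):
    """Create weights/ and cfg/ under pytorch_convert and copy the inputs in."""
    base = Path(repo_root) / "pytorch_convert"
    copied = []
    for folder, source in (("weights", weights_file), ("cfg", cfg_file)):
        target_dir = base / folder
        # Create the folder (and any missing parents) if it does not exist yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        # shutil.copy returns the path of the copied file.
        copied.append(Path(shutil.copy(str(source), str(target_dir))))
    return copied
```

A call such as prepare('convert_yolo_weights', 'yolov3-spp_final.weights', 'yolov3-spp.cfg') would then leave both files where the conversion script expects them.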

1-st. As-Is

No changes to the files, except the paths to the weights and config, and the input size changed to 608. File 'pytorch_convert/convert_weights_pytorch.py'

OUTPUT

Model Summary: 225 layers, 6.25733e+07 parameters, 6.25733e+07 gradients
Success: converted 'weights/yolov3-spp_final.weights' to 'converted.pt'
Backend GTK3Agg is interactive backend. Turning interactive mode on.

File 'onnx_convert/pytorch_to_onnx/convert_pytorch_to_onnx.py'. Changed input size to 608.

OUTPUT AttributeError: 'dict' object has no attribute 'state_dict'

After changing the code to:

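import torch
from models import Darknet  # assumed import: Darknet from the repo's models.py

# converted.pt is a checkpoint dict; the weights are stored under 'model'
trained_model = Darknet('cfg/yolov3-spp.cfg')
trained_model.load_state_dict(torch.load('weights/converted.pt')['model'])
dummy_input = torch.randn(1, 3, 608, 608)
torch.onnx.export(trained_model, dummy_input, 'yolov3_pt.onnx')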

I got an ONNX file, with this OUTPUT:

Model Summary: 225 layers, 6.25733e+07 parameters, 6.25733e+07 gradients
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/utils/layers.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if nx == na:  # same shape
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:207: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (self.nx, self.ny) != (nx, ny):
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:168: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  self.ng = torch.Tensor(ng).to(device)
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:168: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  self.ng = torch.Tensor(ng).to(device)
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:235: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid  # xy
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:237: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  self.anchor_wh  # wh yolo method
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:238: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator mul_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  io[..., :4] *= self.stride
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:238: TracerWarning: There are 4 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  io[..., :4] *= self.stride
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:239: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator sigmoid_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  torch.sigmoid_(io[..., 4:])
/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py:198: UserWarning: You are trying to export the model with onnx:Upsample for ONNX opset version 9. This operator might cause results to not match the expected results by PyTorch.
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator. 
  "" + str(_export_onnx_opset_version) + ". "

Process finished with exit code 0

The ONNX model has 4 outputs.
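The UserWarning in the log above recommends opset 11 or higher for models that use Upsample. torch.onnx.export accepts an opset_version argument, so one thing to try is sketched below. The wrapper name export_with_opset is hypothetical, and whether the rest of this model exports cleanly under opset 11 with the torch version in use is not verified here.

```python
def export_with_opset(model, dummy_input, path, opset_version=11):
    """Export `model` to ONNX at `path` using the requested opset."""
    # Imported lazily so this sketch can be loaded without torch installed.
    import torch

    torch.onnx.export(model, dummy_input, path, opset_version=opset_version)
    return path
```

With the workaround code above, the call would look like export_with_opset(trained_model, dummy_input, 'yolov3_pt.onnx').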

2-nd. ONNX_EXPORT = True

File 'pytorch_convert/convert_weights_pytorch.py' ONNX_EXPORT = True

OUTPUT Success: converted 'weights/yolov3-spp_final.weights' to 'converted.pt'

pytorch_to_onnx File 'onnx_convert/pytorch_to_onnx/models.py' ONNX_EXPORT = False

OUTPUT

Model Summary: 225 layers, 6.25733e+07 parameters, 6.25733e+07 gradients
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/utils/layers.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if nx == na:  # same shape
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:206: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (self.nx, self.ny) != (nx, ny):
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:167: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  self.ng = torch.Tensor(ng).to(device)
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:167: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  self.ng = torch.Tensor(ng).to(device)
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:234: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid  # xy
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:236: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  self.anchor_wh  # wh yolo method
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:237: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator mul_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  io[..., :4] *= self.stride
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:237: TracerWarning: There are 4 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  io[..., :4] *= self.stride
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py:238: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator sigmoid_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  torch.sigmoid_(io[..., 4:])
/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py:198: UserWarning: You are trying to export the model with onnx:Upsample for ONNX opset version 9. This operator might cause results to not match the expected results by PyTorch.
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator. 
  "" + str(_export_onnx_opset_version) + ". "

Process finished with exit code 0

The same ONNX file (4 outputs).

File 'onnx_convert/pytorch_to_onnx/models.py' ONNX_EXPORT = True

Fails with OUTPUT

Model Summary: 225 layers, 6.25733e+07 parameters, 6.25733e+07 gradients
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/utils/layers.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if nx == na:  # same shape
Traceback (most recent call last):
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 382, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 249, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 206, in _trace_and_get_graph_from_model
    trace, torch_out, inputs_states = torch.jit.get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 275, in get_trace_graph
    return LegacyTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 352, in forward
    out = self.inner(*trace_inputs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py", line 264, in forward
    return self.forward_once(x)
  File "/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py", line 322, in forward_once
    yolo_out.append(module(x, img_size, out))
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py", line 210, in forward
    p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(
RuntimeError: shape '[1, 3, 6, 13, 13]' is invalid for input of size 6498
Backend GTK3Agg is interactive backend. Turning interactive mode on.
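The numbers in the RuntimeError above are consistent with a grid sized for a 416x416 input being applied to a tensor produced from a 608x608 input. This is an inference from the arithmetic only; the thread does not confirm where the 13 comes from. The sketch assumes the coarsest YOLOv3 stride of 32, 3 anchors per scale, and 6 values per anchor (4 box values, 1 objectness, 1 class).

```python
STRIDE = 32        # coarsest YOLOv3 stride
ANCHORS = 3        # anchors per scale, as in the error message
OUTPUTS = 6        # values per anchor, as in the error message

grid_416 = 416 // STRIDE   # grid side for a 416-pixel input (assumed)
grid_608 = 608 // STRIDE   # grid side for the 608-pixel input used here

# Element count implied by the requested shape [1, 3, 6, 13, 13].
requested_elems = 1 * ANCHORS * OUTPUTS * grid_416 * grid_416

# Element count of a tensor produced from a 608-pixel input at stride 32.
actual_elems = 1 * ANCHORS * OUTPUTS * grid_608 * grid_608

print(grid_416, requested_elems)   # 13 3042
print(grid_608, actual_elems)      # 19 6498
```

The second count matches the "input of size 6498" in the error, which is why a view with a 13x13 grid cannot succeed on it.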

3-rd

File 'pytorch_convert/convert_weights_pytorch.py' ONNX_EXPORT = False

OUTPUT

File 'pytorch_convert/convert_weights_pytorch.py'
ONNX_EXPORT = True

pytorch_to_onnx File 'onnx_convert/pytorch_to_onnx/models.py' ONNX_EXPORT = True

OUTPUT

Model Summary: 225 layers, 6.25733e+07 parameters, 6.25733e+07 gradients
/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/utils/layers.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if nx == na:  # same shape
Traceback (most recent call last):
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 382, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 249, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 206, in _trace_and_get_graph_from_model
    trace, torch_out, inputs_states = torch.jit.get_trace_graph(model, args, _force_outplace=True, _return_inputs_states=True)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 275, in get_trace_graph
    return LegacyTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 352, in forward
    out = self.inner(*trace_inputs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py", line 264, in forward
    return self.forward_once(x)
  File "/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py", line 322, in forward_once
    yolo_out.append(module(x, img_size, out))
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/home/ogoun/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ogoun/Git/convert_yolo_weights/onnx_convert/pytorch_to_onnx/models.py", line 210, in forward
    p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(
RuntimeError: shape '[1, 3, 6, 13, 13]' is invalid for input of size 6498
Backend GTK3Agg is interactive backend. Turning interactive mode on.

4-th ONNX_EXPORT in pytorch convert file

Extend file 'pytorch_convert/convert_weights_pytorch.py' with ONNX_EXPORT = True

def convert(cfg='cfg/yolov3-spp.cfg', weights='weights/yolov3-spp_final.weights'):
...
    torch.save(chkpt, 'converted.pt')
    print("Success: converted '%s' to 'converted.pt'" % weights)
    onnx_model = Darknet(cfg)
    onnx_model.load_state_dict(chkpt['model'])
    dummy_input = torch.randn(1, 3, 608, 608)
    torch.onnx.export(onnx_model, dummy_input, 'yolov3_pt.onnx')
    print("Success: converted '%s' to onnx" % weights)
...

OUTPUT

Success: converted 'weights/yolov3-spp_final.weights' to 'converted.pt'
/home/ogoun/Git/convert_yolo_weights/pytorch_convert/utils/layers.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if nx == na:  # same shape
/home/ogoun/.local/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py:198: UserWarning: You are trying to export the model with onnx:Upsample for ONNX opset version 9. This operator might cause results to not match the expected results by PyTorch.
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator. 
  "" + str(_export_onnx_opset_version) + ". "
Success: converted 'weights/yolov3-spp_final.weights' to onnx
Backend GTK3Agg is interactive backend. Turning interactive mode on.
import sys; print('Python %s on %s' % (sys.version, sys.platform))
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux

In the last case the ONNX model has two outputs: Scores and Boxes. But the darknet model gives me 16 predictions with a score above 0.4, while the ONNX model gives only two predictions with a score above 0.4. And when I try to get the boxes:

i - score index box { x = boxes[0, i 4]; y = boxes[1, i 4]; w = boxes[2, i 4]; h = boxes[3, i 4]; }

I get incorrect bounding boxes.

Can you explain where I'm wrong?
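For comparison, here is a minimal sketch of filtering by score and reading the matching boxes. It assumes (this is not confirmed anywhere in the thread) that the two outputs are row-aligned: scores shaped (N, num_classes) and boxes shaped (N, 4), with each box row holding center x, center y, width, height. If the exported model uses a different layout, the indexing has to change accordingly. Plain lists stand in for arrays, and the function name select_boxes and the sample values are hypothetical.

```python
def select_boxes(scores, boxes, threshold=0.4):
    """Return (index, score, (x1, y1, x2, y2)) for rows above the threshold."""
    selected = []
    for i, (row_scores, (cx, cy, w, h)) in enumerate(zip(scores, boxes)):
        best = max(row_scores)  # best class score for this prediction
        if best > threshold:
            # Convert center/size to corner coordinates.
            corners = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            selected.append((i, best, corners))
    return selected


# Made-up sample data: three predictions, one class.
scores = [[0.9], [0.1], [0.55]]
boxes = [
    [100.0, 100.0, 40.0, 20.0],
    [300.0, 200.0, 10.0, 10.0],
    [50.0, 60.0, 20.0, 40.0],
]
result = select_boxes(scores, boxes)
```

Under this layout, row i of the boxes output belongs to row i of the scores output, so the same index selects both.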

phanxuanduc1996 commented 4 years ago

Hi @ogoun, if you are getting an error, please send the model to me via private message. Alternatively, you can try converting from .weights -> .keras -> .onnx.

ogoun commented 4 years ago

Hi @phanxuanduc1996, here are my weights and config: https://yadi.sk/d/zvCt994BK-UCuw I'm trying to find a working pipeline: train darknet weights -> convert -> ONNX inference

phanxuanduc1996 commented 4 years ago

Hi @ogoun, please send me the file containing the list of objects you want to detect.

ogoun commented 4 years ago

Hi @ogoun , Please send me the file containing the list of objects you want to detect.

@phanxuanduc1996, I have only one class: https://yadi.sk/d/f30B7iYhJ3qxnw

phanxuanduc1996 / convert_yolo_weights

Convert yolov3 to onnx #4
