Hello @edurenye, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@edurenye thanks for the comment. There is no specific requirement for int64 anywhere in YOLOv5, so I don't know exactly where these variables originate or why their datatype is what it is. If you debug this and find a solution then please let us know.
One hint I have: it probably has nothing to do with ONNX; it's probably a pytorch variable that is int64 for some reason and gets converted to the same type in ONNX.
Thanks @glenn-jocher. My guess is that it comes from this issue: https://github.com/pytorch/pytorch/issues/7870. We have to find where the LongTensor is used and force it to int32 using dtype=torch.int, as specified in that thread.
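For reference, a minimal sketch of the dtype behavior in question (plain PyTorch, nothing YOLOv5-specific):

```python
import torch

# Integer tensors default to int64 (LongTensor) in PyTorch:
t = torch.tensor([3, 11])
print(t.dtype)  # torch.int64

# Forcing int32 as suggested in the linked thread:
t32 = torch.tensor([3, 11], dtype=torch.int)
print(t32.dtype)  # torch.int32
```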
Hi @glenn-jocher, I tried adding dtype=torch.int without luck, see https://github.com/ultralytics/yolov5/compare/master...edurenye:remove_int64
I still get:
Model Summary: 140 layers, 7.26e+06 parameters, 6.61683e+06 gradients
graph torch-jit-export (
%images[FLOAT, 1x3x416x416]
) initializers (
%483[INT64, 1]
%484[INT64, 1]
%485[INT64, 1]
%486[INT64, 1]
%487[INT64, 1]
%488[INT64, 1]
%model.0.conv.conv.bias[FLOAT, 32]
...
As in #225, I wonder where those parameters come from.
Hi @glenn-jocher, our issue is the same as the one here: https://github.com/pytorch/pytorch/issues/16218. So I used this code:
layer_names = []
for name, param_tensor in model.state_dict().items():
    if param_tensor.dtype == torch.int64:
        new_param = param_tensor.int()  # cast the int64 tensor to int32
        rsetattr(model, name, new_param)
        layer_names.append(name)
print(layer_names)
This finds the tensors that are INT64 and converts them to INT32 in the export.py file.
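(Here rsetattr is a recursive setattr helper; a minimal sketch, assuming the usual functools recipe rather than the exact helper used:)

```python
import functools

def rgetattr(obj, attr, *args):
    # Recursive getattr: walks dotted paths like 'model.0.conv.bn.num_batches_tracked'
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

def rsetattr(obj, attr, val):
    # Recursive setattr: resolve the parent object, then set the final attribute
    pre, _, post = attr.rpartition('.')
    return setattr(rgetattr(obj, pre) if pre else obj, post, val)
```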
It gave me the following list of tensors:
['model.0.conv.bn.num_batches_tracked', 'model.1.bn.num_batches_tracked', 'model.2.cv1.bn.num_batches_tracked',
'model.2.cv4.bn.num_batches_tracked', 'model.2.bn.num_batches_tracked', 'model.2.m.0.cv1.bn.num_batches_tracked',
'model.2.m.0.cv2.bn.num_batches_tracked', 'model.3.bn.num_batches_tracked', 'model.4.cv1.bn.num_batches_tracked',
'model.4.cv4.bn.num_batches_tracked', 'model.4.bn.num_batches_tracked', 'model.4.m.0.cv1.bn.num_batches_tracked',
'model.4.m.0.cv2.bn.num_batches_tracked', 'model.4.m.1.cv1.bn.num_batches_tracked',
'model.4.m.1.cv2.bn.num_batches_tracked', 'model.4.m.2.cv1.bn.num_batches_tracked',
'model.4.m.2.cv2.bn.num_batches_tracked', 'model.5.bn.num_batches_tracked', 'model.6.cv1.bn.num_batches_tracked',
'model.6.cv4.bn.num_batches_tracked', 'model.6.bn.num_batches_tracked', 'model.6.m.0.cv1.bn.num_batches_tracked',
'model.6.m.0.cv2.bn.num_batches_tracked', 'model.6.m.1.cv1.bn.num_batches_tracked',
'model.6.m.1.cv2.bn.num_batches_tracked', 'model.6.m.2.cv1.bn.num_batches_tracked',
'model.6.m.2.cv2.bn.num_batches_tracked', 'model.7.bn.num_batches_tracked', 'model.8.cv1.bn.num_batches_tracked',
'model.8.cv2.bn.num_batches_tracked', 'model.9.cv1.bn.num_batches_tracked', 'model.9.cv4.bn.num_batches_tracked',
'model.9.bn.num_batches_tracked', 'model.9.m.0.cv1.bn.num_batches_tracked', 'model.9.m.0.cv2.bn.num_batches_tracked',
'model.10.bn.num_batches_tracked', 'model.13.cv1.bn.num_batches_tracked', 'model.13.cv4.bn.num_batches_tracked',
'model.13.bn.num_batches_tracked', 'model.13.m.0.cv1.bn.num_batches_tracked',
'model.13.m.0.cv2.bn.num_batches_tracked', 'model.14.bn.num_batches_tracked', 'model.17.cv1.bn.num_batches_tracked',
'model.17.cv4.bn.num_batches_tracked', 'model.17.bn.num_batches_tracked', 'model.17.m.0.cv1.bn.num_batches_tracked',
'model.17.m.0.cv2.bn.num_batches_tracked', 'model.19.bn.num_batches_tracked', 'model.21.cv1.bn.num_batches_tracked',
'model.21.cv4.bn.num_batches_tracked', 'model.21.bn.num_batches_tracked', 'model.21.m.0.cv1.bn.num_batches_tracked',
'model.21.m.0.cv2.bn.num_batches_tracked', 'model.23.bn.num_batches_tracked', 'model.25.cv1.bn.num_batches_tracked',
'model.25.cv4.bn.num_batches_tracked', 'model.25.bn.num_batches_tracked', 'model.25.m.0.cv1.bn.num_batches_tracked',
'model.25.m.0.cv2.bn.num_batches_tracked']
As you can see, the problem is num_batches_tracked, just as in this issue: https://github.com/pytorch/pytorch/issues/16218
Looks like the transformation worked, because afterwards I checked the tensors again and there were no INT64 tensors left, but when I did the export to ONNX I got the following error:
ONNX export failed.
Is there a way to get more verbose output?
I found that I had something later in the code that was breaking it. I fixed it and now it 'works': it exports to ONNX, but I still get the same output, as if I had not applied the transformation at all:
Fusing layers...
Model Summary: 140 layers, 7.26e+06 parameters, 6.61683e+06 gradients
graph torch-jit-export (
%images[FLOAT, 1x3x416x416]
) initializers (
%483[INT64, 1]
%484[INT64, 1]
%485[INT64, 1]
%486[INT64, 1]
%487[INT64, 1]
%488[INT64, 1]
%model.0.conv.conv.bias[FLOAT, 32]
Any further ideas?
@edurenye ok. These are just batch-norm statistics. It makes sense then that pytorch maintains them as int64's to decrease the chance of them overflowing. I would say since there is no ONNX error (the export process runs a full suite of checks), this is just between ONNX and opencv.
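A quick way to confirm this in plain pytorch (a minimal sketch):

```python
import torch.nn as nn

bn = nn.BatchNorm2d(32)
# pytorch keeps this running counter in int64 to reduce the risk of overflow:
print(bn.num_batches_tracked.dtype)  # torch.int64
```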
Well, according to https://github.com/opencv/opencv/issues/14830#issuecomment-503279466, OpenCV tries to convert it to INT32 but fails because the value is out of range.
Lots of people like me will try to use it on the edge with OpenCV and OpenVINO; how light your model is makes it ideal for those use cases.
Can't I just remove those tensors somehow when exporting to ONNX?
@edurenye yes, I agree export should be easier, but this is the current state of affairs. The ONNX guys will say they are doing their job correctly, as will the opencv guys and the pytorch guys, and they are all technically correct since their responsibilities don't really extend past their packages, and all the packages are working correctly in their standalone capacities.
By the way, you can get a verbose export by setting the verbose flag: https://github.com/ultralytics/yolov5/blob/e74ccb2985ea747e1d4a2d92cad5f4f7738fb54f/models/export.py#L42-L48
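The underlying call in those lines is torch.onnx.export; a minimal sketch of that kind of call (the model here is a stand-in, and the exact arguments in export.py may differ):

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3)         # stand-in for the YOLOv5 model (assumption)
img = torch.zeros(1, 3, 640, 640)  # dummy input image tensor

# verbose=True prints the full traced ONNX graph during export
torch.onnx.export(model, img, 'model.onnx', verbose=True,
                  opset_version=12, input_names=['images'], output_names=['output'])
```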
@edurenye just realized that there are only 6 int64's in your onnx export, but there are many more batchnorm values. The 6 int64's must originate somewhere else.
Yes @glenn-jocher, because the code should turn them into int32, but they are still there.
I used the debugger and found their origin: they are used in the 'Concat' operations. The following is the end of the graph:
%468 : Tensor = onnx::Unsqueeze[axes=[0]](%459)
%471 : Tensor = onnx::Unsqueeze[axes=[0]](%462)
%472 : Tensor = onnx::Unsqueeze[axes=[0]](%465)
%473 : Tensor = onnx::Concat[axis=0](%468, %480, %481, %471, %472)
%474 : Float(1:89232, 3:29744, 11:2704, 52:52, 52:1) = onnx::Reshape(%384, %473) # /usr/src/app/models/yolo.py:26:0
%475 : Float(1:89232, 3:29744, 52:572, 52:11, 11:1) = onnx::Transpose[perm=[0, 1, 3, 4, 2]](%474) # /usr/src/app/models/yolo.py:26:0
return (%output, %456, %475)
The 'Concat' has 5 inputs: 3 are the outputs from the 'Unsqueeze' ops and the other 2 are these INT64 constants. There are 3 of these blocks of layers, which accounts for the 6 INT64 parameters.
Here is an image using Netron:
I guess this is part of the 'Detect' block, right?
So basically, I was not finding the INT64 tensors in the model because it is the conversion of the model to ONNX that generates them, if I understand this correctly. So I should somehow turn the INT64 into INT32 after exporting the model to ONNX, but I don't really know how to do that...
BTW, the 2 INT64 values are the same in the 3 cases; their values are 3 and 11.
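(For future readers: a minimal sketch of one way to attempt the post-export narrowing with the onnx package; an idea, not a verified fix. Note that ONNX's Reshape op formally expects int64 shape inputs, so the narrowed model may fail ONNX validation even if OpenCV then loads it.)

```python
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load('yolov5s.onnx')  # assumed export path

# Rewrite every INT64 initializer as INT32; the values seen here (3 and 11)
# are tiny, so the narrowing itself loses no information.
for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT64:
        arr = numpy_helper.to_array(init).astype(np.int32)
        init.CopyFrom(numpy_helper.from_array(arr, init.name))

onnx.save(model, 'yolov5s_int32.onnx')
```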
@edurenye I pushed a few updates to export.py today for improved introspection and incremented to opset 12. The initial int64's no longer appear in the results, though I believe the batchnorm int64's remain. You might want to git pull and see where the updates put you.
Thanks @glenn-jocher, but same result. I'll try a different approach and use something other than OpenCV, but I don't really know how to use OpenVINO with anything else; that is why I wanted to use the model with OpenCV.
Thanks for your help. I think the problem is more on the plate of the OpenCV guys.
@edurenye oh, that's too bad. My output looks like this now; the original list of 6 int64's no longer appears:
cd yolov5
export PYTHONPATH="$PWD" # add path
python models/export.py --weights yolov5s.pt --img 640 --batch 1 # export
Output is:
Namespace(batch_size=1, img_size=[640, 640], weights='yolov5s.pt')
TorchScript export success, saved as yolov5s.torchscript # <-------- TorchScript exports first
Fusing layers...
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients
graph torch-jit-export (
%images[FLOAT, 1x3x640x640]
) initializers (
%model.0.conv.conv.bias[FLOAT, 32]
%model.0.conv.conv.weight[FLOAT, 32x12x3x3]
%model.1.conv.bias[FLOAT, 64]
%model.1.conv.weight[FLOAT, 64x32x3x3]
%model.10.conv.bias[FLOAT, 256]
%model.10.conv.weight[FLOAT, 256x512x1x1]
...
%650 = Gather[axis = 0](%648, %649)
%653 = Unsqueeze[axes = [0]](%644)
%656 = Unsqueeze[axes = [0]](%647)
%657 = Unsqueeze[axes = [0]](%650)
%658 = Concat[axis = 0](%653, %665, %666, %656, %657)
%659 = Reshape(%569, %658)
%660 = Transpose[perm = [0, 1, 3, 4, 2]](%659)
return %output, %641, %660
}
ONNX export success, saved as yolov5s.onnx # <-------- ONNX exports second
View with https://github.com/lutzroeder/netron
Strange, I tried again and I still have them :thinking:
What types do the tensors %665 and %666 have now in your exported model?
I'm using docker, but inside I'm using my local code as a volume (so I can have the latest version from master). Might it be something to do with the container, the versions of something? I'm building the Dockerfile from this repo; I just uncommented this line:
#RUN pip install -r requirements.txt
Why is it commented out in the Dockerfile?
I don't have much to add unfortunately, but I'm having the same issue. Running in Google Colab, and getting the same error while importing into OpenCV.
@TrInsanity are you seeing the same output as https://github.com/ultralytics/yolov5/issues/250#issuecomment-653102358 at least?
Namespace(batch_size=1, img_size=[640, 640], weights='./weights/last_weights.pt')
TorchScript export success, saved as ./weights/last_weights.torchscript
Fusing layers...
Model Summary: 236 layers, 4.74077e+07 parameters, 4.48868e+07 gradients
graph torch-jit-export (
%images[FLOAT, 1x3x640x640]
) initializers (
%model.0.conv.conv.bias[FLOAT, 64]
%model.0.conv.conv.weight[FLOAT, 64x12x3x3]
%model.1.conv.bias[FLOAT, 128]
%model.1.conv.weight[FLOAT, 128x64x3x3]
%model.10.conv.bias[FLOAT, 512]
%model.10.conv.weight[FLOAT, 512x1024x1x1]
...
%output = Transpose[perm = [0, 1, 3, 4, 2]](%649)
%651 = Shape(%606)
%652 = Constant[value = <Scalar Tensor []>]()
%653 = Gather[axis = 0](%651, %652)
%654 = Shape(%606)
%655 = Constant[value = <Scalar Tensor []>]()
%656 = Gather[axis = 0](%654, %655)
%657 = Shape(%606)
%658 = Constant[value = <Scalar Tensor []>]()
%659 = Gather[axis = 0](%657, %658)
%660 = Constant[value = <Scalar Tensor []>]()
%661 = Constant[value = <Scalar Tensor []>]()
%662 = Unsqueeze[axes = [0]](%653)
%663 = Unsqueeze[axes = [0]](%660)
%664 = Unsqueeze[axes = [0]](%661)
%665 = Unsqueeze[axes = [0]](%656)
%666 = Unsqueeze[axes = [0]](%659)
%667 = Concat[axis = 0](%662, %663, %664, %665, %666)
%668 = Reshape(%606, %667)
%669 = Transpose[perm = [0, 1, 3, 4, 2]](%668)
%670 = Shape(%581)
%671 = Constant[value = <Scalar Tensor []>]()
%672 = Gather[axis = 0](%670, %671)
%673 = Shape(%581)
%674 = Constant[value = <Scalar Tensor []>]()
%675 = Gather[axis = 0](%673, %674)
%676 = Shape(%581)
%677 = Constant[value = <Scalar Tensor []>]()
%678 = Gather[axis = 0](%676, %677)
%679 = Constant[value = <Scalar Tensor []>]()
%680 = Constant[value = <Scalar Tensor []>]()
%681 = Unsqueeze[axes = [0]](%672)
%682 = Unsqueeze[axes = [0]](%679)
%683 = Unsqueeze[axes = [0]](%680)
%684 = Unsqueeze[axes = [0]](%675)
%685 = Unsqueeze[axes = [0]](%678)
%686 = Concat[axis = 0](%681, %682, %683, %684, %685)
%687 = Reshape(%581, %686)
%688 = Transpose[perm = [0, 1, 3, 4, 2]](%687)
return %output, %669, %688
}
ONNX export success, saved as ./weights/last_weights.onnx
View with https://github.com/lutzroeder/netron
This is the output from the export command. Sorry, I'm not sure what I'm looking for but it looks similar to https://github.com/ultralytics/yolov5/issues/250#issuecomment-653102358
Edit: There are a few examples of lines with INT64, all num_batches_tracked:
%model.9.bn.num_batches_tracked[INT64, scalar]
@TrInsanity looks good, you are seeing the same thing then. The values you see there are batchnorm statistics which pytorch tracks in int64's to reduce the risk of overflow from very high numbers. @edurenye had a loop above to change these to int32 that may or may not address your problem.
After using export.py to get the .onnx file, I call cv2.dnn.readNetFromONNX() on it and this error happens:
cv2.error: OpenCV(4.2.0) /io/opencv/modules/dnn/src/onnx/onnx_importer.cpp:134: error: (-215:Assertion failed) !field.empty() in function 'getMatFromTensor'
I'm having the same issues. @edurenye, what did you have as the rsetattr function?
Hi @THINK989, that error looks completely unrelated to the one in this issue, as the INT64 error happens in the 'Concat' layer.
Hello @edurenye, yes, sorry about that. I got confirmation from OpenVINO that the model is unsupported. I have deleted the comment.
Any updates on this issue or how to resolve it? I'm having the exact same problem.
It will be great if this is resolved; this is the only good option to run YOLOv5 on the edge.
@edurenye Did you manage to solve this issue?
No, I just avoided exporting in my project and had to use PyTorch on the edge, which was not nice, but it is what I had to do.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
dnnnet = cv2.dnn.readNet(weights)  # dnnnet = cv2.dnn.readNetFromONNX(weights) gives the same result
cv2.error: OpenCV(4.3.0) /io/opencv/modules/dnn/src/onnx/onnx_graph_simplifier.hpp:32: error: (-211:One of the arguments' values is out of range) Input is out of OpenCV 32S range in function 'convertInt64ToInt32'
models/export.py TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if self.grid[i].shape[2:4] != x[i].shape[2:4]:
graph torch-jit-export (
%images[FLOAT, 1x3x448x640]
) initializers (
%1406[FLOAT, 4]
%1407[FLOAT, 4]
%1408[INT64, 1]
%1409[INT64, 1]
%1410[INT64, 1]
%1411[FLOAT, 1x1x3x1x1x2]
%1412[INT64, 1]
%1413[INT64, 1]
%1414[INT64, 1]
%1415[INT64, 1]
%1416[INT64, 1]
%1417[INT64, 1]
%1418[FLOAT, 1x1x3x1x1x2]
%1419[INT64, 1]
%1420[INT64, 1]
%1421[INT64, 1]
%1422[INT64, 1]
%1423[INT64, 1]
%1424[INT64, 1]
%1425[FLOAT, 1x1x3x1x1x2]
%1426[INT64, 1]
%1427[INT64, 1]
%1428[INT64, 1]
This issue should be re-opened. I've been looking around a lot, but other than using something else entirely, there seems to be no solution to the problem. Has any participant of this discussion made any progress on the matter? I'd be very grateful if they could share a solution, should they have found one.
I saw this article here: https://www.programmersought.com/article/63516964609/
I haven't tried it myself but maybe it could be of some help.
Can you please tell me what the solution to this problem is?
No, I just avoided exporting in my project and had to use PyTorch on the edge, which was not nice, but it is what I had to do.
@edurenye @mohammad69h94 @SFE92 @ubicgothg @coordxyz @wardeha @THINK989 @johs @123liluky good news 🎉! Your original issue may now be fixed ✅ in PR #4833 by @SamFC10. This PR implements architecture updates to allow for ONNX-exported YOLOv5 models to be used with OpenCV DNN.
To receive this update:
- Git: run git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
- PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
- Docker: sudo docker pull ultralytics/yolov5:latest to update your image
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
@edurenye When I use Qt Creator (C++) to import the classic yolov5s model, an error occurs: "One of the arguments' values is out of range (Input is out of OpenCV 32S range) in convertInt64ToInt32". After I simplified the model with onnx-simplifier, it still has no effect. Do you know how to solve this? Thank you.
No idea, sorry. I haven't used this project again since 2020. Maybe someone else can help you.
Hi, I found the (or at least one of the) problems: the Reshape at the end of the network seems to be int64. This will likely cause the same problem, in spirit, as the batch normalization. @glenn-jocher, is there a way to correct this?
@christiancumini Thanks for bringing this to our attention. @SamFC10 recently implemented updates in PR #4833 to address potential issues with ONNX-exported YOLOv5 models and OpenCV DNN, including fixes for data type compatibility. Please check if this resolves the problem you've encountered. In case you encounter further issues, feel free to create a new GitHub issue with any specific details attached. Thank you for your support and contributions to the YOLOv5 community!
I haven't tried it myself yet, but for future readers, this might be a more general ONNX int64 -> int32 conversion approach: https://github.com/aadhithya/onnx-typecast
@Anner-deJong thank you for sharing this resource with the community. While I can't endorse external tools directly, it's always valuable when community members contribute potential solutions. If anyone tries this approach, please ensure to test the converted model thoroughly to confirm its functionality and accuracy. If there are any issues or feature requests related to YOLOv5, don't hesitate to reach out through the appropriate GitHub channels. Your feedback helps us improve! π
🚀 Feature
When exporting an ONNX model, add the option to use INT32 tensors instead of INT64, as INT64 is not supported by OpenCV.
Motivation
I exported the model as an ONNX model, then I tried to import it in OpenCV 4.2.0 using
cv2.dnn.readNetFromONNX(model_path)
and I got the following error:
Pitch
I want to have a parameter when exporting to be able to select INT32 tensors.
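A sketch of what such an option might look like (the flag name and wiring are hypothetical, not part of the actual export.py):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag; downstream code would cast INT64 initializers to INT32:
parser.add_argument('--int32', action='store_true',
                    help='export ONNX graph with INT32 instead of INT64 tensors')
opt = parser.parse_args()
```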
Alternatives
An alternative solution could be to always use INT32 instead of INT64.
Additional context
https://github.com/opencv/opencv/issues/14830