deepsworld closed this issue 2 years ago
@deepsworld yes I'm able to reproduce, I get the same error message. Strangely enough 'argument' is misspelled in the error message.
ONNX: export failure: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument index in method wrapper_index_select)
I remember seeing similar issues, but I believe these were resolved by PR https://github.com/ultralytics/yolov5/pull/5110
Hi, I added `model.cuda()` before the `torch.onnx.export` call, which allowed the export to happen at half precision.
@visualcortex-team can you please submit a PR with this fix to help future users? Thank you!
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
Hi @deepsworld @visualcortex-team @glenn-jocher , has the fix been merged? I've just faced exactly the same error on the master branch (commit id a45e472358d5051a6cb857483b8fb357b2634db2)
I've already added `model.cuda()` before invoking `torch.onnx.export`, but it didn't work.
@knwng The workaround is to export on the CPU device, i.e. without `--device 0`
@deepsworld Hi, what do you mean by that? I get the same error when exporting with `--dynamic`
@data-ant I meant export the model on `cpu` instead of `gpu`
@deepsworld But when using `--half` it does not work:
assert not (device.type == 'cpu' and half), '--half only compatible with GPU export, i.e. use --device 0'
@glenn-jocher Still getting an error when executing a command like:
python3 export.py --weights models/yolov5s.pt --include onnx --inplace --dynamic --device 0 --half
Error message:
<omitting python frames>
frame #51: __libc_start_main + 0xe7 (0x7f87b2c87c87 in /lib/x86_64-linux-gnu/libc.so.6)
(function ComputeConstantFolding)
ONNX: export failure: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
@MrRace Not all combinations of arguments are compatible with each other. In your case it looks like you can use `--dynamic` or `--half`, but not both simultaneously, when exporting ONNX models.
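The incompatibility can be sketched as a small stand-alone check. This is a hypothetical helper, simplified from the kind of asserts YOLOv5's export.py performs; the function name and messages are illustrative:

```python
def check_export_args(device: str, half: bool, dynamic: bool) -> None:
    """Reject flag combinations known to break ONNX export.

    Hypothetical helper mirroring, in simplified form, the checks
    discussed in this thread; not the actual export.py code.
    """
    if half and device == "cpu":
        # FP16 export requires a GPU device
        raise ValueError("--half only compatible with GPU export, i.e. use --device 0")
    if half and dynamic:
        # Dynamic-shape ONNX export and FP16 conflict, per the discussion above
        raise ValueError("--half not compatible with --dynamic, use one or the other")

# Valid combinations pass silently:
check_export_args(device="0", half=True, dynamic=False)
check_export_args(device="cpu", half=False, dynamic=True)
```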
@glenn-jocher If I want to export a TensorRT model that is dynamic in batch size with FP16 model precision, how should I do that? Thanks a lot!
@MrRace The YOLOv5 TensorRT exports are all FP16 by default, no matter what the input ONNX model is, but they do not use the `--dynamic` argument. You can try passing `--dynamic` to the TRT ONNX models, but we have not tested this, so I'm not sure what the result will be: https://github.com/ultralytics/yolov5/blob/6ea81bb3a9bb1701bc0aa9ccca546368ce1fa400/export.py#L222-L229
@MrRace Well, I've just figured that out. You should first export an ONNX model with dynamic shapes on FP32 and CPU. Then you can convert this ONNX model to TensorRT with dynamic shapes (you need to set an optimization profile; have a look here: https://github.com/knwng/yolov5/blob/672e53b58b4e0e871961a54480d1a74e9ed72c27/export.py#L264) on FP16 and GPU.
@knwng Thanks for your reply! How do I get the `optimization_profile`? Could you provide an example of `optimization_profile`?
@MrRace Sure. It's also in my repo: https://github.com/knwng/yolov5/blob/master/trt_opt_profile.yaml
Basically, you should tell TRT's optimizer the minimal/optimal/maximal input shapes you want. You can also refer to the official docs, e.g. https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#opt_profiles and https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/OptimizationProfile.html
@knwng Thanks a lot! As you said, I should export an ONNX model with dynamic shapes on FP32 and CPU. Therefore I exported my .pt file to ONNX with a command like:
python3 export.py --weights /home/model.pt --include onnx --dynamic --device cpu
When converting the ONNX file to TensorRT, this error occurs:
[04/21/2022-14:46:24] [TRT] [E] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3608 In function importResize:
[8] Assertion failed: scales.is_weights() && "Resize scales must be an initializer!"
My optimization_profile is:
- name: 'images'
  shapes:
    min:
      - 1
      - 3
      - 640
      - 640
    opt:
      - 64
      - 3
      - 640
      - 640
    max:
      - 128
      - 3
      - 640
      - 640
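For reference, the profile above boils down to three shape sets per input, which TensorRT requires to satisfy min ≤ opt ≤ max elementwise. Below is a sketch using the exact shapes from that YAML, with a plain-Python sanity check; the TensorRT calls shown in the trailing comment (`create_optimization_profile`, `set_shape`, `add_optimization_profile`) are from TensorRT's documented Python API, but the builder/network/config setup is omitted:

```python
# Shapes taken from the optimization profile above (input "images")
profile_shapes = {
    "images": {
        "min": (1, 3, 640, 640),
        "opt": (64, 3, 640, 640),
        "max": (128, 3, 640, 640),
    }
}

def validate_profile(shapes):
    """Check TensorRT's requirement: min <= opt <= max elementwise."""
    for name, s in shapes.items():
        assert len(s["min"]) == len(s["opt"]) == len(s["max"]), name
        for lo, mid, hi in zip(s["min"], s["opt"], s["max"]):
            assert lo <= mid <= hi, f"invalid profile for {name}"

validate_profile(profile_shapes)

# With TensorRT installed, the profile is registered roughly like this
# (sketch only; builder, network, and config creation omitted):
#
#   profile = builder.create_optimization_profile()
#   for name, s in profile_shapes.items():
#       profile.set_shape(name, s["min"], s["opt"], s["max"])
#   config.add_optimization_profile(profile)
```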
@knwng Your export.py doesn't seem to support specifying an existing ONNX file as input, so I converted the raw .pt to a dynamic FP32 ONNX model, then commented out `export_onnx` when running `export_engine`.
@data-ant Hello! Thank you for the information. If you have any other questions, feel free to ask me at any time. Wishing you all the best! 🌟
Search before asking
YOLOv5 Component
Export
Bug
Export fails with `--dynamic` and `--device 0` with the logs below. The export works fine without `--dynamic` or with `--device cpu`. The graphs, when visualized with netron.app, look widely different for the `Detect()` layer.
Environment
YOLOv5: v6.0, OS: Ubuntu 16.04, Python: 3.9, PyTorch: 1.9
Minimal Reproducible Example
python export.py --weights yolov5x.pt --img 640 --batch 1 --device 0 --dynamic
Additional
This could be a bug with the PyTorch ONNX export itself, but I wanted to verify here before posting it on the PyTorch repo. It's very similar to https://github.com/pytorch/pytorch/issues/62712
Are you willing to submit a PR?