ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Cannot export to onnx using dynamic and cuda device #5439

Closed deepsworld closed 2 years ago

deepsworld commented 2 years ago

Search before asking

YOLOv5 Component

Export

Bug

Export fails with --dynamic and --device 0 with the logs below. The export works fine without --dynamic, or with --device cpu. The graphs, when visualized with netron.app, look wildly different for the Detect() layer.

export: data=data/coco128.yaml, weights=yolov5x.pt, imgsz=[640], batch_size=1, device=0, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=True, simplify=False, opset=13, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['torchscript', 'onnx']
YOLOv5 🚀 v6.0-0-g956be8e torch 1.9.0 CUDA:0 (NVIDIA TITAN X (Pascal), 12192.9375MB)

Fusing layers... 
Model Summary: 444 layers, 86705005 parameters, 0 gradients

PyTorch: starting from yolov5x.pt (174.0 MB)

TorchScript: starting export with torch 1.9.0...
/home/ml/dpatel/Downloads/yolov5/models/yolo.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
/home/ml/dpatel/Downloads/yolov5/models/yolo.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
TorchScript: export success, saved as yolov5x.torchscript.pt (347.4 MB)
/home/ml/dpatel/Downloads/yolov5/models/yolo.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
[W shape_type_inference.cpp:419] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument index in method wrapper_index_select)
Exception raised from common_device_check_failure at /opt/conda/conda-bld/pytorch_1623448255797/work/aten/src/ATen/core/adaption.cpp:10 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f5bc9665a22 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f5bc96623db in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: c10::impl::common_device_check_failure(c10::optional<c10::Device>&, at::Tensor const&, char const*, char const*) + 0x37e (0x7f5bca736a0e in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x9a2aab (0x7f5b782cdaab in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so)
frame #4: <unknown function> + 0x9a2b32 (0x7f5b782cdb32 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so)
frame #5: at::redispatch::index_select(c10::DispatchKeySet, at::Tensor const&, long, at::Tensor const&) + 0xb4 (0x7f5bcb0a92b4 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x2d57741 (0x7f5bcc836741 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x2d57b95 (0x7f5bcc836b95 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::index_select(at::Tensor const&, long, at::Tensor const&) + 0x14e (0x7f5bcaec80ae in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #9: torch::jit::onnx_constant_fold::runTorchBackendForOnnx(torch::jit::Node const*, std::vector<at::Tensor, std::allocator<at::Tensor> >&, int) + 0x1b50 (0x7f5c42fc6ea0 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0xae9f4e (0x7f5c43003f4e in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #11: torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map<std::string, c10::IValue, std::less<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&, int) + 0x906 (0x7f5c43008d06 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0xaf19b4 (0x7f5c4300b9b4 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #13: <unknown function> + 0xa6e4a0 (0x7f5c42f884a0 in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #14: <unknown function> + 0x4fe1db (0x7f5c42a181db in /home/ml/dpatel/miniconda3/envs/sinet39/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #56: __libc_start_main + 0xf0 (0x7f5c75283840 in /lib/x86_64-linux-gnu/libc.so.6)
 (function ComputeConstantFolding)
[the same warning and stack trace are repeated twice more]
ONNX: export failure: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument index in method wrapper_index_select)

Export complete (12.46s)
Results saved to /home/ml/dpatel/Downloads/yolov5
Visualize with https://netron.app

Environment

YOLOv5: v6.0, OS: Ubuntu 16.04, Python: 3.9, PyTorch: 1.9

Minimal Reproducible Example

python export.py --weights yolov5x.pt --img 640 --batch 1 --device 0 --dynamic

Additional

This could be a bug with the PyTorch ONNX export itself, but I wanted to verify here before posting it on the PyTorch repo. It's very similar to https://github.com/pytorch/pytorch/issues/62712

Are you willing to submit a PR?

glenn-jocher commented 2 years ago

@deepsworld yes, I'm able to reproduce this; I get the same error message. Strangely enough, 'argument' is misspelled in the error message.

ONNX: export failure: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument index in method wrapper_index_select)

I remember seeing a similar issue, but I believe it was resolved by PR https://github.com/ultralytics/yolov5/pull/5110

visualcortex-team commented 2 years ago

Hi, I added model.cuda() before the torch.onnx.export call, which allowed the export to happen at half precision.

glenn-jocher commented 2 years ago

@visualcortex-team can you please submit a PR with this fix to help future users? Thank you!

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcome!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

knwng commented 2 years ago

Hi @deepsworld @visualcortex-team @glenn-jocher, has the fix been merged? I've just hit exactly the same error on the master branch (commit a45e472358d5051a6cb857483b8fb357b2634db2).

The frameworks I'm using: [screenshot of framework versions, 2022-02-14]

I've already added model.cuda() before invoking torch.onnx.export, but it didn't work.

deepsworld commented 2 years ago

@knwng The workaround is to export on the CPU, i.e. without --device 0
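Using the reproduction command from this issue with the device flag switched to the CPU:

```shell
# Export with dynamic shapes on the CPU so every traced tensor lives on one device
python export.py --weights yolov5x.pt --img 640 --batch 1 --dynamic --device cpu
```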

data-ant commented 2 years ago

@knwng The workaround is to export on the CPU, i.e. without --device 0

@deepsworld Hi, what do you mean by that? I get the same error when exporting with --dynamic

deepsworld commented 2 years ago

@data-ant I meant: export the model on the CPU instead of the GPU

data-ant commented 2 years ago

Received, thank you. Wishing you all the best!

MrRace commented 2 years ago

@data-ant I meant export the model on cpu instead of gpu

@deepsworld But when using --half it does not work:

 assert not (device.type == 'cpu' and half), '--half only compatible with GPU export, i.e. use --device 0'

MrRace commented 2 years ago

@glenn-jocher Still errors when executing a command like:

python3 export.py --weights models/yolov5s.pt --include onnx --inplace --dynamic --device 0 --half

Error message:

<omitting python frames>
frame #51: __libc_start_main + 0xe7 (0x7f87b2c87c87 in /lib/x86_64-linux-gnu/libc.so.6)
 (function ComputeConstantFolding)
ONNX: export failure: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

glenn-jocher commented 2 years ago

@MrRace not all combinations of arguments are compatible with each other. In your case it looks like you can use --dynamic or --half but not both simultaneously when exporting ONNX models.
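Concretely, the two combinations that should work here (commands adapted from this thread; --half requires a GPU, per the assertion in export.py quoted above):

```shell
# Option 1: dynamic shapes, FP32, exported on the CPU
python3 export.py --weights models/yolov5s.pt --include onnx --dynamic --device cpu

# Option 2: fixed shapes, FP16, exported on the GPU
python3 export.py --weights models/yolov5s.pt --include onnx --half --device 0
```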

MrRace commented 2 years ago

@MrRace not all combinations of arguments are compatible with each other. In your case it looks like you can use --dynamic or --half but not both simultaneously when exporting ONNX models.

@glenn-jocher If I want to export a TensorRT model that has a dynamic batch size with FP16 precision, what should I do? Thanks a lot!

glenn-jocher commented 2 years ago

@MrRace the YOLOv5 TensorRT exports are all FP16 by default, no matter what the input ONNX model is, but do not utilize the --dynamic argument. You can try to pass --dynamic to the TRT ONNX models, but we have not tested this so I'm not sure what the result will be: https://github.com/ultralytics/yolov5/blob/6ea81bb3a9bb1701bc0aa9ccca546368ce1fa400/export.py#L222-L229

knwng commented 2 years ago

@MrRace Well, I've just figured that out. First export an ONNX model with dynamic shapes in FP32 on the CPU. Then convert that ONNX model to TensorRT with dynamic shapes in FP16 on the GPU (you need to set an optimization profile; have a look here: https://github.com/knwng/yolov5/blob/672e53b58b4e0e871961a54480d1a74e9ed72c27/export.py#L264).
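One way to do the TensorRT half of that flow without modifying export.py is NVIDIA's trtexec CLI (a sketch: the input name images and the shape ranges are example values to adapt):

```shell
# Build an FP16 engine from the FP32 dynamic-shape ONNX model,
# supplying an optimization profile as min/opt/max input shapes
trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --fp16 \
        --minShapes=images:1x3x640x640 \
        --optShapes=images:16x3x640x640 \
        --maxShapes=images:32x3x640x640
```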

MrRace commented 2 years ago

@MrRace Well, I've just figured that out. First export an ONNX model with dynamic shapes in FP32 on the CPU. Then convert that ONNX model to TensorRT with dynamic shapes in FP16 on the GPU (you need to set an optimization profile; have a look here: https://github.com/knwng/yolov5/blob/672e53b58b4e0e871961a54480d1a74e9ed72c27/export.py#L264).

@knwng Thanks for your reply! How do I create the optimization_profile? Could you provide an example?

knwng commented 2 years ago

@MrRace Sure. It's also in my repo: https://github.com/knwng/yolov5/blob/master/trt_opt_profile.yaml

Basically, you should tell TRT's optimizer the minimum/optimum/maximum input shapes you want. You can also refer to the official docs, e.g. https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#opt_profiles and https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/OptimizationProfile.html
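With the TensorRT Python API, such a profile looks roughly like the following (a sketch, not tested here; it assumes builder and config were created while parsing the ONNX model, that the input binding is named images, and the batch range 1/64/128 is an example):

```python
import tensorrt as trt

def add_dynamic_batch_profile(builder: trt.Builder, config: trt.IBuilderConfig) -> None:
    # One optimization profile: min/opt/max shapes for the "images" input.
    profile = builder.create_optimization_profile()
    profile.set_shape("images",
                      (1, 3, 640, 640),    # min: smallest batch we will run
                      (64, 3, 640, 640),   # opt: shape TRT tunes kernels for
                      (128, 3, 640, 640))  # max: largest batch we will run
    config.add_optimization_profile(profile)
```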

MrRace commented 2 years ago

@knwng Thanks a lot! As you say, I should export an ONNX model with dynamic shapes in FP32 on the CPU, so I exported my .pt file to ONNX with:

python3 export.py --weights /home/model.pt --include onnx --dynamic --device cpu

When converting the ONNX file to TensorRT, this error comes up:

[04/21/2022-14:46:24] [TRT] [E] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3608 In function importResize:
[8] Assertion failed: scales.is_weights() && "Resize scales must be an initializer!"

My optimization_profile is:

- name: 'images'
  shapes:
    min:
      - 1
      - 3
      - 640
      - 640
    opt:
      - 64
      - 3
      - 640
      - 640
    max:
      - 128
      - 3
      - 640
      - 640

MrRace commented 2 years ago

@knwng Your export.py doesn't seem to support passing in an existing ONNX file, so I converted the raw .pt to a dynamic FP32 ONNX model, then commented out export_onnx when running export_engine

glenn-jocher commented 10 months ago

@data-ant Hello! Thanks for the information. If you have any other questions, feel free to ask me anytime. Wishing you all the best! 🌟