Jetson Orin Nano JetPack 5.1.3 install latest yolov5, RuntimeError: Couldn't load custom C++ ops

lida2003 commented 3 weeks ago

Search before asking

[X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

I followed the README and everything seemed fine.

When I run the object detection command, I get "RuntimeError: Couldn't load custom C++ ops".

I don't know why. There is further discussion with NVIDIA here:

Pytorch & torchversion compatible issue on L4T35.5.0

daniel@daniel-nvidia:~/Work/yolov5$ python detect.py --weights yolov5s.pt --source ../../Videos/Worlds_longest_drone_fpv_one_shot.mp4
WARNING ⚠️ Python>=3.10 is required, but Python==3.8.10 is currently installed
/home/daniel/.local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/daniel/.local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSsb'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
detect: weights=['yolov5s.pt'], source=../../Videos/Worlds_longest_drone_fpv_one_shot.mp4, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_format=0, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 v7.0-378-g2f74455a Python-3.8.10 torch-2.1.0a0+41361538.nv23.06 CUDA:0 (Orin, 7451MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Traceback (most recent call last):
  File "detect.py", line 437, in <module>
    main(opt)
  File "detect.py", line 432, in main
    run(**vars(opt))
  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 210, in run
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
  File "/home/daniel/Work/yolov5/utils/general.py", line 1104, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/home/daniel/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 40, in nms
    _assert_has_ops()
  File "/home/daniel/.local/lib/python3.8/site-packages/torchvision/extension.py", line 46, in _assert_has_ops
    raise RuntimeError(
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

Additional

No response

UltralyticsAssistant commented 3 weeks ago

👋 Hello @lida2003, thank you for your interest in YOLOv5 🚀! Please refer to our tutorials to get started, where you can find quickstart guides for simple tasks like custom data training and advanced concepts like hyperparameter evolution.

Since this is a 🐛 Bug Report, could you please provide a minimum reproducible example to help us debug the issue? If this is related to pytorch or torchvision compatibility, please ensure that your versions are compatible as this might be the source of the issue.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our tips for best training results.

Requirements

Make sure you have Python>=3.8.0 with all required packages including PyTorch>=1.8 installed. To get started, consider cloning the YOLOv5 repository and installing the required packages.

Environments

YOLOv5 can be run in various environments such as online notebooks with free GPUs, Google Cloud Deep Learning VM, Amazon Deep Learning AMI, and Docker Image. Choose the environment that best fits your setup.

Status

If the YOLOv5 Continuous Integration tests are passing, it ensures the repository is working correctly across different systems. CI tests verify correct operation of YOLOv5 training, validation, inference, export, and benchmarks on macOS, Windows, and Ubuntu.

Stay tuned, as an Ultralytics engineer will assist you soon! Feel free to check our latest state-of-the-art model, YOLOv8, which promises improved performance and is easy to use. Happy coding! 😊

lida2003 commented 3 weeks ago

The versions below have been tested; all failed with the same log.

BTW, PyTorch is a binary release from NVIDIA, as NVIDIA confirmed in the forum discussion linked earlier.

PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.0
PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.1
PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.2

daniel@daniel-nvidia:~/Work/yolov5$ python --version
Python 3.8.10
daniel@daniel-nvidia:~/Work/yolov5$ python -c "import torch; import torchvision; print(f'PyTorch version: {torch.__version__}'); print(f'Torchvision version: {torchvision.__version__}')"

/home/daniel/.local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/daniel/.local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSsb'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
PyTorch version: 2.1.0a0+41361538.nv23.06
Torchvision version: 0.16.0
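
The undefined symbol in the warning above carries a hint of its own. In the Itanium C++ name mangling used by GCC, `Ss` abbreviates the pre-C++11 `std::string`, while the newer libstdc++ ABI mangles the type with `__cxx11`. So `image.so` appears to expect a PyTorch library built with the old string ABI. The sketch below is a rough heuristic, not a demangler, and `guess_string_abi`, `strip_identifiers`, and `torch_uses_cxx11_abi` are hypothetical helper names. It assumes that `torch.compiled_with_cxx11_abi()` is available in the installed PyTorch build.

```python
from typing import Optional

# Symbol copied from the UserWarning above.
SYMBOL = "_ZN5torch3jit17parseSchemaOrNameERKSsb"


def strip_identifiers(mangled: str) -> str:
    """Drop length-prefixed names (e.g. '5torch') so only type codes remain."""
    out = []
    i = 0
    while i < len(mangled):
        if mangled[i].isdigit():
            j = i
            while j < len(mangled) and mangled[j].isdigit():
                j += 1
            # Skip the digits and the identifier they announce.
            i = j + int(mangled[i:j])
        else:
            out.append(mangled[i])
            i += 1
    return "".join(out)


def guess_string_abi(mangled: str) -> str:
    """Heuristic guess at the std::string ABI a mangled symbol was built with."""
    if "__cxx11" in mangled:
        return "cxx11"
    # 'Ss' is the Itanium abbreviation for the old (pre-C++11 ABI) std::string.
    if "Ss" in strip_identifiers(mangled):
        return "pre-cxx11"
    return "unknown"


def torch_uses_cxx11_abi() -> Optional[bool]:
    """Report the ABI of the installed PyTorch, or None if torch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.compiled_with_cxx11_abi()


print(guess_string_abi(SYMBOL))  # -> pre-cxx11
```

If `torch_uses_cxx11_abi()` returns `True` while the symbol looks pre-cxx11, that would be consistent with a torchvision binary that was not compiled against this particular PyTorch build, for example a prebuilt wheel rather than a local source build. This is one possible reading, not a confirmed diagnosis.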

pderrenger commented 3 weeks ago

Please ensure that your PyTorch and torchvision versions are compatible. You might want to try reinstalling torchvision to match your PyTorch version. If the issue persists, consider testing with the latest stable releases of both packages.
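
As a minimal sketch of what "compatible" means at the version-number level, the snippet below maps a PyTorch version string to the torchvision minor release paired with it. The pairs are transcribed from the compatibility table in the pytorch/vision README and should be re-checked there. The helper names are hypothetical, and matching NVIDIA's `2.1.0a0` pre-release by its major.minor number is an assumption.

```python
import re
from typing import Optional, Tuple

# torch (major, minor) -> torchvision (major, minor), per the pytorch/vision README.
TORCH_TO_TORCHVISION = {
    (2, 0): (0, 15),
    (2, 1): (0, 16),
    (2, 2): (0, 17),
    (2, 3): (0, 18),
    (2, 4): (0, 19),
    (2, 5): (0, 20),
}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.0a0+41361538.nv23.06'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_matching_pair(torch_version: str, torchvision_version: str) -> Optional[bool]:
    """True/False if the pair matches the table, None if torch is not listed."""
    expected = TORCH_TO_TORCHVISION.get(major_minor(torch_version))
    if expected is None:
        return None
    return major_minor(torchvision_version) == expected


print(is_matching_pair("2.1.0a0+41361538.nv23.06", "0.16.1"))  # -> True
```

A matching version number is necessary but not sufficient. The 0.16.x attempts listed above match the table and still fail, which suggests the torchvision binary itself also has to be built against the installed PyTorch build.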

lida2003 commented 3 weeks ago

Please ensure that your PyTorch and torchvision versions are compatible.

According to https://docs.ultralytics.com/guides/nvidia-jetson/#install-pytorch-and-torchvision_1 , these versions are compatible, but it still doesn't work.

You might want to try reinstalling torchvision to match your PyTorch version.

Yes, I think I have tested as much as I can.

PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.0
PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.1
PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.2

If the issue persists, consider testing with the latest stable releases of both packages.

NVIDIA JetPack 5.1.3 comes with Python 3.8.10, and the latest releases call for Python 3.10. So I would like to stay on the NVIDIA JetPack 5.1.3 product release for now.

The question is: why did it go wrong when I followed the guide? Is there any extra software I have to install?

pderrenger commented 3 weeks ago

It seems like there might be an issue with the NVIDIA-specific PyTorch build. Ensure all dependencies like libjpeg and libpng are installed before building torchvision. You might also try using a virtual environment to isolate the setup. If the problem continues, consider reaching out to NVIDIA support for further assistance with their specific PyTorch release.
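
Before setting up a virtual environment, it can help to see which copies of the packages Python actually resolves, since the logs above mix `~/.local/lib/python3.8/site-packages` and `/usr/lib/python3/dist-packages`. The sketch below uses only the standard library; `locate_packages` and `in_virtualenv` are hypothetical helper names.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Optional


def locate_packages(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level package name to the file it would be imported from."""
    locations: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            # find_spec locates a top-level package without importing it.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        locations[name] = spec.origin if spec is not None else None
    return locations


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    for name, origin in locate_packages(["torch", "torchvision"]).items():
        print(f"{name}: {origin}")
```

If `torch` and `torchvision` resolve to different site-packages trees, or a stale `torchvision` directory shadows a freshly built one, that would be worth ruling out before looking further.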

lida2003 commented 3 weeks ago

It seems like there might be an issue with the NVIDIA-specific PyTorch build.

Well, I did ask them about the binary build and, as they told me, installed the binary first and then built torchvision. For details, see the previous link: Pytorch & torchversion compatible issue on L4T35.5.0

Ensure all dependencies like libjpeg and libpng are installed before building torchvision.

Yes, it's all installed.

You might also try using a virtual environment to isolate the setup. If the problem continues, consider reaching out to NVIDIA support for further assistance with their specific PyTorch release.

The virtual environment setup ran into issues when running those scripts. So I would mainly focus on the real setup, which uses less CPU/memory and is how the software is actually deployed.

Can you let me know what "custom C++ ops" is?

lida2003 commented 3 weeks ago

@pderrenger

BTW, should I install libpng++-dev? Please check the installation below for libjpeg (libjpeg-dev) and libpng (libpng-dev).

daniel@daniel-nvidia:~$ aptitude search libjpeg
v   libjpeg-dbg                                             -
i   libjpeg-dev                                             - Independent JPEG Group's JPEG runtime library (dependency package)
p   libjpeg-progs                                           - Programs for manipulating JPEG files
p   libjpeg-tools                                           - Complete implementation of 10918-1 (JPEG)
p   libjpeg-turbo-progs                                     - Programs for manipulating JPEG files
p   libjpeg-turbo-test                                      - Program for benchmarking and testing libjpeg-turbo
i A libjpeg-turbo8                                          - IJG JPEG compliant runtime library.
p   libjpeg-turbo8-dbg                                      - Debugging symbols for the libjpeg-turbo library
i   libjpeg-turbo8-dev                                      - Development files for the IJG JPEG library
p   libjpeg62                                               - Independent JPEG Group's JPEG runtime library (version 6.2)
p   libjpeg62-dev                                           - Development files for the IJG JPEG library (version 6.2)
i A libjpeg8                                                - Independent JPEG Group's JPEG runtime library (dependency package)
p   libjpeg8-dbg                                            - Independent JPEG Group's JPEG runtime library (dependency package)
i   libjpeg8-dev                                            - Independent JPEG Group's JPEG runtime library (dependency package)
p   libjpeg9                                                - Independent JPEG Group's JPEG runtime library
p   libjpeg9-dev                                            - Development files for the IJG JPEG library
daniel@daniel-nvidia:~$ aptitude search libpng
p   libpng++-dev                                            - C++ interface to the PNG (Portable Network Graphics) library
i   libpng-dev                                              - PNG library - development (version 1.6)
p   libpng-sixlegs-java                                     - Sixlegs Java PNG Decoder
p   libpng-sixlegs-java-doc                                 - Documentation for Sixlegs Java PNG Decoder
p   libpng-tools                                            - PNG library - tools (version 1.6)
i A libpng16-16                                             - PNG library - runtime (version 1.6)
p   libpnglite-dev                                          - lightweight C library for loading and writing PNG images
p   libpnglite0                                             - lightweight C library for loading and writing PNG images

pderrenger commented 3 weeks ago

You don't need to install libpng++-dev specifically. Having libjpeg-dev and libpng-dev should be sufficient for building torchvision. If issues persist, ensure all dependencies are correctly installed and consider testing with the latest package versions.
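
As a quick cross-check of the package listing, the sketch below asks the dynamic linker machinery whether the JPEG and PNG runtime libraries can be found. It uses `ctypes.util.find_library` from the standard library; `find_image_libraries` is a hypothetical helper name. It only sees runtime libraries, so it cannot confirm that the headers from the `-dev` packages were present when torchvision was built.

```python
import ctypes.util
from typing import Dict, Optional


def find_image_libraries() -> Dict[str, Optional[str]]:
    """Return the resolved library name for libjpeg and libpng, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in ("jpeg", "png")}


if __name__ == "__main__":
    for name, resolved in find_image_libraries().items():
        print(f"lib{name}: {resolved or 'not found'}")
```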

lida2003 commented 3 weeks ago

PyTorch 2.1.0a0+41361538.nv23.06 + torchvision-0.20.0a0+945bdad

$ pip uninstall torchvision
Found existing installation: torchvision 0.16.0
Uninstalling torchvision-0.16.0:
  Would remove:
    /home/daniel/.local/lib/python3.8/site-packages/torchvision-0.16.0.dist-info/*
    /home/daniel/.local/lib/python3.8/site-packages/torchvision/*
Proceed (Y/n)? Y
  Successfully uninstalled torchvision-0.16.0
daniel@daniel-nvidia:~/Work/torchvision$ pip install .
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/daniel/Work/torchvision
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy in /home/daniel/.local/lib/python3.8/site-packages (from torchvision==0.20.0a0+945bdad) (1.23.5)
Requirement already satisfied: torch in /home/daniel/.local/lib/python3.8/site-packages (from torchvision==0.20.0a0+945bdad) (2.1.0a0+41361538.nv23.6)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /home/daniel/.local/lib/python3.8/site-packages (from torchvision==0.20.0a0+945bdad) (10.4.0)
Requirement already satisfied: filelock in /usr/lib/python3/dist-packages (from torch->torchvision==0.20.0a0+945bdad) (3.0.12)
Requirement already satisfied: fsspec in /home/daniel/.local/lib/python3.8/site-packages (from torch->torchvision==0.20.0a0+945bdad) (2024.10.0)
Requirement already satisfied: jinja2 in /usr/lib/python3/dist-packages (from torch->torchvision==0.20.0a0+945bdad) (2.10.1)
Requirement already satisfied: networkx in /home/daniel/.local/lib/python3.8/site-packages (from torch->torchvision==0.20.0a0+945bdad) (3.1)
Requirement already satisfied: sympy in /home/daniel/.local/lib/python3.8/site-packages (from torch->torchvision==0.20.0a0+945bdad) (1.13.3)
Requirement already satisfied: typing-extensions in /home/daniel/.local/lib/python3.8/site-packages (from torch->torchvision==0.20.0a0+945bdad) (4.12.2)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/daniel/.local/lib/python3.8/site-packages (from sympy->torch->torchvision==0.20.0a0+945bdad) (1.3.0)
Building wheels for collected packages: torchvision
  Building wheel for torchvision (pyproject.toml) ... done
  Created wheel for torchvision: filename=torchvision-0.20.0a0+945bdad-cp38-cp38-linux_aarch64.whl size=1196790 sha256=294124bf5972ee4c9085f43352ed80b94772979d82970f5319e3d0c403967313
  Stored in directory: /tmp/pip-ephem-wheel-cache-dbitv4r7/wheels/12/f2/fd/9a2cd59f45fe55f3ec87a661481722bd68e804b4c7a21bceca
Successfully built torchvision
Installing collected packages: torchvision
Successfully installed torchvision-0.20.0a0+945bdad

Can you let me know what "custom C++ ops" or "torch._custom_ops" is?

daniel@daniel-nvidia:~/Work$ yolo track model=yolov8n.engine source=../Videos/Worlds_longest_drone_fpv_one_shot.mp4
WARNING ⚠️ torchvision==0.20 is incompatible with torch==2.1.
Run 'pip install torchvision==0.16' to fix torchvision or 'pip install -U torch torchvision' to update both.
For a full compatibility table see https://github.com/pytorch/vision#installation
WARNING ⚠️ Python>=3.10 is required, but Python==3.8.10 is currently installed
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'.
Ultralytics 8.3.21 🚀 Python-3.8.10 torch-2.1.0a0+41361538.nv23.06 CUDA:0 (Orin, 7451MiB)
Loading yolov8n.engine for TensorRT inference...
[10/30/2024-20:36:49] [TRT] [I] Loaded engine size: 13 MiB
[10/30/2024-20:36:49] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[10/30/2024-20:36:50] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +616, GPU +583, now: CPU 1003, GPU 3992 (MiB)
[10/30/2024-20:36:50] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 12 (MiB)
[10/30/2024-20:36:50] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 990, GPU 3981 (MiB)
[10/30/2024-20:36:50] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18, now: CPU 0, GPU 30 (MiB)

Traceback (most recent call last):
  File "/home/daniel/.local/bin/yolo", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/daniel/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 824, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "/home/daniel/.local/lib/python3.8/site-packages/ultralytics/engine/model.py", line 601, in track
    return self.predict(source=source, stream=stream, **kwargs)
  File "/home/daniel/.local/lib/python3.8/site-packages/ultralytics/engine/model.py", line 554, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/home/daniel/.local/lib/python3.8/site-packages/ultralytics/engine/predictor.py", line 183, in predict_cli
    for _ in gen:  # sourcery skip: remove-empty-nested-block, noqa
  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/daniel/.local/lib/python3.8/site-packages/ultralytics/engine/predictor.py", line 234, in stream_inference
    self.model.warmup(imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, 3, *self.imgsz))
  File "/home/daniel/.local/lib/python3.8/site-packages/ultralytics/nn/autobackend.py", line 642, in warmup
    import torchvision  # noqa (import here so torchvision import time not recorded in postprocess time)
  File "/home/daniel/.local/lib/python3.8/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
  File "/home/daniel/.local/lib/python3.8/site-packages/torchvision/_meta_registrations.py", line 4, in <module>
    import torch._custom_ops
ModuleNotFoundError: No module named 'torch._custom_ops'
daniel@daniel-nvidia:~/Work$ cd yolov5/
daniel@daniel-nvidia:~/Work/yolov5$ python detect.py --weights yolov5s.pt --source ../Videos/Worlds_longest_drone_fpv_one_shot.mp4
WARNING ⚠️ torchvision==0.20 is incompatible with torch==2.1.
Run 'pip install torchvision==0.16' to fix torchvision or 'pip install -U torch torchvision' to update both.
For a full compatibility table see https://github.com/pytorch/vision#installation
WARNING ⚠️ Python>=3.10 is required, but Python==3.8.10 is currently installed
Traceback (most recent call last):
  File "detect.py", line 48, in <module>
    from models.common import DetectMultiBackend
  File "/home/daniel/Work/yolov5/models/common.py", line 39, in <module>
    from utils.dataloaders import exif_transpose, letterbox
  File "/home/daniel/Work/yolov5/utils/dataloaders.py", line 23, in <module>
    import torchvision
  File "/home/daniel/.local/lib/python3.8/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
  File "/home/daniel/.local/lib/python3.8/site-packages/torchvision/_meta_registrations.py", line 4, in <module>
    import torch._custom_ops
ModuleNotFoundError: No module named 'torch._custom_ops'

pderrenger commented 3 weeks ago

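These are two different things. "Custom C++ ops" refers to torchvision's compiled C++/CUDA operators, such as torchvision.ops.nms, which are loaded from a shared library built against a specific PyTorch version. "torch._custom_ops" is a separate, internal PyTorch module that newer torchvision releases import at startup. The first error means the compiled operator library failed to load; the second means the installed PyTorch does not provide a module that this torchvision expects. Both point to a version or build incompatibility. Please ensure you are using compatible versions of PyTorch and torchvision as per the compatibility matrix provided by PyTorch.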
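
To tell the two failure modes apart without running a full detection, a small probe can import both packages and call the same operator that fails in the traceback. This is a minimal sketch: `probe_torchvision_ops` is a hypothetical helper name, and it assumes `torchvision.ops.nms` is representative of the compiled operators.

```python
def probe_torchvision_ops() -> str:
    """Return a short status string describing whether torchvision's compiled ops load."""
    # Broad excepts are deliberate: a probe should report a failure, not crash on it.
    try:
        import torch
    except Exception as exc:
        return f"torch import failed: {exc}"

    try:
        # With torchvision 0.20 on torch 2.1 this is where the
        # "No module named 'torch._custom_ops'" error would surface.
        import torchvision
    except Exception as exc:
        return f"torchvision import failed: {exc}"

    # Two overlapping boxes in (x1, y1, x2, y2) format with their scores.
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]])
    scores = torch.tensor([0.9, 0.8])
    try:
        # This is where "Couldn't load custom C++ ops" would surface.
        keep = torchvision.ops.nms(boxes, scores, 0.5)
    except Exception as exc:
        return f"compiled ops unavailable: {exc}"
    return f"compiled ops OK, nms kept indices {keep.tolist()}"


if __name__ == "__main__":
    print(probe_torchvision_ops())
```

Running the probe after each reinstall gives a quick pass/fail signal for a given PyTorch and torchvision pair.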

lida2003 commented 3 weeks ago

The error indicates a missing module, likely due to version incompatibility.

I don't know which module is missing or which version is incompatible. I haven't figured out why the proven-good version from NVIDIA doesn't work. Are there any debug options that can identify the missing module or the version incompatibility?

Please ensure you are using compatible versions of PyTorch and torchvision as per the compatibility matrix provided by PyTorch.

So far I have tried the following versions. Is there any compatibility issue?

PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.0 //googled
PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.1 // nvidia suggested
PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.16.2 // from nvidia guide
PyTorch 2.1.0a0+41361538.nv23.06 + Torchvision version: 0.15.0 //googled
PyTorch 2.1.0a0+41361538.nv23.06 + torchvision-0.20.0a0+945bdad // latest code

pderrenger commented 3 weeks ago

It seems you've tried several combinations. To debug further, check the compatibility matrix on the PyTorch GitHub page to ensure the versions align. If issues persist, consider testing with the latest stable releases of both PyTorch and torchvision.

lida2003 commented 3 weeks ago

@pderrenger

PyTorch is released as a binary by NVIDIA (with CUDA support). It's NOT possible to build it from source.
Do you have any contact with NVIDIA? PyTorch 2.1.0a0+41361538.nv23.06 is the latest for JetPack 5.1.3.
And it's the last product version supporting ROS middleware.

pderrenger commented 3 weeks ago

Thank you for reaching out. Unfortunately, we don't have direct contact with NVIDIA. For issues related to their specific PyTorch builds, I recommend continuing discussions on the NVIDIA forums or contacting their support for assistance.

lida2003 commented 3 weeks ago

OK, thanks for your time. I hope NVIDIA will support this production version and help me find out what's going on.

pderrenger commented 3 weeks ago

You're welcome. I recommend continuing to engage with NVIDIA support for further assistance on this issue. If you have any other questions related to YOLOv5, feel free to ask.
