ultralytics / ultralytics

NEW - YOLOv8 πŸš€ in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. #5059

Closed · chri002 closed this issue 9 months ago

chri002 commented 9 months ago

Search before asking

Bug

I have tried the old answers, but nothing works with torch 2.* on Windows; on Linux there is no problem. On Windows I have tried installing 2.0.0, 2.0.1 and 2.0.2 with the associated torchvision and torchaudio builds for CUDA 11.8, and also changing the ultralytics version, but nothing works.

Update: if I run the code from the terminal as a Python script it works, but inside Jupyter it fails and returns the error below.

The code is a simple inference:

from PIL import Image
from ultralytics import YOLO

# Load the pretrained detection model
reconizer = YOLO("yolov8n.pt")

image = Image.open("NewPath.png")
output = None
output = reconizer(image)  # run inference; this call raises the error below

# Keep boxes above a 0.5 confidence threshold together with their class names
out = []
for r in output:
    boxes = r.boxes
    for box in boxes:
        if box.conf > 0.5:
            b = box.xyxy[0]
            c = box.cls
            out.append([b, reconizer.names[int(c)]])

print(out)

the error is as follows:


NotImplementedError                       Traceback (most recent call last)
Cell In[3], line 16
     14 image = Image.open("NewPath.png")
     15 output = None
---> 16 output = reconizer(image)
     17 out = []
     18 for r in output:

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\model.py:96, in Model.__call__(self, source, stream, **kwargs)
     94 def __call__(self, source=None, stream=False, **kwargs):
     95     """Calls the 'predict' function with given arguments to perform object detection."""
---> 96     return self.predict(source, stream, **kwargs)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\model.py:236, in Model.predict(self, source, stream, predictor, **kwargs)
    234 if prompts and hasattr(self.predictor, 'set_prompts'):  # for SAM-type models
    235     self.predictor.set_prompts(prompts)
--> 236 return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\predictor.py:194, in BasePredictor.__call__(self, source, model, stream, *args, **kwargs)
    192     return self.stream_inference(source, model, *args, **kwargs)
    193 else:
--> 194     return list(self.stream_inference(source, model, *args, **kwargs))

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py:35, in _wrap_generator.<locals>.generator_context(*args, **kwargs)
     32 try:
     33     # Issuing None to a generator fires it up
     34     with ctx_factory():
---> 35         response = gen.send(None)
     37     while True:
     38         try:
     39             # Forward the response to our caller and get its next request

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\predictor.py:257, in BasePredictor.stream_inference(self, source, model, *args, **kwargs)
    255 # Postprocess
    256 with profilers[2]:
--> 257     self.results = self.postprocess(preds, im, im0s)
    258 self.run_callbacks('on_predict_postprocess_end')
    260 # Visualize, save, write results

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\models\yolo\detect\predict.py:25, in DetectionPredictor.postprocess(self, preds, img, orig_imgs)
     23 def postprocess(self, preds, img, orig_imgs):
     24     """Post-processes predictions and returns a list of Results objects."""
---> 25     preds = ops.non_max_suppression(preds,
     26                                     self.args.conf,
     27                                     self.args.iou,
     28                                     agnostic=self.args.agnostic_nms,
     29                                     max_det=self.args.max_det,
     30                                     classes=self.args.classes)
     32 if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
     33     orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\utils\ops.py:242, in non_max_suppression(prediction, conf_thres, iou_thres, classes, agnostic, multi_label, labels, max_det, nc, max_time_img, max_nms, max_wh)
    240 c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
    241 boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
--> 242 i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
    243 i = i[:max_det]  # limit detections
    245 # # Experimental
    246 # merge = False  # use merge-NMS
    247 # if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
   (...)
    254 #     if redundant:
    255 #         i = i[iou.sum(1) > 1]  # require redundancy

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchvision\ops\boxes.py:41, in nms(boxes, scores, iou_threshold)
     39 _log_api_usage_once(nms)
     40 _assert_has_ops()
---> 41 return torch.ops.torchvision.nms(boxes, scores, iou_threshold)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\_ops.py:502, in OpOverloadPacket.__call__(self, *args, **kwargs)
    497 def __call__(self, *args, **kwargs):
    498     # overloading __call__ to ensure torch.ops.foo.bar()
    499     # is still callable from JIT
    500     # We save the function ptr as the `op` attribute on
    501     # OpOverloadPacket to access it here.
--> 502     return self._op(*args, **kwargs or {})

NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

CPU: registered at C:\Users\circleci\project\torchvision\csrc\ops\cpu\nms_kernel.cpp:112 [kernel]
QuantizedCPU: registered at C:\Users\circleci\project\torchvision\csrc\ops\quantized\cpu\qnms_kernel.cpp:124 [kernel]
BackendSelect: fallthrough registered at ..\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at ..\aten\src\ATen\FunctionalizeFallbackKernel.cpp:280 [backend fallback]
Named: registered at ..\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ..\aten\src\ATen\ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at ..\aten\src\ATen\native\NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at ..\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:30 [backend fallback]
AutogradCPU: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:34 [backend fallback]
AutogradCUDA: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:42 [backend fallback]
AutogradXLA: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:46 [backend fallback]
AutogradMPS: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:54 [backend fallback]
AutogradXPU: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:38 [backend fallback]
AutogradHPU: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:67 [backend fallback]
AutogradLazy: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:50 [backend fallback]
AutogradMeta: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:58 [backend fallback]
Tracer: registered at ..\torch\csrc\autograd\TraceTypeManual.cpp:294 [backend fallback]
AutocastCPU: fallthrough registered at ..\aten\src\ATen\autocast_mode.cpp:487 [backend fallback]
AutocastCUDA: fallthrough registered at ..\aten\src\ATen\autocast_mode.cpp:354 [backend fallback]
FuncTorchBatched: registered at ..\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ..\aten\src\ATen\functorch\VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ..\aten\src\ATen\LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at ..\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ..\aten\src\ATen\functorch\TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:148 [backend fallback]
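For anyone triaging this: the dispatcher listing above shows an nms kernel registered for CPU but not for CUDA, which typically means the installed torchvision wheel was built without CUDA support. The failing call can be reproduced outside Ultralytics with a short sketch like the following (not from the original report; it assumes a CUDA-capable machine):

import torch
import torchvision

# Two overlapping dummy boxes in (x1, y1, x2, y2) format plus scores, on the GPU.
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                      [1.0, 1.0, 11.0, 11.0]], device="cuda")
scores = torch.tensor([0.9, 0.8], device="cuda")

# On a CPU-only torchvision build this raises the same NotImplementedError as above;
# on a matching CUDA build it returns the indices of the boxes kept after NMS.
keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)
print(keep)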

Environment

OS: Windows 11
Env: Jupyter
Python: 3.11.3
torch: 2.0.1+cu118
RAM: 16.00 GB
VRAM: 4.0 GB
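A quick way to capture this information from inside the same Jupyter kernel (which may differ from what the terminal reports) is the package's own checks helper; a minimal sketch, assuming the installed ultralytics release exposes it:

import ultralytics

# Prints the ultralytics version plus Python, torch, CUDA, RAM and disk details
# for the interpreter the notebook kernel is actually running on.
ultralytics.checks()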

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 9 months ago

πŸ‘‹ Hello @chri002, thank you for your interest in YOLOv8 πŸš€! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

chri002 commented 9 months ago

I solved it: the problem was Python 3.11. I downgraded to Python 3.10 and it works.
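A plausible explanation for why the downgrade helped is that the notebook kernel had been bound to a different interpreter (and site-packages) than the terminal, and the fresh Python 3.10 environment pulled in a CUDA-enabled torchvision. A small sketch to compare the two environments, assuming nothing beyond torchvision itself:

import sys
import torchvision

# Run this once in the notebook and once from the terminal's python:
# if the interpreter paths differ, the two are using different installations
# and therefore possibly different torchvision builds.
print("interpreter:", sys.executable)
print("python     :", sys.version.split()[0])
print("torchvision:", torchvision.__version__)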

pderrenger commented 8 months ago

@chri002 glad to hear that you were able to resolve the issue! It's great to know that downgrading Python to 3.10 solved the problem for you. If you have any further questions or need assistance with anything else related to YOLOv8 or any other Ultralytics models, feel free to ask. Happy coding!

S-Gaurisankar commented 8 months ago

I had the same error, but the fix was changing the torchvision version. A simple torchvision.__version__ check revealed that the installed version was the CPU build (0.15.2+cpu) and not the GPU build (ending with +cu118/+cu117).
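For readers hitting the same thing, the build tag in the version strings is the quickest tell; a minimal check along these lines (the +cu118 tag assumes CUDA 11.8 wheels are the target):

import torch
import torchvision

print(torch.__version__)          # e.g. 2.0.1+cu118 (CUDA build) vs 2.0.1+cpu (CPU-only)
print(torchvision.__version__)    # e.g. 0.15.2+cu118 (CUDA build) vs 0.15.2+cpu (CPU-only)
print(torch.version.cuda)         # CUDA version the torch wheel was built against, or None
print(torch.cuda.is_available())  # False with a CPU-only torch build or a driver problem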

glenn-jocher commented 8 months ago

@S-Gaurisankar it's wonderful that you've pinpointed the mismatch in torchvision version as it relates to the CUDA backend! Indeed, ensuring that the installed torchvision version corresponds with the CUDA version being used is crucial for GPU-enabled capabilities.

To anyone encountering similar issues, it's important to verify that their environment is set up with compatible versions of PyTorch, torchvision, and the CUDA toolkit. Misalignment between these components can lead to errors, such as the one previously experienced.

Thank you for sharing your solution with the community, as it can certainly assist others in troubleshooting similar problems. If further assistance is required, or if you have additional insights or questions about using YOLOv8, don't hesitate to reach out. Happy to help, and keep up the great work! πŸš€

StrangeAlex commented 4 months ago

> I had the same error, but the fix was changing the torchvision version. A simple torchvision.__version__ check revealed that the installed version was the CPU build (0.15.2+cpu) and not the GPU build (ending with +cu118/+cu117).

Could you please explain how exactly you changed the CPU version to the GPU one? I have the same problem, but I'm struggling to change the version as well :(

Thanks!

S-Gaurisankar commented 4 months ago

> Could you please explain how exactly you changed the CPU version to the GPU one? I have the same problem, but I'm struggling to change the version as well :(

Sure, I removed torchvision and installed the version with CUDA 11.8 support. To be more specific, I used pip install torchvision --index-url https://download.pytorch.org/whl/cu118 to get the GPU version. Do refer to the official PyTorch website to get the appropriate dependency versions.

StrangeAlex commented 4 months ago

> Could you please explain how exactly you changed the CPU version to the GPU one? I have the same problem, but I'm struggling to change the version as well :(

> Sure, I removed torchvision and installed the version with CUDA 11.8 support. To be more specific, I used pip install torchvision --index-url https://download.pytorch.org/whl/cu118 to get the GPU version. Do refer to the official PyTorch website to get the appropriate dependency versions.

Thank you for the quick answer! Unfortunately, this didn't quite work for me... The CUDA version of torchvision is installed, and I can see it with "pip list", but every time I import the package the version is the CPU one... But thanks anyway :)

glenn-jocher commented 4 months ago

@StrangeAlex it sounds like there might be a mix-up in your environment. 😊 Try completely uninstalling both PyTorch and torchvision first, and then reinstall them ensuring you're specifying the correct CUDA version. Here's how you can do it:

pip uninstall torch torchvision
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu118

Make sure to replace cu118 with the CUDA version that matches your system. This should help ensure that both PyTorch and torchvision are aligned with your GPU's CUDA version. If the issue persists, double-check your CUDA installation and environment paths. Good luck!
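As a follow-up check once the reinstall is done, a short end-to-end sketch along these lines confirms that detection (including the NMS step) actually runs on the GPU; it assumes a CUDA device is visible and that the sample image URL is reachable:

import torch
from ultralytics import YOLO

assert torch.cuda.is_available(), "CUDA build of torch not found"

model = YOLO("yolov8n.pt")

# Force the prediction onto the first GPU; if torchvision were still a CPU-only
# build, the torchvision::nms error would resurface here during postprocessing.
results = model("https://ultralytics.com/images/bus.jpg", device=0)
print(results[0].boxes.xyxy)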

StrangeAlex commented 4 months ago

> @StrangeAlex it sounds like there might be a mix-up in your environment. 😊 Try completely uninstalling both PyTorch and torchvision first, and then reinstall them ensuring you're specifying the correct CUDA version. Here's how you can do it:
>
> pip uninstall torch torchvision
> pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu118
>
> Make sure to replace cu118 with the CUDA version that matches your system. This should help ensure that both PyTorch and torchvision are aligned with your GPU's CUDA version. If the issue persists, double-check your CUDA installation and environment paths. Good luck!

Thank you! I don't know exactly what I did (I tried uninstalling and reinstalling the torch packages in different environments), but now torchvision is using the GPU as well, and training runs about 90x faster :) Maybe the problem was the CUDA version I was using (it seems to work with 12.2)... Thanks again for the help, everyone! Love this community :)

glenn-jocher commented 4 months ago

That's fantastic news, @StrangeAlex! πŸŽ‰ I'm thrilled to hear that your training speed has significantly improved. It sounds like you've navigated through the complexities of environment setup like a pro! Sometimes, a bit of trial and error with versions and environments does the trick. πŸ˜„

If you ever bump into more questions or need further assistance, you know where to find us. Happy training, and here's to many more successes with your projects! Cheers to the awesome community here! πŸš€