Error when running fine-tuned segmentation model on Triton Inference Server: Trying to create tensor with negative dimension -23: [0, -23]

atheer174 commented 3 months ago

Search before asking

[X] I have searched the Ultralytics YOLO issues and found no similar bug report.

Ultralytics YOLO Component

No response

Bug

Hi, I'm facing the same issue that @songjiahao-wq had in #13548. However, it happens to me when running my custom model through Triton Inference Server. I managed to work around it by manually changing DEFAULT_CFG_PATH in __init__.py to point to my config file:

DEFAULT_CFG_PATH = ROOT / "/Users/atheer/Desktop/repos/AI/runs/segment/train5/args.yaml"

This is the error I got:

  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/utils/ops.py", line 244, in non_max_suppression
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
RuntimeError: Trying to create tensor with negative dimension -23: [0, -23]

BTW, I only face this issue with Triton; regular usage of my custom model is fine. Please help me with this: I'm planning to use more than one segmentation model, so hardcoding a single default config path is not practical.

thanks

Environment

Ultralytics YOLOv8.2.67 🚀 Python-3.10.10 torch-2.0.1 CPU (Apple M2 Pro)
Setup complete ✅ (10 CPUs, 16.0 GB RAM, 378.0/460.4 GB disk)

OS                  macOS-13.2.1-arm64-arm-64bit
Environment         Darwin
Python              3.10.10
Install             pip
RAM                 16.00 GB
CPU                 Apple M2 Pro
CUDA                None

numpy               ✅ 1.23.5<2.0.0,>=1.23.0
matplotlib          ✅ 3.7.2>=3.3.0
opencv-python       ✅ 4.7.0.72>=4.6.0
pillow              ✅ 10.2.0>=7.1.2
pyyaml              ✅ 6.0>=5.3.1
requests            ✅ 2.31.0>=2.23.0
scipy               ✅ 1.11.2>=1.4.1
torch               ✅ 2.0.1>=1.8.0
torchvision         ✅ 0.15.2>=0.9.0
tqdm                ✅ 4.65.0>=4.64.0
psutil              ✅ 5.9.0
py-cpuinfo          ✅ 9.0.0
pandas              ✅ 2.0.3>=1.1.4
seaborn             ✅ 0.13.2>=0.11.0
ultralytics-thop    ✅ 2.0.0>=2.0.0

Minimal Reproducible Example

This is my Triton client script:

Additional

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

github-actions[bot] commented 3 months ago

👋 Hello @atheer174, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Y-T-G commented 3 months ago

Which model is this?

atheer174 commented 3 months ago

I fine-tuned yolov8n-seg.pt

pderrenger commented 3 months ago

@atheer174 thank you for the information. Please ensure you're using the latest version of the Ultralytics YOLO package. If the issue persists, it might be related to Triton's handling of the model. You could try exporting the model to a different format, such as ONNX, and see if that resolves the issue. If you continue to face problems, please provide more details about your Triton configuration and any additional error logs.

atheer174 commented 3 months ago

Thank you for your reply. I upgraded to the latest version and I'm still facing the same issue, even though I'm using the ONNX format.

BTW, as I mentioned in the issue, using a segmentation model without fine-tuning works fine; the problem only occurs with a custom model.

This is the error I'm getting now:

Traceback (most recent call last):
  File "/Users/atheer/Desktop/repos/AI-client/yolo_client.py", line 20, in <module>
    results = model(["./image/0cd8f14b-7f12-451a-b07c-8abd00c0afde.jpeg"])  
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 173, in __call__
    return self.predict(source, stream, **kwargs)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 563, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 168, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 261, in stream_inference
    self.results = self.postprocess(preds, im, im0s)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/models/yolo/segment/predict.py", line 30, in postprocess
    p = ops.non_max_suppression(
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/utils/ops.py", line 244, in non_max_suppression
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
RuntimeError: Trying to create tensor with negative dimension -942: [0, -942]
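
The two negative values reported so far (-23 and -942) are consistent with a class-count mismatch between the client and the model. Below is a minimal sketch of the arithmetic. It assumes that `non_max_suppression` derives the mask dimension as `nm = prediction.shape[1] - nc - 4` (only the `6 + nm` expression is confirmed by the traceback) and that the exported model has 55 output channels (4 box values + 19 classes + 32 mask coefficients). The figures 19 and 55 are inferred from the error values rather than stated in the thread, and `empty_output_width` is a hypothetical helper:

```python
def empty_output_width(channels: int, nc: int) -> int:
    """Width of the empty NMS output tensor, i.e. the `6 + nm` in the traceback."""
    nm = channels - nc - 4  # assumed formula for the number of mask coefficients
    return 6 + nm


# Assumption: 4 box values + 19 custom classes + 32 mask coefficients.
CHANNELS = 55

print(empty_output_width(CHANNELS, 80))   # client believes there are 80 classes  -> -23
print(empty_output_width(CHANNELS, 999))  # client falls back to 999 placeholders -> -942
print(empty_output_width(CHANNELS, 19))   # class count matches the model         -> 38
```

Under these assumptions the width is only positive when the number of class names known to the client matches the number of classes the model was trained with, which would also fit the observation that the pretrained 80-class model works.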

pderrenger commented 3 months ago

Thank you for the update. Since you're experiencing this issue with a fine-tuned segmentation model on Triton, even after upgrading to the latest version and using the ONNX format, it suggests there might be a specific compatibility issue with the custom model.

Given that the problem does not occur with the base segmentation model, it could be related to the fine-tuning process or the specific configurations used. Please ensure that the model's configuration file and all dependencies are correctly set up in Triton. Additionally, check if the issue persists when using a different export format like TensorRT.

If the problem continues, please share more details about your Triton setup and any additional error logs. This information will help in diagnosing the issue more effectively.

atheer174 commented 3 months ago

The fine-tuned segmentation model is fine when I use it directly without Triton, it produces the expected output, I'm only facing that error when trying to infer with Triton.

I explained in the issue how I changed the config file in the Ultralytics package and that solved the problem temporarily, however, it's not practical to hard code the config file path in the package scripts.

I can't use TensorRT on macOS

Y-T-G commented 3 months ago

Can you post the full traceback of the error?

atheer174 commented 3 months ago

(base) atheer@Atheers-MacBook-Pro-3 AI-client % python yolo_client.py

Traceback (most recent call last):
  File "/Users/atheer/Desktop/repos/AI-client/yolo_client.py", line 20, in <module>
    results = model(["./image/0cd8f14b-7f12-451a-b07c-8abd00c0afde.jpeg"])
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 173, in __call__
    return self.predict(source, stream, **kwargs)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 563, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 168, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 261, in stream_inference
    self.results = self.postprocess(preds, im, im0s)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/models/yolo/segment/predict.py", line 30, in postprocess
    p = ops.non_max_suppression(
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/utils/ops.py", line 244, in non_max_suppression
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
RuntimeError: Trying to create tensor with negative dimension -942: [0, -942]

Y-T-G commented 3 months ago

What model is this? YOLOv10?

atheer174 commented 3 months ago

YOLOv8

pderrenger commented 3 months ago

Thank you for confirming it's YOLOv8. Given that the issue occurs only with Triton and not during direct usage, it suggests a potential compatibility issue with Triton's handling of the fine-tuned model. Please ensure you're using the latest versions of both the Ultralytics package and Triton. If the issue persists, consider sharing more details about your Triton configuration and any additional logs. This will help in diagnosing the problem more effectively.

Y-T-G commented 3 months ago

You can try this:

#### Run only once after loading model ####
# Define number of classes in your model
NUM_CLASSES = 80
model.predictor = model._smart_load("predictor")(overrides=model.overrides, _callbacks=model.callbacks)
model.predictor.model.names = {i:i for i in range(NUM_CLASSES)}
################################

results = model(...)

atheer174 commented 2 months ago

Thank you @Y-T-G for the PR. After updating the package to the latest version, I got this:



Traceback (most recent call last):
  File "/Users/atheer/Desktop/repos/AI-client/yolo_client.py", line 20, in <module>
    results = model(["./image/0cd8f14b-7f12-451a-b07c-8abd00c0afde.jpeg"])  
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 173, in __call__
    return self.predict(source, stream, **kwargs)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 563, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 168, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 234, in stream_inference
    self.model.warmup(imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, 3, *self.imgsz))
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/nn/autobackend.py", line 639, in warmup
    self.forward(im)  # warmup
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/nn/autobackend.py", line 606, in forward
    if len(self.names) == 999 and (self.task == "segment" or len(y) == 2):  # segments and names not defined
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'AutoBackend' object has no attribute 'task'

atheer174 commented 2 months ago

BTW, I do get an output when I change the path of the config file to the custom model's config. Please check the attached image.

Y-T-G commented 2 months ago

@atheer174 Can you try removing the self.task == "segment" check?

Y-T-G commented 2 months ago

Can you also try setting the task manually after loading the model?

model.task = "segment"

atheer174 commented 2 months ago

I removed the self.task == "segment" check from autobackend.py, set the task in tasks.py, and got this:


Traceback (most recent call last):
  File "/Users/atheer/Desktop/repos/AI-client/yolo_client.py", line 20, in <module>
    results = model(["./image/0cd8f14b-7f12-451a-b07c-8abd00c0afde.jpeg"])  
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 173, in __call__
    return self.predict(source, stream, **kwargs)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/model.py", line 563, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 168, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 261, in stream_inference
    self.results = self.postprocess(preds, im, im0s)
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/models/yolo/segment/predict.py", line 52, in postprocess
    masks = ops.process_mask(proto[i], pred[:, 6:], pred[:, :4], img.shape[2:], upsample=True)  # HWC
  File "/Users/atheer/miniconda3/lib/python3.10/site-packages/ultralytics/utils/ops.py", line 673, in process_mask
    masks = (masks_in @ protos.float().view(c, -1)).view(-1, mh, mw)  # CHW
RuntimeError: mat1 and mat2 shapes cannot be multiplied (300x0 and 32x25600)
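
The shape mismatch above looks like the same class-count bookkeeping problem in a different form. Here is a rough sketch, assuming NMS falls back to `nc = prediction.shape[1] - 4` when no class count is supplied (an assumption about the library internals), so that every non-box channel is treated as a class and no columns are left for mask coefficients. The 55-channel figure, the 160x160 prototype size and both helper names are illustrative, not taken from the thread:

```python
def mask_coeff_columns(channels, nc=None):
    """Number of mask-coefficient columns left after boxes and classes are taken."""
    nc = nc or (channels - 4)  # assumed fallback: all non-box channels are classes
    return channels - nc - 4


def matmul_shape(a, b):
    """Shape of a @ b for 2-D shapes, or raise like torch does on a mismatch."""
    if a[1] != b[0]:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied ({a[0]}x{a[1]} and {b[0]}x{b[1]})"
        )
    return (a[0], b[1])


protos = (32, 160 * 160)  # 32 prototype masks, each flattened to 25600 values

# Class count unknown: zero coefficient columns, so the product is undefined.
try:
    matmul_shape((300, mask_coeff_columns(55)), protos)
except ValueError as err:
    print(err)

# Class count known (19, an assumed value): 32 columns, shapes line up.
print(matmul_shape((300, mask_coeff_columns(55, nc=19)), protos))
```

If this reading is right, patching the `task` check alone cannot help, because the client still does not know how many classes the model has.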

chuan298 commented 2 months ago

Hi @atheer174, I also get the same error. Have you fixed it yet?

chuan298 commented 2 months ago

Hi @atheer174, I just fixed the error by passing data="your_config.yaml" when running prediction. The config contains a names key listing your custom classes. For example, my YAML config:

names:
  0: A
  1: B
  2: C
  3: D
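
For anyone wiring this into a client script, a minimal sketch follows. The helper names, the server URL `http://localhost:8000/yolo` and the image path are assumptions for illustration; only the `data="your_config.yaml"` argument and the `names` layout come from this thread. The YAML writer is deliberately naive and does not quote class names containing special YAML characters.

```python
from pathlib import Path


def write_names_yaml(class_names, path):
    """Write a minimal YAML file with a `names` mapping (index -> class name)."""
    lines = ["names:"] + [f"  {i}: {name}" for i, name in enumerate(class_names)]
    out = Path(path)
    out.write_text("\n".join(lines) + "\n")
    return out


def predict_via_triton(image, names_yaml, url="http://localhost:8000/yolo"):
    """Run segmentation through a Triton endpoint, passing the class names file.

    The URL is a placeholder; point it at your own Triton server and model name.
    """
    # Imported lazily so write_names_yaml works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(url, task="segment")
    return model(image, data=str(names_yaml))


# Example usage (requires a running Triton server, so it is left commented out):
# cfg = write_names_yaml(["A", "B", "C", "D"], "your_config.yaml")
# results = predict_via_triton("image.jpg", cfg)
```

Keeping one such file per model avoids editing DEFAULT_CFG_PATH inside the installed package when several segmentation models are served.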

atheer174 commented 2 months ago

@chuan298 Thank you! That worked 👍
