ultralytics / ultralytics

Ultralytics YOLO11 πŸš€
https://docs.ultralytics.com
GNU Affero General Public License v3.0

OBB model on Triton server causes NMS error (probably not evaluated with task='obb' but task='detect') #11757

Closed wojciechpolchlopek closed 6 months ago

wojciechpolchlopek commented 6 months ago

### Search before asking

### YOLOv8 Component

Integrations

### Bug

The local evaluation of the TorchScript model is correct, but on Triton it seems to evaluate with task 'detect' instead of 'obb'. I trained the model with task='obb' and load it as:

```python
model = YOLOOBBWrapper('http://localhost:8000/yolo_obb_1', task='obb')
```

where:

```python
class YOLOOBBWrapper:
    def __init__(self, model_url):
        self.model = YOLO(model_url, task='obb')

    def predict(self, image_path):
        return self.model(image_path)
```
Raw evaluation with `triton_client.async_infer` and:

```python
ultralytics.utils.ops.non_max_suppression(
    data,
    conf_thres=0.5,
    iou_thres=0.4,
    nc=2,
    rotated=True,
    agnostic=True,
)
```

returns only one rotated box, with a low score.

### Environment

Ultralytics YOLOv8.2.1 πŸš€ Python-3.9.19 torch-2.3.0+cu121 CPU (Intel Core(TM) i7-10870H 2.20GHz)
Setup complete βœ… (16 CPUs, 31.2 GB RAM, 691.3/697.5 GB disk)

OS                  Linux-5.15.0-105-generic-x86_64-with-glibc2.31
Environment         Linux
Python              3.9.19
Install             git
RAM                 31.16 GB
CPU                 Intel Core(TM) i7-10870H 2.20GHz
CUDA                None

matplotlib          βœ… 3.8.4>=3.3.0
opencv-python       βœ… 4.7.0.72>=4.6.0
pillow              βœ… 9.2.0>=7.1.2
pyyaml              βœ… 6.0.1>=5.3.1
requests            βœ… 2.31.0>=2.23.0
scipy               βœ… 1.13.0>=1.4.1
torch               βœ… 2.3.0>=1.8.0
torchvision         βœ… 0.18.0>=0.9.0
tqdm                βœ… 4.66.2>=4.64.0
psutil              βœ… 5.9.8
py-cpuinfo          βœ… 9.0.0
thop                βœ… 0.1.1-2209072238>=0.1.1
pandas              βœ… 2.2.2>=1.1.4
seaborn             βœ… 0.13.2>=0.11.0

Triton server config:
```shell
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/wojpol/models:/models nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models
```
config.pbtxt:
```
platform: "pytorch_libtorch"
max_batch_size: 10
input [
  {
    name: "inputs__0"
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [-1]
  }
]
```
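The config above expects one FP32 tensor of shape (3, 640, 640) per image. As a rough illustration of getting an image into that layout (a minimal numpy sketch with a plain nearest-neighbour resize; the real Ultralytics pipeline letterboxes to preserve aspect ratio, so this is only an approximation):

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 640) -> np.ndarray:
    """HWC uint8 image -> (1, 3, size, size) float32 in [0, 1].

    Nearest-neighbour resize only; treat this as a sketch, not the
    library's letterboxing preprocessing.
    """
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size   # row indices into the source
    xs = np.arange(size) * w // size   # column indices into the source
    resized = img[ys][:, xs]           # (size, size, 3)
    chw = resized.transpose(2, 0, 1).astype(np.float32) / 255.0
    return chw[None]                   # add the batch dimension

batch = preprocess(np.zeros((480, 704, 3), dtype=np.uint8))
print(batch.shape, batch.dtype)  # (1, 3, 640, 640) float32
```

The resulting array matches the `inputs__0` shape declared in config.pbtxt (max_batch_size handles the leading batch axis).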

### Minimal Reproducible Example

```shell
yolo export model=best-obb.pt format='torchscript' task='obb' imgsz=640
```

```python
model = YOLOOBBWrapper('http://localhost:8000/yolo_obb_1', task='obb')
```

where:

```python
class YOLOOBBWrapper:
    def __init__(self, model_url):
        self.model = YOLO(model_url, task='obb')

    def predict(self, image_path):
        return self.model(image_path)
```


### Additional

_No response_

### Are you willing to submit a PR?

- [ ] Yes I'd like to help by submitting a PR!

glenn-jocher commented 6 months ago

Hi! Thanks for reaching out with the details on your issue integrating the OBB model with the Triton server.

Based on your description, it sounds like there might be a mismatch in tensor dimensions or configuration between the TorchScript model and how Triton is set up to handle it, especially considering the negative dimension error.

Here's a slightly adjusted snippet for instantiating your model, with `task` passed through the wrapper:

```python
class YOLOOBBWrapper:
    def __init__(self, model_url, task='obb'):
        self.model = YOLO(model_url, task=task)

    def predict(self, image_path):
        return self.model(image_path)
```

We're happy to help further if this adjustment doesn't resolve the issue! 🌟

wojciechpolchlopek commented 6 months ago

Hi, I have found the cause in https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/obb/predict.py#L38: `nc=len(self.model.names)` is a large number, and correcting it to the real nc value (e.g. 2) leads to correct results. My question is: where should model.names be set so that it has the correct length?
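To illustrate the failure mode (a toy numpy sketch, not the library code): an OBB output has 4 box channels, nc class-score channels, and one trailing angle channel. If nc comes from an oversized model.names, the score slice overruns the real scores and treats the angle channel as a class score:

```python
import numpy as np

n_preds = 5
pred = np.zeros((7, n_preds), dtype=np.float32)  # 4 box + 2 scores + 1 angle
pred[4:6] = 0.9   # the two real class scores
pred[6] = 1.2     # angle in radians, not a confidence

nc_real, nc_inflated = 2, 80          # 80 = e.g. a stale COCO-sized names dict
scores_ok = pred[4:4 + nc_real]       # (2, n_preds): correct scores only
scores_bad = pred[4:4 + nc_inflated]  # numpy truncates the slice: (3, n_preds)

print(scores_bad.shape)   # the angle row is now treated as a "score"
print(scores_bad.max())   # > 1, a nonsense confidence value
```

With real tensors the mismatch also shifts where the angle is read from, which is consistent with the single low-score rotated box seen above.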

glenn-jocher commented 6 months ago

Hi! Great job diving into the code and identifying the workaround! 🌟

The model.names should reflect the class names found in your dataset. Typically, this is set when you load your model using the dataset's YAML file, where the number of classes (nc) and their respective names are specified.

If you are directly loading a model without associating it with a specific data YAML, you can manually adjust model.names after loading your model as follows:

```python
from ultralytics import YOLO

# Load your model
model = YOLO('your_model.pt')

# Set the correct class names
model.names = ['class1', 'class2']  # Update this list with your actual class names
```

Ensure the names are consistent with the nc value and your class labels. That should align everything correctly! Let us know if this helps or if you need any more details!

wojciechpolchlopek commented 6 months ago

Thanks for your help. As a workaround, I finally used `nc = detection.shape[1] - 5` for the NMS algorithm, and my custom Triton evaluation code with `triton_client.async_infer` works fine :) The issue can be closed, but please consider this a potential bug, because the exported model has the proper model.names in its inner config.txt, e.g. for nc=2:

```json
{"description": "Ultralytics YOLOv8m-obb model trained on config.yaml", "author": "Ultralytics", "date": "2024-05-10T07:38:22.845204", "version": "8.2.1", "license": "AGPL-3.0 License (https://ultralytics.com/license)", "docs": "https://docs.ultralytics.com", "stride": 32, "task": "obb", "batch": 1, "imgsz": [640, 640], "names": {"0": "var", "1": "dontcare"}}
```
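The workaround generalizes: for an OBB head the channel layout is 4 box coordinates + nc class scores + 1 angle, so the class count can be recovered from the raw output shape. A small helper sketch (`obb_nc_from_channels` is a hypothetical name, not an Ultralytics API):

```python
def obb_nc_from_channels(channels: int) -> int:
    """Class count for an OBB output with `channels` values per prediction.

    Layout assumption: cx, cy, w, h (4) + nc class scores + angle (1).
    """
    nc = channels - 5
    if nc < 1:
        raise ValueError(f"invalid OBB channel count: {channels}")
    return nc

# A 2-class OBB model exports a (batch, 7, anchors) tensor: 4 + 2 + 1 = 7.
print(obb_nc_from_channels(7))  # -> 2
```

This value can then be passed as `nc=` to `non_max_suppression(..., rotated=True)` instead of relying on `len(model.names)`.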

glenn-jocher commented 6 months ago

Hi there! πŸ‘‹ Great to hear you've found a workaround that suits your needs for now. We appreciate you sharing it!

Thanks also for pointing out this discrepancy in how model.names is handled for your use case. That's definitely something we'll investigate further to ensure consistency and correctness in exported model configurations. I'll pass your feedback to our team.

For now, don't hesitate to reach out if you encounter any other issues or have further suggestions. Thank you for contributing to the YOLOv8 community by sharing these insights! πŸš€