roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License

Results differ when using cv2 vs pillow #1395

Closed robmarkcole closed 2 months ago

robmarkcole commented 2 months ago

Search before asking

Bug

From this comment, I understand supervision doesn't change channel order, and the issue I highlight here is likely addressed by documentation. I observe that if I open an image with cv2 or with pillow, the predictions are different. The model was trained using ultralytics, which I believe also uses cv2, so when I use pillow the channel order is swapped. I suggest adding a note to the docs to check which library was used in training, then use that same library with supervision. Comparisons below:

cv2: [screenshot of predictions on the cv2-loaded image]

pillow: [screenshot of predictions on the pillow-loaded image]
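
For reference, a minimal sketch (my illustration, not from the original report) of the underlying difference between the two loaders, assuming a plain 3-channel RGB image at my.png:

import cv2
import numpy as np
from PIL import Image

bgr = cv2.imread("my.png")            # OpenCV returns pixels in BGR order
rgb = np.array(Image.open("my.png"))  # Pillow returns pixels in RGB order

# same pixel data, channel order reversed (holds for a 3-channel image;
# an RGBA PNG would need extra handling, since cv2.imread drops alpha by default)
assert np.array_equal(bgr[..., ::-1], rgb)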

Environment

No response

Minimal Reproducible Example

import cv2
import numpy as np
import supervision as sv
from PIL import Image
from ultralytics import YOLO

model = YOLO("best.pt")  # hypothetical path; the original report does not specify the weights
image_path = "my.png"

# switch between the two loaders to reproduce the difference
image = cv2.imread(image_path)              # BGR channel order
# image = np.array(Image.open(image_path))  # RGB channel order

def callback(image_slice: np.ndarray) -> sv.Detections:
    result = model(image_slice)[0]
    return sv.Detections.from_ultralytics(result)

slicer = sv.InferenceSlicer(callback=callback)
detections = slicer(image)
detections = detections[detections.class_id == 1]

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

annotated_image = box_annotator.annotate(scene=image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)

Additional

No response

Are you willing to submit a PR?

robmarkcole commented 2 months ago

Reading https://github.com/ultralytics/ultralytics/issues/9912, I suspect the fix is for the docs to include the conversion im_rgb = cv2.cvtColor(im_bgr, cv2.COLOR_BGR2RGB)
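
A minimal sketch (my illustration, not verbatim from the ultralytics thread) of where that conversion would sit relative to the reproduction above, assuming the image was loaded with cv2:

import cv2

im_bgr = cv2.imread("my.png")                     # OpenCV loads BGR
im_rgb = cv2.cvtColor(im_bgr, cv2.COLOR_BGR2RGB)  # convert to RGB
# pass whichever order matches what the model saw during training to the slicer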

SkalskiP commented 2 months ago

Hi @robmarkcole 👋🏻 Long time no see. Correct me if I'm wrong, but I'm not sure this is a problem with Supervision. Constructing a correct callback is the responsibility of the user.

In our documentation, we showcase an example using the ultralytics library, demonstrating image loading with OpenCV. Have you found an example using Pillow in our materials?

[screenshot of the documentation example showing OpenCV image loading]

I recommend using sv.cv2_to_pillow and sv.pillow_to_cv2. These two methods take care of channel order conversion.
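
A short usage sketch of those helpers (my illustration, assuming an image at my.png):

import cv2
import supervision as sv
from PIL import Image

bgr_array = cv2.imread("my.png")         # numpy array, BGR order
pil_image = sv.cv2_to_pillow(bgr_array)  # PIL.Image, RGB order

pil_image = Image.open("my.png")         # PIL.Image, RGB order
bgr_array = sv.pillow_to_cv2(pil_image)  # numpy array, BGR order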

robmarkcole commented 2 months ago

Right, that is the example I followed. I agree there's a limit to what examples can cover, and you don't want to become a cv2 tutorial. But I think updating this example to correctly handle the channel order could save other people from repeating this investigation. I only use cv2 occasionally, and every time I have to re-remember this channels issue 😀

SkalskiP commented 2 months ago

@robmarkcole, there are two problems I see:

Showing how to execute each of those code snippets with both np.array and PIL.Image would be a lot of work and a lot of maintenance. I recommend simply following the code snippets we offer.

robmarkcole commented 2 months ago

Understood, many thanks for the context

SkalskiP commented 2 months ago

@robmarkcole I'm sorry if I disappointed you with my response.

050603 commented 2 months ago

@robmarkcole Hello, I saw that you successfully used SAHI for OBB testing. May I ask how you implemented it in code? When I followed the code you provided for detection, the coordinates of the four corners of the detection boxes in the saved CSV file did not appear to be calculated relative to the full image, but rather in units of the slice size (640). Did you make any modifications elsewhere?

[two screenshots of the CSV output]

import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO
from PIL import Image

model = YOLO("/root/autodl-tmp/code/ultralytics/runs/obb/train/weights/best.pt")

image_path = "/root/autodl-tmp/DOTA/images/test/P1068.png"
image = np.array(Image.open(image_path))

def callback(image_slice: np.ndarray) -> sv.Detections:
    result = model(image_slice)[0]
    return sv.Detections.from_ultralytics(result)

slicer = sv.InferenceSlicer(callback=callback)
detections = slicer(image)

oriented_box_annotator = sv.OrientedBoxAnnotator()
label_annotator = sv.LabelAnnotator()

annotated_image = oriented_box_annotator.annotate(scene=image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)

annotated_pil_image = Image.fromarray(annotated_image)
annotated_pil_image.show()

robmarkcole commented 2 months ago

@050603 my code is identical, with the addition of

import pandas as pd

csv_path = image_path.replace('.png', '.csv')

with sv.CSVSink(csv_path) as sink:
    sink.append(detections, {})

df = pd.read_csv(csv_path)
df.head()

The coordinates in the saved CSV are for the full image. You should check the range of the saved detections - this is easy once you have the data in a dataframe.
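
For example, a quick range check (a sketch, assuming the default sv.CSVSink columns x_min, y_min, x_max, y_max):

# maxima should exceed the slice size (640) if coordinates are in full-image space
print(df[["x_min", "y_min", "x_max", "y_max"]].describe())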

SkalskiP commented 2 months ago

Hi @050603 👋🏻, what version of supervision are you using? SAHI OBB support has not been rolled out yet, so you'd need to install supervision from source to use it.

pip install git+https://github.com/roboflow/supervision.git
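
After installing, a quick way to confirm which build is active (a generic check, not specific to this fix):

import supervision as sv
print(sv.__version__)  # a dev/source version string indicates the from-source install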