Automatic box coords in json as in YOLOv5 #14243

Open AlexPasqua opened 2 weeks ago

AlexPasqua commented 2 weeks ago

Description

In YOLOv5, you could get the boxes' coordinates as a pandas DataFrame with a simple results.pandas().xyxy[0], and then get them in JSON by simply appending .to_json() at the end. Returning the coordinates in JSON format is typically needed in the very common use case where the model is deployed and accessed through an API.
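
For reference, this is roughly what that YOLOv5 workflow looks like (loading through torch.hub here is just for illustration, and the image path is a placeholder):

import torch

# Load a YOLOv5 model through torch.hub (any other YOLOv5 loading path works too)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

results = model('image.jpg')

# One call for a DataFrame, one more for JSON ready to be returned by an API
df = results.pandas().xyxy[0]  # columns: xmin, ymin, xmax, ymax, confidence, class, name
json_output = df.to_json(orient='records')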

As discussed in #8235 (this comment specifically), this feature could benefit many users! 😄

Use case

Get the coordinates in JSON format with something like results.pandas().xyxy[0].to_json() instead of what's currently needed, i.e.:

results = model.predict(image)
boxes = results[0].boxes.xyxy.cpu().numpy()
scores = results[0].boxes.conf.cpu().numpy()
classes = results[0].boxes.cls.cpu().numpy()
json_output = []
for box, score, cls in zip(boxes, scores, classes):
    json_output.append({
        'x1': float(box[0]),
        'y1': float(box[1]),
        'x2': float(box[2]),
        'y2': float(box[3]),
        'confidence': float(score),
        'class': int(cls)
    })

Are you willing to submit a PR?

AlexPasqua commented 2 weeks ago

I'd like to submit a PR, so I'd first like to discuss here how to implement this feature.

From what I've seen, in YOLOv5, the output of model.predict(...) was an object of type Detections, while in YOLOv8 it's a simple PyTorch Tensor (torch.Tensor). Correct me if I'm wrong. Since we cannot simply implement a new pandas() method in the torch.Tensor class, as we would do with a custom class like Detections, I'd propose two alternatives.

Alternative 1

Create a new method in YOLO(Model) that takes the model's output and returns what pandas().xyxy[0] would return.

Minimum example:

model = YOLO('yolov8n.pt')
results = model.predict(image)

# extract the coordinates from the results as YOLOv5's results.pandas().xyxy[0]
coords = model.extract_coords(results)

Alternative 2

Make the model return a custom object as output instead of a torch.Tensor. It would be similar to having a Detections class again, so we would have the freedom to implement our custom methods directly in the class representing the model's output.

glenn-jocher commented 2 weeks ago

@AlexPasqua thank you for your willingness to contribute and for outlining your proposed solutions! Your initiative is greatly appreciated by the YOLO community and the Ultralytics team. Let's delve into your suggestions:

Alternative 1: Method in YOLO(Model)

Creating a new method in the YOLO class to extract coordinates from the model's output is a practical approach. This method could transform the torch.Tensor output into a more user-friendly format, similar to the pandas().xyxy[0] in YOLOv5. Here's a concise example of how this could be implemented:

class YOLO:
    # Existing methods...

    def extract_coords(self, results):
        coords = []
        boxes = results[0].boxes
        for box, score, cls in zip(boxes.xyxy, boxes.conf, boxes.cls):
            coords.append({
                'x1': float(box[0]),
                'y1': float(box[1]),
                'x2': float(box[2]),
                'y2': float(box[3]),
                'confidence': float(score),
                'class': int(cls)
            })
        return coords

# Usage
model = YOLO('yolov8n.pt')
results = model.predict(image)
coords = model.extract_coords(results)

Alternative 2: Custom Output Object

Returning a custom object instead of a torch.Tensor would indeed provide more flexibility. This approach aligns with the design of YOLOv5's Detections class, allowing for the implementation of custom methods directly within the output class. This could look something like:

class CustomDetections:
    def __init__(self, boxes, scores, classes):
        self.boxes = boxes
        self.scores = scores
        self.classes = classes

    def pandas(self):
        import pandas as pd
        data = {
            'x1': self.boxes[:, 0],
            'y1': self.boxes[:, 1],
            'x2': self.boxes[:, 2],
            'y2': self.boxes[:, 3],
            'confidence': self.scores,
            'class': self.classes
        }
        return pd.DataFrame(data)

class YOLO:
    # Existing methods...

    def predict(self, image):
        # Perform prediction...
        boxes, scores, classes = self.model(image)
        return CustomDetections(boxes, scores, classes)

# Usage
model = YOLO('yolov8n.pt')
results = model.predict(image)
coords_df = results.pandas()  # pandas() here returns a DataFrame directly

Next Steps

Both alternatives have their merits. The first is simpler and less intrusive, while the second offers greater flexibility and aligns more closely with the design philosophy of YOLOv5.

Feel free to choose the approach that best fits your vision and submit a PR. The community and the Ultralytics team will be happy to review and provide feedback. If you need any further assistance or have more questions, don't hesitate to ask!

Thank you again for your contribution! 🚀

For more detailed guidance on contributing, you can refer to our Contributing Guide.

Y-T-G commented 2 weeks ago

That's already available. result.boxes.xyxy

AlexPasqua commented 2 weeks ago

Aim of the feature request

That's already available. result.boxes.xyxy

Hi @Y-T-G, the focus of this feature request is actually reducing the amount of manual code needed to return the detection results when the model is deployed somewhere and accessed through an API (e.g., using FastAPI). In that case, you still need to post-process what results[0].boxes.xyxy gives you.

What's currently needed:

import cv2
import numpy as np
from ultralytics import YOLO
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = YOLO('yolov8n.pt')  # Load model at startup

@app.post("/detect")
async def detect(file: UploadFile):
    # Process the uploaded image for object detection
    image_bytes = await file.read()
    image = np.frombuffer(image_bytes, dtype=np.uint8)
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)

    # Perform object detection with YOLOv8
    results = model.predict(image)

    # Extract bounding box data
    boxes = results[0].boxes.xyxy.cpu().numpy()
    scores = results[0].boxes.conf.cpu().numpy()
    classes = results[0].boxes.cls.cpu().numpy()

    # Format the results as a list of dictionaries
    json_output = []
    for box, score, cls in zip(boxes, scores, classes):
        json_output.append({
            'x1': float(box[0]),
            'y1': float(box[1]),
            'x2': float(box[2]),
            'y2': float(box[3]),
            'confidence': float(score),
            'class': int(cls)
        })

    return json_output

What this feature request aims for:

import cv2
import numpy as np
from ultralytics import YOLO
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = YOLO('yolov8n.pt')  # Load model at startup

@app.post("/detect")
async def detect(file: UploadFile):
    # Process the uploaded image for object detection
    image_bytes = await file.read()
    image = np.frombuffer(image_bytes, dtype=np.uint8)
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)

    # Perform object detection with YOLOv8
    results = model.predict(image)

    # Extract bounding box data as a pandas DataFrame and use pandas' "to_json" function
    json_output = results[0].pandas('xyxy').to_json()

    return json_output

Maybe the title or description of the issue weren't clear enough, but if you look at the discussion linked above (#8235 and this comment) you'll get more context 😃

Proposed approach

Selection of the alternative

From what I've seen, in YOLOv5, the output of model.predict(...) was an object of type Detections, while in YOLOv8 it's a simple PyTorch Tensor (torch.Tensor). Correct me if I'm wrong.

In the end I was indeed wrong. The output in YOLOv8 is actually a custom object (like YOLOv5's Detections) called Results, so I would opt for Alternative 2.

Since the output is actually a custom object (Results), I would add methods there.

Current situation:

What I would do

I would create a method Results.pandas, which processes the data in boxes and returns a pandas DataFrame where:

Once we have this DataFrame, we could use pandas' to_json() method to get the results in JSON format, directly returnable by our API 😄

Check the code snippet above in this message (What this feature request aims for) for a contextualized example.

This way, we can have the output as a DataFrame, which might be useful for various use cases, and if we need the output in JSON (e.g., in the very common use case where the model is deployed and accessed through an API), we can go through the .pandas() method and then use pandas' to_json() method without having to implement anything more.
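
To make the proposal concrete, here is a minimal sketch of what such a method could look like (the box-format argument, the column names and the exact conversions are illustrative assumptions, not a final API; the column names shown apply to the xyxy case):

import pandas as pd


class Results:
    # Existing attributes and methods...

    def pandas(self, box_format='xyxy'):
        """Return the detections as a pandas DataFrame (illustrative sketch)."""
        boxes = self.boxes                                  # Boxes object exposing xyxy/xywh, conf and cls
        coords = getattr(boxes, box_format).cpu().numpy()   # e.g. boxes.xyxy or boxes.xywh
        return pd.DataFrame({
            'x1': coords[:, 0],
            'y1': coords[:, 1],
            'x2': coords[:, 2],
            'y2': coords[:, 3],
            'confidence': boxes.conf.cpu().numpy(),
            'class': boxes.cls.cpu().numpy().astype(int),
        })

With something like this in place, results[0].pandas('xyxy').to_json(orient='records') would return a JSON array with one object per detection, matching the list-of-dicts format built manually in the FastAPI example above.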

Or something like that... @glenn-jocher let me know what you think 😄

glenn-jocher commented 2 weeks ago

Hi @AlexPasqua,

Thank you for your input! The feature request aims to streamline the process of converting detection results into a JSON format, which is particularly useful when deploying models via APIs, such as with FastAPI.

Current Workflow

Currently, extracting and formatting the detection results involves several manual steps, as shown in the provided example. This process can be cumbersome, especially when frequently deploying models in production environments.

Proposed Enhancement

The goal is to simplify this workflow by introducing a method that directly converts the detection results into a pandas DataFrame, which can then be easily converted to JSON. This would reduce the amount of boilerplate code and make the deployment process more efficient.

Implementation Plan

Given that the output in YOLOv8 is a custom Results object, we can add a method to this class to facilitate the conversion. Here's a concise plan:

  1. Add a pandas Method to Results Class:

    • This method will convert the detection results into a pandas DataFrame.
    • It will accept an argument to specify the format of the bounding box coordinates (e.g., xyxy, xywh, etc.).
  2. Usage Example:

    import cv2
    import numpy as np
    from ultralytics import YOLO
    from fastapi import FastAPI, File, UploadFile
    
    app = FastAPI()
    model = YOLO('yolov8n.pt')  # Load model at startup
    
    @app.post("/detect")
    async def detect(file: UploadFile):
       # Process the uploaded image for object detection
       image_bytes = await file.read()
       image = np.frombuffer(image_bytes, dtype=np.uint8)
       image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    
       # Perform object detection with YOLOv8
       results = model.predict(image)
    
       # Extract bounding box data as a pandas DataFrame and convert to JSON
       json_output = results[0].pandas('xyxy').to_json()
    
       return json_output

Benefits

This enhancement will make it more convenient for users to deploy YOLOv8 models in real-world applications, particularly those involving APIs. If you have any further suggestions or feedback, please let us know! 😊

AlexPasqua commented 2 weeks ago

Alright @glenn-jocher, then I'll proceed to open a PR about it 🚀

glenn-jocher commented 2 weeks ago

Hi @AlexPasqua,

That sounds fantastic! 🚀 We're excited to see your contribution. When you're ready, please go ahead and submit the PR. If you need any assistance or have further questions during the process, feel free to reach out here. Your efforts to enhance the usability of YOLOv8 are greatly appreciated by the community and the Ultralytics team. Thank you! 😊

Best of luck with the PR, and we're looking forward to reviewing it!