ultralytics / ultralytics

Ultralytics YOLO11 🚀
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Automatic box coords in json as in YOLOv5 #14243

Open AlexPasqua opened 4 months ago

AlexPasqua commented 4 months ago


Description

In YOLOv5, you could have the boxes' coordinates in dataframe format with a simple results.pandas().xyxy[0], and then get them in json by simply adding .to_json() at the end. Returning the coordinates in json format is usually needed in the super common use-case where the model is deployed and accessed through an API.

As discussed in #8235 (this comment specifically), this feature could benefit many users! 😄

Use case

Get the coordinates in json format with something like results.pandas().xyxy[0].to_json() instead of what's currently needed, i.e.:

results = model.predict(image)
boxes = results[0].boxes.xyxy.cpu().numpy()
scores = results[0].boxes.conf.cpu().numpy()
classes = results[0].boxes.cls.cpu().numpy()
json_output = []
for box, score, cls in zip(boxes, scores, classes):
    json_output.append({
        'x1': float(box[0]),
        'y1': float(box[1]),
        'x2': float(box[2]),
        'y2': float(box[3]),
        'confidence': float(score),
        'class': int(cls)
    })
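
For reference, the loop above can be wrapped in a small reusable helper. This is only a sketch: detections_to_json is a hypothetical name, not part of Ultralytics, and it assumes the boxes, scores and classes are passed as plain sequences (for example the arrays extracted above).

```python
import json
from typing import Iterable, Sequence


def detections_to_json(
    boxes: Iterable[Sequence[float]],
    scores: Iterable[float],
    classes: Iterable[float],
) -> str:
    """Serialize parallel box/score/class sequences into a JSON array string."""
    records = [
        {
            "x1": float(box[0]),
            "y1": float(box[1]),
            "x2": float(box[2]),
            "y2": float(box[3]),
            "confidence": float(score),
            "class": int(cls),  # class ids arrive as floats, so cast to int
        }
        for box, score, cls in zip(boxes, scores, classes)
    ]
    return json.dumps(records)


# Plain lists stand in for the xyxy / conf / cls arrays here.
print(detections_to_json([[10.0, 20.0, 110.0, 220.0]], [0.9], [0.0]))
```

The explicit float() and int() casts matter because the standard json module only serializes built-in Python numbers.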

Are you willing to submit a PR?

AlexPasqua commented 4 months ago

I'd like to submit a PR, so I'd like to discuss here how to implement this feature.

From what I've seen, in YOLOv5, the output of model.predict(...) was an object of type Detections, while in YOLOv8 it's a simple PyTorch Tensor (torch.Tensor). Correct me if I'm wrong. Since we cannot simply implement a new pandas() method in the torch.Tensor class, as we would do with a custom class like Detections, I'd propose two alternatives.

Alternative 1

Create a new method in YOLO(Model) that takes the model's output and returns what pandas().xyxy[0] would return.

Minimum example:

model = YOLO('yolov8n.pt')
results = model.predict(image)

# extract the coordinates from the results as YOLOv5's results.pandas().xyxy[0]
coords = model.extract_coords(results)

Alternative 2

Make the model return a custom object as output instead of a torch.Tensor. It would be similar to having a Detections class again, so we would have the freedom to implement our custom methods directly in the class representing the model's output.

glenn-jocher commented 4 months ago

@AlexPasqua thank you for your willingness to contribute and for outlining your proposed solutions! Your initiative is greatly appreciated by the YOLO community and the Ultralytics team. Let's delve into your suggestions:

Alternative 1: Method in YOLO(Model)

Creating a new method in the YOLO class to extract coordinates from the model's output is a practical approach. This method could transform the torch.Tensor output into a more user-friendly format, similar to the pandas().xyxy[0] in YOLOv5. Here's a concise example of how this could be implemented:

class YOLO:
    # Existing methods...

    def extract_coords(self, results):
        coords = []
        boxes = results[0].boxes  # detections for the first image
        for box, score, cls in zip(boxes.xyxy, boxes.conf, boxes.cls):
            coords.append({
                'x1': float(box[0]),
                'y1': float(box[1]),
                'x2': float(box[2]),
                'y2': float(box[3]),
                'confidence': float(score),
                'class': int(cls)
            })
        return coords

# Usage
model = YOLO('yolov8n.pt')
results = model.predict(image)
coords = model.extract_coords(results)

Alternative 2: Custom Output Object

Returning a custom object instead of a torch.Tensor would indeed provide more flexibility. This approach aligns with the design of YOLOv5's Detections class, allowing for the implementation of custom methods directly within the output class. This could look something like:

class CustomDetections:
    def __init__(self, boxes, scores, classes):
        self.boxes = boxes
        self.scores = scores
        self.classes = classes

    def pandas(self):
        import pandas as pd
        data = {
            'x1': self.boxes[:, 0],
            'y1': self.boxes[:, 1],
            'x2': self.boxes[:, 2],
            'y2': self.boxes[:, 3],
            'confidence': self.scores,
            'class': self.classes
        }
        return pd.DataFrame(data)

class YOLO:
    # Existing methods...

    def predict(self, image):
        # Perform prediction...
        boxes, scores, classes = self.model(image)
        return CustomDetections(boxes, scores, classes)

# Usage
model = YOLO('yolov8n.pt')
results = model.predict(image)
coords_df = results.pandas()  # pandas() above already returns the DataFrame

Next Steps

Both alternatives have their merits. The first is simpler and less intrusive, while the second offers greater flexibility and aligns more closely with the design philosophy of YOLOv5.

Feel free to choose the approach that best fits your vision and submit a PR. The community and the Ultralytics team will be happy to review and provide feedback. If you need any further assistance or have more questions, don't hesitate to ask!

Thank you again for your contribution! 🚀

For more detailed guidance on contributing, you can refer to our Contributing Guide.

Y-T-G commented 4 months ago

That's already available. result.boxes.xyxy

AlexPasqua commented 4 months ago

Aim of the feature request

That's already available. result.boxes.xyxy

Hi @Y-T-G, this feature request is about reducing the manual code needed to return detection results when the model is deployed and accessed through an API (e.g., using FastAPI). In that setting, you still have to post-process what result[0].boxes.xyxy gives you before you can return it.

What's currently needed:

import cv2
import numpy as np
from ultralytics import YOLO
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = YOLO('yolov8n.pt')  # Load model at startup

@app.post("/detect")
async def detect(file: UploadFile):
    # Process the uploaded image for object detection
    image_bytes = await file.read()
    image = np.frombuffer(image_bytes, dtype=np.uint8)
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)

    # Perform object detection with YOLOv8
    results = model.predict(image)

    # Extract bounding box data
    boxes = results[0].boxes.xyxy.cpu().numpy()
    scores = results[0].boxes.conf.cpu().numpy()
    classes = results[0].boxes.cls.cpu().numpy()

    # Format the results as a list of dictionaries
    json_output = []
    for box, score, cls in zip(boxes, scores, classes):
        json_output.append({
            'x1': float(box[0]),  # cast NumPy scalars so they are JSON-serializable
            'y1': float(box[1]),
            'x2': float(box[2]),
            'y2': float(box[3]),
            'confidence': float(score),
            'class': int(cls)
        })

    return json_output

What this feature request aims for:

import cv2
import numpy as np
from ultralytics import YOLO
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = YOLO('yolov8n.pt')  # Load model at startup

@app.post("/detect")
async def detect(file: UploadFile):
    # Process the uploaded image for object detection
    image_bytes = await file.read()
    image = np.frombuffer(image_bytes, dtype=np.uint8)
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)

    # Perform object detection with YOLOv8
    results = model.predict(image)

    # Extract bounding box data as a pandas DataFrame and use pandas' "to_json" method
    json_output = results.pandas('xyxy').to_json()

    return json_output

Maybe the title or description of the issue wasn't clear enough, but the discussion linked above (#8235 and this comment) gives more context 😃

Proposed approach

Selection of the alternative

From what I've seen, in YOLOv5, the output of model.predict(...) was an object of type Detections, while in YOLOv8 it's a simple PyTorch Tensor (torch.Tensor). Correct me if I'm wrong.

In the end, I was indeed wrong. The output in YOLOv8 is actually a custom object (like YOLOv5's Detections) called Results, so I would opt for Alternative 2.

Since the output is actually a custom object (Results), I would add methods there.

Current situation:

What I would do

I would create a method Results.pandas, which processes the data in boxes and returns a pandas DataFrame where:

Once we have this DataFrame, we could use pandas' to_json() method to get the results in json format, directly returnable by our API 😄

See the code snippet earlier in this message (What this feature request aims for) for an example in context.

This way, we get the output as a DataFrame, which might be useful for various use-cases, and if we need the output in json (e.g., in the super common use-case where the model is deployed and accessed through an API), we can go through the .pandas() method and then use pandas' to_json() method without having to implement anything more.
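
To make the idea of putting conversion methods on the result object concrete, here is a rough sketch that uses only the standard library. ResultsSketch, records() and to_json() are hypothetical names chosen for illustration; this is not the Ultralytics Results class, and it keeps boxes as plain tuples rather than tensors.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ResultsSketch:
    """Hypothetical stand-in for a result object (not the Ultralytics Results class)."""

    boxes: List[Tuple[float, float, float, float]]  # one xyxy tuple per detection
    scores: List[float]
    classes: List[int]
    names: Dict[int, str] = field(default_factory=dict)  # class id -> label

    def records(self) -> List[dict]:
        """Return one flat dict per detection."""
        return [
            {
                "x1": b[0], "y1": b[1], "x2": b[2], "y2": b[3],
                "confidence": s,
                "class": c,
                "name": self.names.get(c, str(c)),  # fall back to the id as text
            }
            for b, s, c in zip(self.boxes, self.scores, self.classes)
        ]

    def to_json(self) -> str:
        """Serialize the records as a JSON array string."""
        return json.dumps(self.records())


res = ResultsSketch([(1.0, 2.0, 3.0, 4.0)], [0.5], [0], {0: "person"})
print(res.to_json())
```

Keeping records() separate from to_json() means the same list of dicts could also be handed to a DataFrame constructor if a tabular view is wanted.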

Or something like that... @glenn-jocher let me know what you think 😄

glenn-jocher commented 4 months ago

Hi @AlexPasqua,

Thank you for your input! The feature request aims to streamline the process of converting detection results into a JSON format, which is particularly useful when deploying models via APIs, such as with FastAPI.

Current Workflow

Currently, extracting and formatting the detection results involves several manual steps, as shown in the provided example. This process can be cumbersome, especially when frequently deploying models in production environments.

Proposed Enhancement

The goal is to simplify this workflow by introducing a method that directly converts the detection results into a pandas DataFrame, which can then be easily converted to JSON. This would reduce the amount of boilerplate code and make the deployment process more efficient.

Implementation Plan

Given that the output in YOLOv8 is a custom Results object, we can add a method to this class to facilitate the conversion. Here's a concise plan:

  1. Add a pandas Method to Results Class:

    • This method will convert the detection results into a pandas DataFrame.
    • It will accept an argument to specify the format of the bounding box coordinates (e.g., xyxy, xywh, etc.).
  2. Usage Example:

    import cv2
    import numpy as np
    from ultralytics import YOLO
    from fastapi import FastAPI, File, UploadFile
    
    app = FastAPI()
    model = YOLO('yolov8n.pt')  # Load model at startup
    
    @app.post("/detect")
    async def detect(file: UploadFile):
       # Process the uploaded image for object detection
       image_bytes = await file.read()
       image = np.frombuffer(image_bytes, dtype=np.uint8)
       image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    
       # Perform object detection with YOLOv8
       results = model.predict(image)
    
       # Extract bounding box data as a pandas DataFrame and convert to JSON
       json_output = results.pandas('xyxy').to_json()
    
       return json_output

Benefits

This enhancement will make it more convenient for users to deploy YOLOv8 models in real-world applications, particularly those involving APIs. If you have any further suggestions or feedback, please let us know! 😊

AlexPasqua commented 4 months ago

Alright @glenn-jocher, then I'll proceed to open a PR about it 🚀

glenn-jocher commented 4 months ago

Hi @AlexPasqua,

That sounds fantastic! 🚀 We're excited to see your contribution. When you're ready, please go ahead and submit the PR. If you need any assistance or have further questions during the process, feel free to reach out here. Your efforts to enhance the usability of YOLOv8 are greatly appreciated by the community and the Ultralytics team. Thank you! 😊

Best of luck with the PR, and we're looking forward to reviewing it!

AlexPasqua commented 3 months ago

@glenn-jocher Actually, the Results.summary() method does something similar: it returns a list of dictionaries, one dict per detected object, e.g., [{'name': 'person', 'class': 0, 'confidence': 0.92, 'box': {'x1': 207.03125, 'y1': 55.68618, 'x2': 243.46902, 'y2': 153.60498}}].

This makes it possible to do something like:

import cv2
import numpy as np
from ultralytics import YOLO
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = YOLO('yolov8n.pt')  # Load model at startup

@app.post("/detect")
async def detect(file: UploadFile):
    # Process the uploaded image for object detection
    image_bytes = await file.read()
    image = np.frombuffer(image_bytes, dtype=np.uint8)
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)

    # Perform object detection with YOLOv8
    results = model.predict(image)

    # Extract bounding box data for the first (and only) image
    summary_output = results[0].summary()

Then, if you want a json output, you can still do return pd.DataFrame(summary_output).to_json().
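
One caveat: in the example entry shown earlier, the coordinates sit in a nested box dict, so a DataFrame built directly from that list would get a single box column holding dicts. A minimal sketch of flattening the entries first follows; flatten_summary is a hypothetical helper, and the sample entry simply mirrors the one quoted above.

```python
from typing import List


def flatten_summary(summary: List[dict]) -> List[dict]:
    """Lift each entry's nested 'box' dict into top-level keys."""
    flat = []
    for det in summary:
        # Copy everything except the nested dict, then merge its keys in.
        row = {k: v for k, v in det.items() if k != "box"}
        row.update(det.get("box", {}))
        flat.append(row)
    return flat


summary = [
    {
        "name": "person",
        "class": 0,
        "confidence": 0.92,
        "box": {"x1": 207.03125, "y1": 55.68618, "x2": 243.46902, "y2": 153.60498},
    }
]
print(flatten_summary(summary))
```

The input list is left untouched, so the nested form stays available if the API should return it as-is.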

The only difference from YOLOv5's pandas method is that we have no flexibility in choosing whether the coordinates come in (x1, y1, x2, y2) format, (x1, y1, w, h), and so on. Maybe, instead of implementing this pandas method in YOLOv8, I should modify the Results.summary() method so that the desired coordinate format can be passed to it. I created a PR for this (#14946); could you give me an initial opinion or review?
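
As an illustration of what a format argument could do, here is a small sketch that converts one corner-format box dict into other layouts. convert_box and the format labels are assumptions made for this example, not the API of the PR or of Ultralytics. Libraries differ on whether "xywh" is anchored at the top-left corner or at the center, so the sketch offers both under separate labels.

```python
from typing import Dict


def convert_box(box: Dict[str, float], fmt: str = "xyxy") -> Dict[str, float]:
    """Convert a corner-format box dict into the requested layout."""
    x1, y1, x2, y2 = box["x1"], box["y1"], box["x2"], box["y2"]
    w, h = x2 - x1, y2 - y1
    if fmt == "xyxy":
        return {"x1": x1, "y1": y1, "x2": x2, "y2": y2}
    if fmt == "xywh":  # top-left corner plus size, as in (x1, y1, w, h)
        return {"x1": x1, "y1": y1, "w": w, "h": h}
    if fmt == "cxcywh":  # center plus size (label chosen for this sketch)
        return {"cx": x1 + w / 2, "cy": y1 + h / 2, "w": w, "h": h}
    raise ValueError(f"unsupported box format: {fmt!r}")


print(convert_box({"x1": 10.0, "y1": 20.0, "x2": 30.0, "y2": 60.0}, "xywh"))
```

Raising on an unknown format, rather than silently falling back to xyxy, makes a typo in the argument visible to the caller.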

glenn-jocher commented 3 months ago

@AlexPasqua thank you for pointing out the Results.summary() method and its capabilities. You're correct that it provides a similar functionality by returning a list of dictionaries for each detected object. Modifying the Results.summary() method to accept a parameter for the desired coordinate format is a practical approach. This would indeed align more closely with the flexibility offered by YOLOv5's pandas method.

I appreciate your initiative in creating a PR for this enhancement. I'll review your PR (#14946) and provide feedback shortly. This improvement will certainly enhance the usability of the Results class for various deployment scenarios. Thank you for your contribution!

github-actions[bot] commented 1 week ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

AlexPasqua commented 1 week ago

There's a PR open (#14946) to close this issue, but it's waiting for a review