Burhan-Q opened this issue 11 months ago
Target for ultralytics 8.3.0
Would be great to have a depth estimation task; it helps with a lot of applications.
@MoAbbasid absolutely, depth estimation is indeed a valuable feature for a myriad of applications, from autonomous driving to AR/VR. 😊 We are always exploring new ways to expand the capabilities of Ultralytics YOLO models. While we don't currently have built-in support for depth estimation tasks, your suggestion is very much appreciated and it's something we can consider for future updates.
In the meantime, for those interested in experimenting, combining YOLO object detections with separate depth estimation models could be one way to integrate depth information. For example:
from ultralytics import YOLO
# Load YOLO model for object detection
model = YOLO('yolov8n.pt')
# Predict objects in an image
results = model('image.jpg')
# Use another model here for depth estimation, e.g., MiDaS
# depth_map = midas_model.predict('image.jpg')
# Combine YOLO predictions with depth map as needed
# ...
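For anyone who wants to fill in the depth side of that sketch, here is a rough example using MiDaS from torch.hub (the model name, transform, and resizing details are assumptions to verify against the MiDaS repository):
import cv2
import torch

# Load a small MiDaS model and its matching input transform via torch.hub
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

# Read the same image used for YOLO and convert BGR -> RGB
img = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the relative depth prediction back to the original image size
    depth_map = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().cpu().numpy()

# Note: MiDaS outputs relative (inverse) depth, not metric distance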
Stay tuned for updates, and thanks for your feedback! 🚀
@glenn-jocher Interesting. Any ideas or examples of how I can combine the YOLO predictions with the depth map, for tasks like absolute distance measurement?
@MoAbbasid hi there! 👋 Combining YOLO predictions with depth maps for absolute distance measurement is an intriguing idea. Here's a quick way to integrate YOLO object detections with a depth map:
from ultralytics import YOLO
import cv2
# Load your models (YOLO for detection, Depth Model for depth estimation)
yolo_model = YOLO('yolov8n.pt')
# Assume 'depth_model' is your loaded depth estimation model
# Load your image
image = cv2.imread('path/to/your/image.jpg')
# Get YOLO predictions
yolo_results = yolo_model(image)
# Assume you have your depth map (e.g., from your depth estimation model)
depth_map = depth_model.predict(image)
# Example: Overlay depth information on YOLO object detection bounding boxes
for box in yolo_results[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy)
    # Grab depth information from the center of the bounding box
    depth_value = depth_map[(y2 + y1) // 2, (x2 + x1) // 2]
    print(f"Detected object at depth: {depth_value}")
# This simplistic approach takes the central point of the bounding box to query the depth map.
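To turn a depth value into an absolute distance, you also need a metric depth map (e.g., from an RGB-D or stereo sensor) plus the camera intrinsics. Continuing from the loop above, a rough sketch with placeholder intrinsics (take fx, fy, cx, cy from your own calibration):
import numpy as np

fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0  # placeholder intrinsics from your calibration

u, v = (x1 + x2) // 2, (y1 + y2) // 2        # bounding box center in pixels
Z = float(depth_map[v, u])                   # metric depth at the center, in meters

# Pinhole back-projection to 3D camera coordinates
X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy
distance = float(np.sqrt(X**2 + Y**2 + Z**2))  # Euclidean distance from the camera
print(f"Object center is ~{distance:.2f} m from the camera")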
Keep in mind, this sample assumes the depth map's scale matches your image and that you've properly loaded/initialized both your YOLO model and your depth estimation model. Tweaks might be necessary based on the actual size and scale of your depth map compared to the original image. Hope this helps! 🌟
How do I integrate Depth Anything V2 with YOLOv10 for training on a dataset to improve detection?
Hello @Cogwiswai,
Integrating depth information with YOLOv10 for enhanced detection is a great idea! Here's a concise guide to get you started:
1. Depth Estimation Model: First, ensure you have a depth estimation model, such as MiDaS, to generate depth maps from your dataset images.
2. Dataset Preparation: For each image in your dataset, generate corresponding depth maps using your depth estimation model. Save these depth maps alongside your original images.
3. Custom DataLoader: Modify your DataLoader to load both the RGB images and their corresponding depth maps. You can concatenate the depth map as an additional channel to the RGB image, making it a 4-channel input.
4. Model Modification: Adjust the YOLOv10 model to accept 4-channel inputs instead of the standard 3-channel RGB inputs. This might involve modifying the first convolutional layer to accommodate the additional depth channel.
Here's a simplified example to illustrate the concept:
from ultralytics import YOLO
import cv2
import numpy as np
# Load YOLOv10 model
model = YOLO('yolov10.yaml')
# Custom DataLoader function
def load_data(image_path, depth_path):
    image = cv2.imread(image_path)
    depth = cv2.imread(depth_path, cv2.IMREAD_GRAYSCALE)
    depth = np.expand_dims(depth, axis=2)  # Add channel dimension
    combined = np.concatenate((image, depth), axis=2)  # Combine RGB and depth
    return combined
# Example usage
image_path = 'path/to/image.jpg'
depth_path = 'path/to/depth.png'
data = load_data(image_path, depth_path)
# Train the model
# NOTE: model.train() expects a dataset YAML path, not an in-memory array; feeding
# 4-channel inputs in practice requires a custom dataset/trainer rather than this direct call
model.train(data=data, epochs=100)
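As a rough illustration of the Dataset Preparation step above, this sketch precomputes depth maps for a folder of images with MiDaS and saves them as 8-bit PNGs (the directory layout and MiDaS model choice are assumptions; adapt them to your setup):
from pathlib import Path

import cv2
import numpy as np
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

image_dir = Path("dataset/images")   # placeholder paths
depth_dir = Path("dataset/depth")
depth_dir.mkdir(parents=True, exist_ok=True)

for img_path in image_dir.glob("*.jpg"):
    img = cv2.cvtColor(cv2.imread(str(img_path)), cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(img))
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
        ).squeeze().cpu().numpy()
    # Normalize to 0-255 so the relative depth map can be stored as an 8-bit PNG
    depth = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    cv2.imwrite(str(depth_dir / f"{img_path.stem}.png"), depth)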
For more detailed guidance on dataset setup, see the Ultralytics documentation, and use a minimum reproducible example to confirm your configuration is correct.
If you encounter any issues, please provide a reproducible example, and ensure you are using the latest versions of the packages. This will help us assist you more effectively.
Happy coding! 🚀
@glenn-jocher Can you give an example of code that modifies the first convolutional layer to accommodate the additional depth channel?
Hi @Cogwiswai,
Certainly! Modifying the first convolutional layer to accommodate an additional depth channel involves adjusting the input channels of the first layer from 3 (RGB) to 4 (RGB + Depth). Here's a concise example to guide you through this process:
import torch
import torch.nn as nn
from ultralytics import YOLO
# Load the YOLOv10 model
model = YOLO('yolov10.yaml')
# Modify the first convolutional layer
class ModifiedYOLO(nn.Module):
    def __init__(self, model):
        super(ModifiedYOLO, self).__init__()
        self.model = model
        # Assuming the first layer is a Conv2d layer
        original_conv = self.model.model[0]
        self.model.model[0] = nn.Conv2d(4, original_conv.out_channels, kernel_size=original_conv.kernel_size,
                                        stride=original_conv.stride, padding=original_conv.padding)

    def forward(self, x):
        return self.model(x)

# Instantiate the modified model
modified_model = ModifiedYOLO(model)

# Verify the change
print(modified_model.model[0])
This code snippet demonstrates how to modify the first convolutional layer to accept 4-channel inputs. Ensure you adapt this to your specific model architecture if necessary.
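One detail worth adding: the replacement layer above is randomly initialized, which discards the pretrained RGB filters. A common trick (a sketch, not an official Ultralytics API) is to copy the old weights into the first three input channels and seed the new depth channel with their mean. Also note that in YOLO models the first block is usually a Conv wrapper containing a .conv submodule, so you would typically swap that inner conv rather than the whole block; inspect your model with print(model.model) first.
import torch
import torch.nn as nn

def expand_conv_to_4ch(old_conv: nn.Conv2d) -> nn.Conv2d:
    """Return a 4-channel copy of a 3-channel Conv2d that reuses its pretrained weights."""
    new_conv = nn.Conv2d(
        4, old_conv.out_channels,
        kernel_size=old_conv.kernel_size, stride=old_conv.stride,
        padding=old_conv.padding, bias=old_conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight[:, :3] = old_conv.weight                            # copy RGB filters
        new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)  # seed the depth channel
        if old_conv.bias is not None:
            new_conv.bias.copy_(old_conv.bias)
    return new_conv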
If you encounter any issues, please provide a reproducible example as outlined in our minimum reproducible example guide. Also, make sure you are using the latest versions of the packages to verify that the issue persists.
Happy coding! 🚀
@glenn-jocher I am facing this issue:
TypeError: 'DetectionModel' object is not subscriptable
Hi @vivek-ketha,
Thank you for bringing this to our attention! The error TypeError: 'DetectionModel' object is not subscriptable typically indicates that the code is attempting to index an object that does not support indexing.
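For reference, the YOLO object in the Python API is a wrapper: the underlying torch network is model.model, and its layers live one level deeper in model.model.model. A quick way to see what you are actually indexing (a sketch; the exact nesting can vary between versions):
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
print(type(model))            # the YOLO wrapper object (not subscriptable)
print(type(model.model))      # DetectionModel (also not subscriptable)
print(model.model.model[0])   # first layer of the network, e.g. the initial Conv block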
To help us diagnose the issue more effectively, could you please provide a minimum reproducible example? This will allow us to understand the context and pinpoint the problem accurately. You can refer to our guide on creating a reproducible example here: Minimum Reproducible Example.
Additionally, please ensure that you are using the latest versions of the Ultralytics packages. Sometimes, issues are resolved in newer releases, and updating might fix the problem.
Here's a quick checklist to help you: provide a minimum reproducible example, and confirm you are running the latest Ultralytics release. If you have already verified these steps and the issue persists, please share the details, and we will be happy to assist you further.
Looking forward to your response! 😊
@glenn-jocher I am adding RGB images with depth. How do I integrate this with the annotations? Can you share example code?
Hi @Cogwiswai,
Thank you for your question! Integrating RGB images with depth information and ensuring that the annotations are correctly handled is a great way to enhance your model's capabilities. Here's a concise guide to help you achieve this:
1. Dataset Preparation: Ensure that each RGB image has a corresponding depth map. Store these pairs in a structured format.
2. Custom DataLoader: Modify your DataLoader to load both RGB images and their corresponding depth maps. Concatenate the depth map as an additional channel to the RGB image, making it a 4-channel input.
3. Annotation Handling: Ensure that your annotations (bounding boxes, labels, etc.) are correctly aligned with the combined RGB-Depth images.
Here's an example of how you can modify your DataLoader and integrate depth information with annotations:
import json

import cv2
import numpy as np
from ultralytics import YOLO

# Load YOLOv10 model
model = YOLO('yolov10.yaml')

# Custom DataLoader function
def load_data(image_path, depth_path, annotation_path):
    image = cv2.imread(image_path)
    depth = cv2.imread(depth_path, cv2.IMREAD_GRAYSCALE)
    depth = np.expand_dims(depth, axis=2)  # Add channel dimension
    combined = np.concatenate((image, depth), axis=2)  # Combine RGB and depth
    # Load annotations (assuming COCO format for example)
    with open(annotation_path, 'r') as f:
        annotations = json.load(f)
    return combined, annotations

# Example usage
image_path = 'path/to/image.jpg'
depth_path = 'path/to/depth.png'
annotation_path = 'path/to/annotation.json'
data, annotations = load_data(image_path, depth_path, annotation_path)

# Train the model
# NOTE: model.train() takes a dataset YAML path and has no 'annotations' argument;
# in practice you would point it at a YAML and use a custom dataset/trainer for 4-channel inputs
model.train(data=data, annotations=annotations, epochs=100)
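If your labels are in YOLO txt format (one "class cx cy w h" line per object, normalized to the image size) rather than COCO JSON, the annotation-loading step might look like this instead (a sketch; the label path is a placeholder). Because the coordinates are normalized, they remain valid for the 4-channel image as long as the depth map has the same height and width as the RGB image.
import numpy as np

def load_yolo_labels(label_path):
    """Read a YOLO-format .txt label file into an (N, 5) array of [class, cx, cy, w, h]."""
    labels = []
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 5:
                labels.append([float(p) for p in parts])
    return np.array(labels, dtype=np.float32)

labels = load_yolo_labels('path/to/label.txt')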
Feel free to reach out if you have any further questions or run into any issues. Happy coding! 😊
@glenn-jocher, Hi, I have a YOLO model trained on my custom dataset for normal bbox object detection, and I want to retrain it for a segmentation task. Do you think it will yield worse results? Is it better to start with a YOLO-seg model and retrain from the beginning?
Hi @MoAbbasid,
Great question! Transitioning from a bounding box object detection model to a segmentation task can indeed be a bit tricky. Here are a few insights to help you decide the best approach:
1. Starting with a YOLO-Seg Model: Generally, it's advisable to start with a segmentation-specific model like yolov8n-seg.pt. These models are pretrained on segmentation tasks and have the necessary architecture to handle segmentation masks effectively. This can give you a better starting point and potentially yield better results.
2. Retraining Your Current Model: While you can retrain your current detection model for segmentation, it might not be as effective because the model architecture and weights are optimized for bounding box detection, not segmentation. This could lead to suboptimal performance.
3. Training from Scratch: If you have the resources and time, training a segmentation model from scratch on your dataset can also be a good approach. This ensures that the model learns the specific features and nuances of your data.
Here's a quick example to get you started with a YOLO-Seg model:
from ultralytics import YOLO
# Load a pretrained YOLO-Seg model
model = YOLO('yolov8n-seg.pt')
# Train the model on your custom segmentation dataset
results = model.train(data='path/to/your/segmentation_dataset.yaml', epochs=100, imgsz=640)
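If you would rather not throw away what your detection model has already learned, one option worth trying (a sketch; how much it helps depends on how similar your detection and segmentation data are) is to build the segmentation architecture from a YAML and load your detection checkpoint, which transfers the weights of all matching layers:
from ultralytics import YOLO

# Build a segmentation architecture and transfer matching weights
# from an existing detection checkpoint (placeholder path)
model = YOLO('yolov8n-seg.yaml').load('path/to/your_detection_best.pt')

# Fine-tune on the segmentation dataset
results = model.train(data='path/to/your/segmentation_dataset.yaml', epochs=100, imgsz=640)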
Feel free to reach out if you have any more questions. Happy training! 😊
from ultralytics import YOLO
import torch
import torch.nn as nn

if __name__ == '__main__':
    model = YOLO("ultralytics/cfg/models/v8/yolov8m-seg.yaml")

    # Modify the first convolutional layer
    class ModifiedYOLO(nn.Module):
        def __init__(self, model):
            super(ModifiedYOLO, self).__init__()
            self.model = model
            # Assuming the first layer is a Conv2d layer
            original_conv = self.model.model[0]
            self.model.model[0] = nn.Conv2d(6, original_conv.out_channels, kernel_size=original_conv.kernel_size,
                                            stride=original_conv.stride, padding=original_conv.padding)

        def forward(self, x):
            return self.model(x)

    # Instantiate the modified model
    modified_model = ModifiedYOLO(model)
    print(modified_model.model[0])

    results = model.train(data=r"./ultralytics/cfg/datasets/firedata.yaml", imgsz=128, epochs=10, device='0', batch=8, seed=42)
Traceback (most recent call last):
File "/home/test/Liupu/DLModel/SomeModel/YOLOEVM/ultralytics-main/train-seg.py", line 23, in
Process finished with exit code 1
Hi @plo97,
Thanks for sharing your code! The error you're encountering, TypeError: 'SegmentationModel' object is not subscriptable, suggests that the SegmentationModel class does not support direct indexing.
To modify the first convolutional layer, you need to access the model's layers correctly. Here's a revised approach:
from ultralytics import YOLO
import torch
import torch.nn as nn
if __name__ == '__main__':
    model = YOLO("ultralytics/cfg/models/v8/yolov8m-seg.yaml")

    # Modify the first convolutional layer
    class ModifiedYOLO(nn.Module):
        def __init__(self, model):
            super(ModifiedYOLO, self).__init__()
            self.model = model
            # Access the first layer correctly
            original_conv = self.model.model.model[0]
            self.model.model.model[0] = nn.Conv2d(6, original_conv.out_channels, kernel_size=original_conv.kernel_size,
                                                  stride=original_conv.stride, padding=original_conv.padding)

        def forward(self, x):
            return self.model(x)

    # Instantiate the modified model and verify the change
    modified_model = ModifiedYOLO(model)
    print(modified_model.model.model.model[0])

    # Training still goes through the wrapped YOLO object
    results = modified_model.model.train(data=r"./ultralytics/cfg/datasets/firedata.yaml", imgsz=128, epochs=10, device='0', batch=8, seed=42)
This should resolve the issue by correctly accessing the model's layers. If you continue to face problems, please ensure you're using the latest version of the Ultralytics package. Let me know if you need further assistance!
@glenn-jocher I want to estimate the size of detected objects with the keypoint ("kp") model. To estimate object size accurately, I want to obtain 3D coordinates, including depth information, at the 2D coordinates of the keypoints. Is there an example for this?
Currently I use the Depth Anything V2 model to generate a depth map and then estimate depth by reading the depth value at the keypoint coordinates from the YOLO keypoint model. However, this method cannot achieve real-time performance because the depth map and object detection run separately. I want to solve this.
To achieve real-time performance in estimating object size using depth information and keypoints, integrating depth estimation directly into the YOLO keypoint model would be more efficient. This approach avoids the overhead of running separate models for depth and object detection. Consider exploring models that combine depth estimation and keypoint detection in a single architecture, or look into optimizing your current pipeline to reduce latency. Ensure you are using the latest versions of the Ultralytics packages for any performance improvements.
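In the meantime, if you keep the two-model pipeline, sampling the depth map at the keypoint locations is straightforward. A rough sketch, assuming depth_map is an HxW array from your depth model at the same resolution as the input image (verify the keypoints indexing against your installed version):
from ultralytics import YOLO

pose_model = YOLO('yolov8n-pose.pt')
results = pose_model('image.jpg')

# depth_map: HxW array from your depth model (e.g. Depth Anything V2), same size as the image
for kps in results[0].keypoints.xy:       # one (num_keypoints, 2) tensor per detected instance
    for x, y in kps.tolist():
        xi, yi = int(x), int(y)
        if 0 <= yi < depth_map.shape[0] and 0 <= xi < depth_map.shape[1]:
            print(f"keypoint ({xi}, {yi}) -> depth {depth_map[yi, xi]}")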
Hi @glenn-jocher @pderrenger, is this issue open to the community or only to Ultralytics members? I would like to give it a try. Thanks
Hi @Bhavay-2001,
Thank you for your interest! This issue is open to the community, and we welcome your contributions. Please ensure you are using the latest version of the Ultralytics package for reproducibility. Feel free to submit a pull request when you're ready. Thanks!
Hi @pderrenger @glenn-jocher, so how do I start with the contributions? What examples do we need to add for depth estimation?
Hi @Bhavay-2001,
Thank you for your interest in contributing! To get started, please fork the Ultralytics repository and clone your fork locally. For depth estimation, you can explore integrating depth estimation directly into the YOLO keypoint model or optimizing the current pipeline for better performance. Detailed contribution guidelines can be found in our Contributing Guide. We look forward to your contributions!
Hi @pderrenger, should we add a new task for depth estimation or look for how we can integrate it with the pose estimation task?
Hi @Bhavay-2001,
Great question! Integrating depth estimation with the pose estimation task could be a more efficient approach, as it would streamline the process and potentially improve real-time performance. This integration would allow both tasks to benefit from shared computations and reduce the overhead of running separate models. If you decide to proceed, please ensure your implementation is compatible with the latest versions of the Ultralytics package. Looking forward to your contributions!
Hi @paulguerrie, I am not able to understand the task that we have to perform here. Could you please provide a more detailed approach for how I should proceed with this?
For now, I have only understood some basics. Let's say I want to add depth information to DetectionPredictor; I could add it in this code:
for i, pred in enumerate(preds):
    orig_img = orig_imgs[i]
    pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
    img_path = self.batch[0][i]
    results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred))
return results
So, could you please suggest a more detailed approach? Thanks
Hi @pderrenger, any suggestion on this?
Hi @Bhavay-2001,
To integrate depth information with the DetectionPredictor, you'll need to modify the prediction loop to include depth estimation. This involves updating the model to handle depth data and ensuring the results incorporate both detection and depth information. Start by exploring how depth data can be processed alongside detection outputs. If you encounter specific issues, please check whether they persist with the latest Ultralytics package versions. Feel free to ask for further clarification on any specific part of the implementation.
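As a concrete starting point, one possible pattern is to subclass DetectionPredictor and attach a per-box depth value during postprocessing. This is only a sketch: run_depth_model is a hypothetical placeholder for whatever depth network you use, and the postprocess signature and Results layout should be checked against the version you are running.
from ultralytics.models.yolo.detect import DetectionPredictor

class DepthAwarePredictor(DetectionPredictor):
    """DetectionPredictor that adds a depth value at each box center (sketch only)."""

    def postprocess(self, preds, img, orig_imgs):
        results = super().postprocess(preds, img, orig_imgs)
        for result in results:
            depth_map = run_depth_model(result.orig_img)  # hypothetical helper returning an HxW array
            depths = []
            for x1, y1, x2, y2 in result.boxes.xyxy.tolist():
                cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
                depths.append(float(depth_map[cy, cx]))
            result.box_depths = depths  # stash depths on the Results object for downstream use
        return results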
x1, y1, x2, y2 = map(int, box.xyxy)
# Grab depth information from the center of the bounding box
depth_value = depth_map[(y2 + y1) // 2, (x2 + x1) // 2]
print(f"Detected object at depth: {depth_value}")
@glenn-jocher
I got this error:
x1, y1, x2, y2 = map(int, box.xyxy)
ValueError: only one element tensors can be converted to Python scalars
result_img, results = predict_and_detect(model, color_image, classes=[0, 7], conf=0.5)
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy)
        # Grab depth information from the center of the bounding box
        depth_value = depth_image[(y2 + y1) // 2, (x2 + x1) // 2]
        print(f"Detected object at depth: {depth_value}")
I replaced it with:
result_img, results = predict_and_detect(model, color_image, classes=[0, 7], conf=0.5)
for result in results:
    for box in result.boxes:
        x1 = int(box.xyxy[0][0])
        y1 = int(box.xyxy[0][1])
        x2 = int(box.xyxy[0][2])
        y2 = int(box.xyxy[0][3])
        # x1, y1, x2, y2 = map(int, box.xyxy)
        # Grab depth information from the center of the bounding box
        depth_value = depth_image[(y2 + y1) // 2, (x2 + x1) // 2]
        print(f"Detected object at depth: {depth_value}")
But I'm not sure if it's correct.
depth_value = depth_image[(y2 + y1) // 2, (x2 + x1) // 2]
IndexError: index 627 is out of bounds for axis 0 with size 480
It seems you're encountering issues with tensor indexing. Ensure that box.xyxy returns a tensor with the correct shape; you might need to adjust the indexing to access the coordinates properly. Additionally, verify that the depth image dimensions match the expected size to avoid the IndexError. If the depth image size is smaller, you may need to scale the coordinates accordingly.
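Concretely, if the color frame and the depth frame have different resolutions, scale the box center before indexing the depth image. Continuing from your loop (the shapes below are examples only):
# color_image: e.g. (720, 1280, 3); depth_image: e.g. (480, 640)
cx = (x1 + x2) // 2
cy = (y1 + y2) // 2

scale_x = depth_image.shape[1] / color_image.shape[1]
scale_y = depth_image.shape[0] / color_image.shape[0]

dx = min(int(cx * scale_x), depth_image.shape[1] - 1)  # clamp to stay inside the depth frame
dy = min(int(cy * scale_y), depth_image.shape[0] - 1)

depth_value = depth_image[dy, dx]
print(f"Detected object at depth: {depth_value}")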
Hi, thanks for your interest in contributing! You can start by checking the open issues and discussions related to depth estimation on our GitHub. Feel free to propose enhancements or submit a pull request with your ideas.
x1, y1, x2, y2 = map(int, box.xyxy)
So after I matched the resolution of depth and color, and ran the following, it doesn't work consistently. Shouldn't you expect that far away items have a larger depth value?
for result in results:
    for box in result.boxes:
        x1 = int(box.xyxy[0][0])
        y1 = int(box.xyxy[0][1])
        x2 = int(box.xyxy[0][2])
        y2 = int(box.xyxy[0][3])
        # x1, y1, x2, y2 = map(int, box.xyxy)
        # Grab depth information from the center of the bounding box
        depth_value = depth_image[(y2 + y1) // 2, (x2 + x1) // 2]
        print(f"Detected object at depth: {depth_value}")
Also, are the results sorted based on the confidence threshold of the detection bbox?
Thank you for your interest in contributing! You can start by reviewing the existing codebase and documentation to understand the current implementation. For depth estimation, consider adding examples that integrate depth maps with YOLO models, focusing on efficient processing for real-time applications. Feel free to submit a pull request with your enhancements.
It seems the deadline has passed; should we update it or remove this from the roadmap? @glenn-jocher @Laughing-q
Summary
Add new Ultralytics YOLO task for monocular image depth.
Dataset options:
Tasks