ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How to use YOLOv5's segmentation model for real-time inference? #12222

Closed Zhong-Zi-Zeng closed 11 months ago

Zhong-Zi-Zeng commented 1 year ago

Search before asking

Question

Because YOLOv5 doesn't directly support real-time inference with its segmentation models, I ran some tests on the model myself, like below:

import numpy as np
import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='/weight/yolov5m-seg.pt')
img = np.ascontiguousarray(np.transpose(img, [2, 0, 1]))[None, ...]  # HWC image -> 1xCxHxW
a, b, c = model(torch.tensor(img, dtype=torch.float32).to('cuda'))

I fed in a 1024 x 1024 image and got the result shown in the attached screenshot.

Could you tell me what this output is and how to use it? Thanks!

Additional

No response

github-actions[bot] commented 1 year ago

👋 Hello @Zhong-Zi-Zeng, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 year ago

@Zhong-Zi-Zeng hi there! 👋

To perform real-time inference with YOLOv5's segmentation model, you can follow these steps:

  1. Load the segmentation model using the torch.hub.load function by specifying the ultralytics/yolov5 repository and the custom model. You can also provide the path to the yolov5m-seg.pt weights file.
  2. Preprocess your input image by converting it to a numpy array and transposing it to channel-first order (CHW) with a leading batch dimension.
  3. Convert the image to a torch tensor and move it to the GPU if available.
  4. Call the model on the input tensor and obtain the outputs, which will be assigned to variables a, b, and c (see the sketch below).
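
A minimal sketch of these steps, assuming the image path and the 1024 x 1024 size are placeholders and the yolov5m-seg.pt weights are available locally (the three raw outputs follow the structure reported later in this thread):

import cv2
import numpy as np
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = torch.hub.load('ultralytics/yolov5', 'custom', path='yolov5m-seg.pt')
model.to(device).eval()

img = cv2.imread('image.jpg')                            # HWC, BGR image
img = cv2.resize(img, (1024, 1024))                      # match the size used above
x = np.ascontiguousarray(img.transpose(2, 0, 1))[None]   # 1xCxHxW
x = torch.from_numpy(x).float().to(device) / 255.0       # scale pixel values to 0-1

with torch.no_grad():
    a, b, c = model(x)                                   # raw segmentation outputs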

Regarding the result you obtained, it appears to be an image, but I'm unable to view it because the link is not accessible. Could you please provide more details about the output, or adjust your question accordingly?

Let me know if there's anything else I can assist you with!

Zhong-Zi-Zeng commented 1 year ago

According to the output, I get a list of three elements. The first element's shape is (1, 64521, 117). The second is (1, 32, 256, 256), and the third is itself a list of three elements whose shapes are (1, 3, 128, 128, 117), (1, 3, 64, 64, 117), and (1, 3, 32, 32, 117).

I would like to know what each of these elements means. Thanks!

glenn-jocher commented 1 year ago

@Zhong-Zi-Zeng the output you obtained after running the segmentation model is as follows:

  1. The shape of the first element is (1, 64521, 117).
  2. The shape of the second element is (1, 32, 256, 256).
  3. The third element is a list with three sub-elements. The shape of the first sub-element is (1, 3, 128, 128, 117), the shape of the second sub-element is (1, 3, 64, 64, 117), and the shape of the third sub-element is (1, 3, 32, 32, 117).

These elements correspond to different parts of the segmentation model's output. Each prediction carries 117 values: 4 box coordinates, 1 objectness score, 80 class scores, and 32 mask coefficients. The first element is the concatenation of these predictions over all anchors and scales, the second element contains the 32 mask prototypes at 1/4 of the input resolution (256 x 256 for your 1024 x 1024 input), and the third element is the list of raw per-scale prediction grids (strides 8, 16, and 32) before flattening and concatenation.

Note that the detections in the first element still need post-processing (non-maximum suppression and mask assembly) before they become usable boxes and masks.
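
As a quick illustration (a sketch only; the variables a and b refer to the outputs from your snippet above):

det = a[0]                      # (64521, 117) raw predictions
boxes = det[:, :4]              # box predictions (xywh) for every anchor
objectness = det[:, 4]          # objectness scores
class_scores = det[:, 5:85]     # 80 class scores
mask_coeffs = det[:, 85:]       # 32 mask coefficients, combined with the prototypes in b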

If you have any further questions or need additional assistance, please feel free to ask.

Zhong-Zi-Zeng commented 1 year ago

Thanks for your answer. However, I want to know how to directly get the mask of the input image from the above output. Could you give me some advice? I would really appreciate it!

glenn-jocher commented 1 year ago

@Zhong-Zi-Zeng sure! To directly obtain the mask of the input image from the output of the YOLOv5 segmentation model, you can follow these steps:

  1. Run non-maximum suppression on the first output (shape (1, 64521, 117)); each row contains 4 box values, 1 objectness score, 80 class scores, and 32 mask coefficients, and you need to keep the mask coefficients of the surviving detections.
  2. Take the second output (shape (1, 32, 256, 256)); these are the 32 mask prototypes.
  3. For each detection, combine its 32 coefficients with the prototypes (a matrix product over the prototype channels) and apply a sigmoid to obtain a mask probability map.
  4. Apply a threshold to the probability map to obtain a binary mask (e.g., values above 0.5 are set to 1, the rest to 0) and crop it to the detection's bounding box.
  5. Resize the resulting binary mask to match the original input image size if necessary.

These steps will help you obtain the mask of the input image from the YOLOv5 segmentation model's output.
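
A minimal sketch of these steps, assuming you run it from inside the yolov5 repository so its helper functions are importable, and reusing the a (detections) and b (prototypes) outputs from your snippet:

from utils.general import non_max_suppression
from utils.segment.general import process_mask

pred = non_max_suppression(a, conf_thres=0.25, iou_thres=0.45, nm=32)  # nm=32 keeps the mask coefficients
det = pred[0]                                    # detections for the first (and only) image
if len(det):
    # det columns: x1, y1, x2, y2, confidence, class, then 32 mask coefficients
    masks = process_mask(b[0], det[:, 6:], det[:, :4], (1024, 1024), upsample=True)
    # masks: (num_detections, 1024, 1024) binary masks aligned with the 1024 x 1024 network input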

If you need further assistance or have any more questions, please feel free to ask.

sc-deeraj commented 1 year ago

Is there any script or detailed documentation to the above steps? @glenn-jocher

Zhong-Zi-Zeng commented 1 year ago

Is there any script or detailed documentation to the above steps? @glenn-jocher

I recommend using YOLOv8, which supports outputting masks directly during inference.
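
For example, with the ultralytics package a segmentation run returns masks directly (a minimal sketch; yolov8m-seg.pt and image.jpg are placeholder names):

from ultralytics import YOLO

model = YOLO('yolov8m-seg.pt')        # segmentation weights, downloaded if missing
results = model('image.jpg')          # run inference
masks = results[0].masks              # Masks object, or None if nothing was detected
if masks is not None:
    print(masks.data.shape)           # (num_detections, H, W) mask tensor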

sc-deeraj commented 1 year ago

Is there any way to use a YOLOv5 model with the YOLOv8 repo (code)? @Zhong-Zi-Zeng

glenn-jocher commented 1 year ago

@Zhong-Zi-Zeng yes, it is possible to use the YOLOv5 model in conjunction with the YOLOv8 repository code.

Since YOLOv5 and YOLOv8 are closely related and maintained within the same ecosystem, you can adapt the YOLOv8 code to work with the YOLOv5 model.

To do this, you would need to make sure that the input and output formats of the YOLOv5 model align with the expected input and output formats in the YOLOv8 codebase. This may involve modifying the code to handle differences in model architecture, layer connectivity, and output interpretation.

Keep in mind that modifying code from different repositories can be complex and may require a deep understanding of both models. It is recommended to carefully review the code and consult the documentation or community for both YOLOv5 and YOLOv8 if you decide to proceed with this approach.

Please let me know if there's anything else I can assist you with.

github-actions[bot] commented 11 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

fewshotstudy commented 7 months ago

After running segment/predict, I want to delete the bbox on the segmented region. What should I do?

glenn-jocher commented 7 months ago

@fewshotstudy hey there! 😊 If you want to remove the bounding box (bbox) from the segmented region after prediction, one simple approach could be to directly manipulate the final image output. Assuming you're working with the output mask and the original image, here's a quick idea in Python using OpenCV:

import cv2
import numpy as np

# Assuming 'mask' is your segmentation mask where objects are marked by 1s
# And 'original_img' is your original image

# Convert mask to a binary form where 1s become 255 (white) - for visibility
mask_binary = (mask * 255).astype(np.uint8)

# Find contours in the mask
contours, _ = cv2.findContours(mask_binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Draw contours on the original image - here (0, 0, 255) is color Red and 2 is thickness
for cnt in contours:
    cv2.drawContours(original_img, [cnt], 0, (0, 0, 255), 2)

# To remove bboxes completely, simply skip the drawing step or manipulate as per your need

# Display or save your result
cv2.imshow('Segment without Bbox', original_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

This example finds contours in the segmentation mask and then draws them on the original image. If your goal is purely to remove the bounding boxes and keep the segmentation, you can omit or adjust the drawing step to suit what you specifically need (e.g., fill in the segmented region instead of outlining it, as in the snippet below).
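
For instance, to fill the segmented regions instead of outlining them, pass a negative thickness to cv2.drawContours:

for cnt in contours:
    cv2.drawContours(original_img, [cnt], 0, (0, 0, 255), -1)  # thickness -1 fills the contour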

Hope this helps you tackle your issue. Let me know if you have any more questions! 🚀