ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
26.17k stars 5.22k forks

how to crop the result with obb model #9344

Closed NguyenDucQuan12 closed 4 days ago

NguyenDucQuan12 commented 3 months ago

Question

How can I crop the detected region when using an OBB model? I am currently using the standard YOLO model, where I can crop the predicted region with the following code:

results = license_plate_detect(image)[0]
if results:
    # Extract the bounding box positions
    boxes = results.boxes.xyxy.tolist()

    for i, box in enumerate(boxes):
        # Get the top-left (x1, y1) and bottom-right (x2, y2) coordinates
        x1, y1, x2, y2 = box
        # Crop the license plate region to pass to PaddleOCR
        license_plate_crop = image[int(y1):int(y2), int(x1):int(x2)]

Now I want to switch to an OBB model. Can you guide me on how to get the cropped image from its results? Thank you.

glenn-jocher commented 3 months ago

@NguyenDucQuan12 hello! 😊 Switching to an OBB (oriented bounding box) model means you'll be working with rotated bounding boxes. The .boxes attribute for an OBB model will contain the center coordinates, width, height, and angle in radians.

Here's how you can adjust your code to crop images based on OBB outputs:

results = license_plate_detect(image)[0]
if results:
    obbs = results.boxes.xywhr.tolist()  # Get OBBs in [x_center, y_center, width, height, angle] format

    for i, obb in enumerate(obbs):
        xc, yc, w, h, angle = obb  # Unpack the OBB

        # You'll need to perform additional steps to rotate and crop
        # This is a simplified example, assuming `image` is your input image numpy array
        center = (int(xc), int(yc))
        M = cv2.getRotationMatrix2D(center, angle, 1.0)  # Get rotation matrix for the given angle

        # Apply affine transformation - rotating the image
        rotated = cv2.warpAffine(image, M, image.shape[1::-1], flags=cv2.INTER_LINEAR)

        # Cropping the rotated image around the center point
        x1 = max(int(xc - w / 2), 0)  # Ensuring the crop coordinates are within image bounds
        y1 = max(int(yc - h / 2), 0)
        x2 = min(int(xc + w / 2), image.shape[1])
        y2 = min(int(yc + h / 2), image.shape[0])
        license_plate_crop = rotated[y1:y2, x1:x2]

This example demonstrates the basic flow for rotating the whole image based on each OBB's angle and then cropping. Keep in mind, more complex scenarios might require additional processing for accuracy. Hope this helps! 🚗
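One detail worth checking in the snippet above: cv2.getRotationMatrix2D takes its angle in degrees, while xywhr reports radians, so the angle probably needs converting before the rotation. The following is a minimal sketch under that assumption, not a verified Ultralytics recipe. crop_window and crop_obb are hypothetical helper names, and the sign convention (a positive angle turns the box's width axis from the image x-axis toward the y-axis) is an assumption; negate the angle if the crop comes out more tilted instead of upright.

```python
import math


def crop_window(xc, yc, w, h, img_w, img_h):
    """Axis-aligned w x h window around (xc, yc), clamped to the image size."""
    x1 = max(int(round(xc - w / 2)), 0)
    y1 = max(int(round(yc - h / 2)), 0)
    x2 = min(int(round(xc + w / 2)), img_w)
    y2 = min(int(round(yc + h / 2)), img_h)
    return x1, y1, x2, y2


def crop_obb(image, xc, yc, w, h, angle_rad):
    """Rotate the frame so one oriented box becomes upright, then crop it."""
    import cv2  # imported lazily so crop_window stays usable without OpenCV

    img_h, img_w = image.shape[:2]
    # getRotationMatrix2D expects degrees; xywhr supplies radians (assumed convention)
    M = cv2.getRotationMatrix2D((float(xc), float(yc)), math.degrees(angle_rad), 1.0)
    rotated = cv2.warpAffine(image, M, (img_w, img_h), flags=cv2.INTER_LINEAR)
    # After the rotation the box is axis-aligned around its own center
    x1, y1, x2, y2 = crop_window(xc, yc, w, h, img_w, img_h)
    return rotated[y1:y2, x1:x2]
```

With this, each row of xywhr can be passed as crop_obb(image, *row), where image is the original frame as a NumPy array.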

NguyenDucQuan12 commented 3 months ago

@glenn-jocher I get images from a camera over an RTSP stream and process each frame. When I detect the license plate and try to read the xywhr value, I get this error:

    obbs = results.boxes.xywhr.tolist()  # Get OBBs in [x_center, y_center, width, height, angle] format
           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'xywhr'

NguyenDucQuan12 commented 3 months ago

I trained the model using Google Colab and got two .pt files. I used them, but they give an error. image

NguyenDucQuan12 commented 3 months ago

best-yolo-obb.zip roboflow_dataset 2019_12_08_07_46_33_AM423138724 Here are 2 model files and images I tested

glenn-jocher commented 3 months ago

@NguyenDucQuan12 hello there! 🌞 It seems like the model files and images you've shared are related to an issue you're encountering. Could you please provide more details about the error messages or unexpected behaviors you're experiencing?

For example, if you're dealing with an AttributeError related to xywhr like mentioned in your previous message, please ensure that you're using a model specifically trained for OBB (oriented bounding box) prediction. If the model isn't correctly set up for OBB, this attribute won't be available.

Also, when loading and using your model, make sure it's done correctly:

from ultralytics import YOLO

# Load your custom model
model = YOLO('path/to/your/model.pt')

# Process an image; the call returns a list of Results, one per image
results = model('path/to/your/image.jpg')
result = results[0]

# OBB models populate result.obb; result.boxes is None for them
if result.obb is not None:
    obbs = result.obb.xywhr.tolist()
else:
    print("This model does not support OBB predictions.")

Regarding the image and files you shared, ensuring they're accessible and properly linked in your script is crucial. For the images hosted on GitHub, confirm that the URLs are correct and publicly accessible.

I'd love to help you sort this out, so any additional context on the issue would be greatly appreciated! 🛠️

kamilalfian commented 2 months ago

@NguyenDucQuan12 did you solve the boxes=None issue? I had the same problem with OBB model inference. My model detects objects just fine, but results[0] looks like this, which doesn't make sense:

masks: None
names: {0: 'Agama', 1: 'Alamat', 2: 'Berlaku', 3: 'Face', 4: 'Foto', 5: 'Goldar', 6: 'JK', 7: 'Kabupaten', 8: 'Kec', 9: 'Kel-Desa', 10: 'NAMA', 11: 'NIK', 12: 'Pekerjaan', 13: 'Provinsi', 14: 'RT-RW', 15: 'Signature', 16: 'Status', 17: 'TTL', 18: 'Warga-Negara'}
obb: ultralytics.engine.results.OBB object
orig_img: array([[[215, 215, 215], [215, 215, 215], [215, 215, 215], ..., [225, 225, 225], [226, 226, 226], [226, 226, 226]],

   [[216, 216, 216],
    [216, 216, 216],
    [216, 216, 216],
    ...,
    [223, 223, 223],
    [224, 224, 224],
    [224, 224, 224]],

   [[217, 217, 217],
    [217, 217, 217],
    [217, 217, 217],
    ...,
    [221, 221, 221],
    [221, 221, 221],
    [221, 221, 221]],

   ...,

   [[107, 107, 107],
    [106, 106, 106],
    [103, 103, 103],
    ...,
    [ 88,  88,  88],
    [ 89,  89,  89],
    [ 86,  86,  86]],

   [[115, 115, 115],
    [113, 113, 113],
    [107, 107, 107],
    ...,
    [ 87,  87,  87],
    [ 90,  90,  90],
    [ 89,  89,  89]],

   [[120, 120, 120],
    [116, 116, 116],
    [109, 109, 109],
    ...,
    [ 88,  88,  88],
    [ 94,  94,  94],
    [ 94,  94,  94]]], dtype=uint8)

orig_shape: (1600, 1200)
path: '/content/grayscale_image3.jpeg'
probs: None
save_dir: 'runs/obb/predict'
speed: {'preprocess': 7.385492324829102, 'inference': 3030.5862426757812, 'postprocess': 15.470743179321289}

Why are probs and boxes None when my model can actually detect objects?
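The dump above lists obb: ultralytics.engine.results.OBB object, which suggests that an OBB model puts its detections in result.obb and leaves result.boxes empty. A small sketch of choosing whichever attribute is populated follows; pick_detections is a hypothetical helper, and the SimpleNamespace object is only a stand-in for a real Results object so the example runs without a model.

```python
from types import SimpleNamespace


def pick_detections(result):
    """Return (kind, container) for whichever detection attribute is populated."""
    obb = getattr(result, "obb", None)
    if obb is not None:
        return "obb", obb  # oriented boxes, read via obb.xywhr
    boxes = getattr(result, "boxes", None)
    if boxes is not None:
        return "boxes", boxes  # axis-aligned boxes, read via boxes.xyxy
    return None, None


# Stand-in for an OBB result shaped like the dump above (boxes is None)
fake = SimpleNamespace(
    boxes=None,
    obb=SimpleNamespace(xywhr=[[100.0, 50.0, 80.0, 20.0, 0.1]]),
)
kind, dets = pick_detections(fake)
print(kind, dets.xywhr)
```

With a real model, the same check would be applied to each element of the list returned by model(...).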

cabral0413 commented 2 months ago

@kamilalfian Yeah I got the same issue😥😥

samthakur587 commented 2 months ago

from ultralytics import YOLO
import cv2
import numpy as np

# Load the custom-trained OBB weights
model = YOLO('runs/obb/train/weights/best.pt')

# Run inference on the test image; predict() returns a list of Results objects
results = model.predict('test/image')
image = cv2.imread('test/image')

# Process results list
for result in results:
    obb = result.obb  # Oriented boxes object for OBB outputs
    point = result.obb.xywhr.tolist()

    for i, ob in enumerate(point):
        xc, yc, w, h, angle = ob  # Unpack the OBB

        # You'll need to perform additional steps to rotate and crop
        # This is a simplified example, assuming `image` is your input image numpy array
        center = (int(xc), int(yc))
        M = cv2.getRotationMatrix2D(center, angle, 1.0)  # Get rotation matrix for the given angle

        # Apply affine transformation - rotating the image
        rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]), flags=cv2.INTER_LINEAR)

        # Create a rectangle enclosing the rotated license plate
        rect = ((xc, yc), (h, w), angle)  # swapping w and h
        box = cv2.boxPoints(rect)
        box = box.astype(np.int32)

        # Get the rotated bounding box coordinates
        x1, y1 = np.min(box, axis=0)
        x2, y2 = np.max(box, axis=0)

        # Crop the rotated image
        license_plate_crop = rotated[y1:y2, x1:x2]

        # Save the cropped license plate
        cv2.imwrite(f'license_plate_{i}.jpg', license_plate_crop)

This is how I crop the detected image; you can also refer to this.

NguyenDucQuan12 commented 2 months ago

@kamilalfian I still can't solve it, the model can predict rotated objects (I use the imshow command and see that the bounding box has rotated), but cropping the image doesn't work. So I decided to return to the normal yolo version

NguyenDucQuan12 commented 2 months ago

@samthakur587 I will try tomorrow

NguyenDucQuan12 commented 2 months ago

@samthakur587 I tried your method of cropping the image, but the result does not match the bounding box image

NguyenDucQuan12 commented 2 months ago

@glenn-jocher When I try again with the algorithm you provided, the results don't seem to be much different. How can I tweak the algorithm to make it better? image image

glenn-jocher commented 1 month ago

Hey there! It looks like you're still facing some challenges with the cropping algorithm. To refine the results, you might consider adjusting the rotation angle or the order of operations slightly. Here’s a quick tweak you can try:

Ensure the rotation angle used in the transformation matrix is correctly calculated as negative if needed, since the rotation direction can affect the final output. Also, double-check the coordinates used for cropping to ensure they align with the rotated image's dimensions.

Here's a small modification to the code snippet:

# Correct angle for rotation direction
angle = -angle  # Ensure the rotation is in the correct direction

# Apply affine transformation
M = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

# Ensure coordinates are within image bounds
x1, y1 = np.clip(np.min(box, axis=0), 0, None)
x2, y2 = np.clip(np.max(box, axis=0), None, [image.shape[1], image.shape[0]])

# Crop the image
license_plate_crop = rotated[y1:y2, x1:x2]

Give this a try and let us know how it goes! 🚀
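When the crop and the drawn box disagree, it can help to compute the box corners directly from xywhr and draw them on the original frame, which shows whether the angle's unit and sign are being read correctly. This is a minimal sketch; obb_corners is a hypothetical helper, and it assumes the angle is in radians and measures how far the box's width axis is turned from the image x-axis toward the y-axis.

```python
import math


def obb_corners(xc, yc, w, h, angle_rad):
    """Return the four corners of an oriented box as (x, y) tuples."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Half-extent vectors along the box's width and height axes
    wx, wy = w / 2 * cos_a, w / 2 * sin_a
    hx, hy = -h / 2 * sin_a, h / 2 * cos_a
    return [
        (xc - wx - hx, yc - wy - hy),  # top-left in the box's own frame
        (xc + wx - hx, yc + wy - hy),  # top-right
        (xc + wx + hx, yc + wy + hy),  # bottom-right
        (xc - wx + hx, yc - wy + hy),  # bottom-left
    ]


corners = obb_corners(10, 20, 4, 2, 0.0)
print(corners)
```

If the polygon drawn from these corners (for example with cv2.polylines) matches the box shown by the model's own plot, the same angle can be converted to degrees and passed to OpenCV; if it is mirrored, the sign is the likely culprit.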

NguyenDucQuan12 commented 1 month ago

@glenn-jocher The result seems to be even worse, it seems that the rotation angle does not produce the expected results image

for result in results:
    obb = result.obb  # Oriented boxes object for OBB outputs
    point = result.obb.xywhr.tolist()

    for i, ob in enumerate(point):
        xc, yc, w, h, angle = ob  # Unpack the OBB
        angle = -angle  # Ensure the rotation is in the correct direction

        # You'll need to perform additional steps to rotate and crop
        # This is a simplified example, assuming `image` is your input image numpy array
        center = (int(xc), int(yc))
        M = cv2.getRotationMatrix2D(center, angle, 1.0)  # Get rotation matrix for the given angle

        # Apply affine transformation - rotating the image
        rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

        # Ensure coordinates are within image bounds
        rect = ((xc, yc), (h, w), angle)  # swapping w and h
        box = cv2.boxPoints(rect)
        box = box.astype(np.int32)
        x1, y1 = np.clip(np.min(box, axis=0), 0, None)
        x2, y2 = np.clip(np.max(box, axis=0), None, [image.shape[1], image.shape[0]])

        license_plate_crop = rotated[y1:y2, x1:x2]

        # Save the cropped license plate
        cv2.imwrite(f'license_plate_{i}.jpg', license_plate_crop)

glenn-jocher commented 1 month ago

Hey @NguyenDucQuan12! It looks like the rotation might still be off. The angle may need adjusting to match how the rotation function interprets it: xywhr reports radians, whereas cv2.getRotationMatrix2D and cv2.boxPoints expect degrees, so try converting the angle from radians to degrees, or adjust the sign again. Also, ensure that the width and height are correctly assigned when setting up the rectangle for cropping. Here’s a quick tweak:

angle = np.degrees(angle)  # xywhr is in radians, OpenCV expects degrees; flip the sign if the result rotates the wrong way

M = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

rect = ((xc, yc), (w, h), angle)  # Ensure w and h are correctly used
box = cv2.boxPoints(rect)
box = np.intp(box)  # np.int0 was removed in NumPy 2.0
x1, y1 = np.min(box, axis=0)
x2, y2 = np.max(box, axis=0)

license_plate_crop = rotated[y1:y2, x1:x2]

Let's see if this aligns better with your expectations! 🛠️
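Rotating the full frame once per box, as in the snippets above, allocates a frame-sized copy for every detection. A possible alternative is to sample only the pixels that fall inside the oriented box. The sketch below does that with plain NumPy and nearest-neighbour lookup, so it is an illustration of the idea and not an OpenCV or Ultralytics API. crop_obb_nearest is a hypothetical name, and the angle is assumed to be in radians, turning the width axis from the image x-axis toward the y-axis.

```python
import numpy as np


def crop_obb_nearest(image, xc, yc, w, h, angle_rad):
    """Nearest-neighbour crop of an oriented box, returned upright."""
    out_w = max(int(round(w)), 1)
    out_h = max(int(round(h)), 1)
    # Offsets of every output pixel from the box center, in the box's own frame
    u = np.arange(out_w) - (out_w - 1) / 2.0
    v = np.arange(out_h) - (out_h - 1) / 2.0
    uu, vv = np.meshgrid(u, v)
    cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
    # Map box-frame offsets back to source-image coordinates
    src_x = xc + uu * cos_a - vv * sin_a
    src_y = yc + uu * sin_a + vv * cos_a
    # Clamp so boxes touching the border do not index outside the frame
    src_x = np.clip(np.rint(src_x), 0, image.shape[1] - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, image.shape[0] - 1).astype(int)
    return image[src_y, src_x]


frame = np.arange(100).reshape(10, 10)
upright = crop_obb_nearest(frame, 4.5, 4.5, 4, 2, 0.0)
print(upright)
```

The output has the box's own width and height, so it can be handed to the OCR step directly. Interpolated sampling would give smoother text edges than nearest-neighbour lookup, at the cost of more code.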

NguyenDucQuan12 commented 1 month ago

@glenn-jocher Looks like I'll have to find a more suitable direction, thank you for your enthusiastic help. image

import time

import cv2
import numpy as np
from ultralytics import YOLO

# Load the custom-trained OBB weights
model = YOLO('assest/model/best-yolo-obb.pt')

# Run inference on one image and show the prediction; predict() returns a list of Results objects
results = model.predict('assest/image/image_test/2019_12_16_17_54_37_PM423138689.jpg', show=True)
image = cv2.imread('assest/image/image_test/2019_12_16_17_54_37_PM423138689.jpg')
time.sleep(3)
# Process results list
for result in results:
    obb = result.obb  # Oriented boxes object for OBB outputs
    point = result.obb.xywhr.tolist()

    for i, ob in enumerate(point):
        xc, yc, w, h, angle = ob  # Unpack the OBB

        # You'll need to perform additional steps to rotate and crop
        # This is a simplified example, assuming `image` is your input image numpy array
        center = (int(xc), int(yc))
        M = cv2.getRotationMatrix2D(center, angle, 1.0)  # Get rotation matrix for the given angle

        # Apply affine transformation - rotating the image
        rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]), flags=cv2.INTER_LINEAR)

        # Cropping the rotated image around the center point
        rect = ((xc, yc), (w, h), angle)  # Ensure w and h are correctly used
        box = cv2.boxPoints(rect)
        box = np.intp(box)
        x1, y1 = np.min(box, axis=0)
        x2, y2 = np.max(box, axis=0)

        license_plate_crop = rotated[y1:y2, x1:x2]

        # Save the cropped license plate
        cv2.imwrite(f'license_plate_{i}.jpg', license_plate_crop)

glenn-jocher commented 1 month ago

@NguyenDucQuan12 you're welcome! I'm glad I could help. If you decide to explore other directions, feel free to share your progress or any new challenges you encounter. The YOLO community and Ultralytics team are always here to support you. Best of luck with your project! 🚀

If you need further assistance, don't hesitate to reach out. Happy coding!

NguyenDucQuan12 commented 1 month ago

@glenn-jocher I have another question: my software is showing signs of a memory leak. When I checked with memory_profiler, I saw that detecting objects with YOLO and reading characters with OCR both increase RAM usage, but the memory is not released properly after detection completes. How can I free that memory? image

image

# Read the characters from the license plate cropped from the original image
@profile
def get_license_plate(license_plate_crop):

    is_license_plate= False   #default
    result_license_plate= ocrEngine.ocr(license_plate_crop, cls=True)[0]
    # Convert the image from a NumPy array to a PIL image
    license_plate_crop_cvt = Image.fromarray(license_plate_crop)

    if result_license_plate:
        # Join the text from the two rows of the plate, e.g. "38-F7" and "390.01"
        license_plate = [line[1][0] for line in result_license_plate]
        # Convert the letters to uppercase
        license_plate = [i.upper() for i in license_plate]
        license_plate = ''.join(license_plate)  # concatenate without spaces
        is_license_plate, license_plate=license_complies_format(license_plate)
    else:
        license_plate='000000000'

    return is_license_plate, license_plate_crop_cvt, license_plate

# Default values
license_plate_error= Image.open("assest/image/img_src/error.png")   #default

# Crop the image using the rotated bounding box
@profile
def predict(image, save = True):

    is_license_plate= False   #default
    license_plate = error_no_license_plate   #default
    img_path = None    #default

    results = license_plate_detect(image, verbose=False)[0]
    for result in results:
        obb = result.obb  # Oriented boxes object for OBB outputs
        point = result.obb.xywhr.tolist()

        for i, ob in enumerate(point):
            xc, yc, w, h, angle = ob  # Unpack the OBB

            # You'll need to perform additional steps to rotate and crop
            # This is a simplified example, assuming `image` is your input image numpy array
            center = (int(xc), int(yc))
            M = cv2.getRotationMatrix2D(center, angle, 1.0)  # Get rotation matrix for the given angle

            # Apply affine transformation - rotating the image
            rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]), flags=cv2.INTER_LINEAR)

            # Cropping the rotated image around the center point
            x1 = max(int(xc - w / 2), 0)  # Ensuring the crop coordinates are within image bounds
            y1 = max(int(yc - h / 2), 0)
            x2 = min(int(xc + w / 2), image.shape[1])
            y2 = min(int(yc + h / 2), image.shape[0])

            license_plate_crop = rotated[y1:y2, x1:x2]
            is_license_plate, license_plate_crop_cvt, license_plate = get_license_plate(license_plate_crop)
            # Save the image and display it on screen
            if save:
                img_path = save_image_license_plate(is_license_plate,license_plate_crop)
            return license_plate, license_plate_crop_cvt, is_license_plate, img_path

    return license_plate, license_plate_error, is_license_plate, img_path

NguyenDucQuan12 commented 1 month ago

@glenn-jocher The first time the object was detected, the memory increased to 45MB, I think because it needed to store the model, but from later times it increased to 20MB. Is there a way to delete the leftovers (leaving only the amount of memory reserved for the model) after each object detection and OCR? image

glenn-jocher commented 1 month ago

Hi @NguyenDucQuan12,

Thank you for sharing the details and the memory profile. It’s indeed common for the initial memory spike to occur due to model loading. However, the subsequent increases you’re observing could be due to residual data from the detection and OCR processes.

To mitigate this, you can manually clear variables and invoke garbage collection after each detection and OCR operation. Here’s a quick example:

import gc

def clear_memory():
    gc.collect()

# After detection and OCR
license_plate, license_plate_crop_cvt, is_license_plate, img_path = predict(image)
clear_memory()

This should help in freeing up memory that’s no longer in use. If the issue persists, you might want to profile specific parts of your code to pinpoint the exact source of the memory leak.
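For the profiling step mentioned above, Python's standard tracemalloc module can show which source lines gained the most memory across a single call. This is a minimal sketch; top_allocation_growth is a hypothetical helper name. One caveat, stated as an assumption: tracemalloc only sees allocations made through Python's allocator, so buffers held natively by PyTorch, OpenCV, or the OCR engine may not appear in the report.

```python
import tracemalloc


def top_allocation_growth(func, *args, limit=5, **kwargs):
    """Run func once and return (result, largest per-line allocation changes)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func(*args, **kwargs)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Differences are sorted with the largest change first
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


# Example with a plain function standing in for the detection call
result, stats = top_allocation_growth(lambda n: [0] * n, 100_000)
for stat in stats:
    print(stat)
```

Calling it as top_allocation_growth(predict, image) over several frames and comparing the reports would indicate whether the same lines keep growing.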

Feel free to reach out if you have any more questions! 😊

NguyenDucQuan12 commented 1 month ago

@glenn-jocher I am running multi-threaded object detection, so there can be up to 4 object detection and OCR threads running in parallel. When I call gc.collect in any thread (gc.collect is at the end of the OCR script), sometimes it deletes the SQL Server cursor, and sometimes it crashes the program, which exits suddenly. With multi-threading, RAM usage increases very quickly and shows no sign of decreasing, even after processing finishes and I wait 5 minutes. I also can't call gc.collect, because calling it continuously in the threads can cause unwanted events. How should I handle this problem to get the best results?

glenn-jocher commented 1 month ago

Hi @NguyenDucQuan12,

Managing memory in a multi-threaded environment, especially with intensive tasks like object detection and OCR, can indeed be challenging. It sounds like the use of gc.collect() is causing some unintended side effects in your application.

In multi-threaded applications, it's generally best to minimize shared state and ensure that resources are properly managed within each thread. Here are a couple of suggestions:

  1. Localize Resource Management: Ensure that each thread cleans up its own resources once it's done with them. This includes dereferencing any large objects and ensuring that database connections are properly managed.

  2. Thread-Specific Data: Use thread-local data wherever possible. This can help prevent interference between threads.

  3. Profiling: Since the memory isn't decreasing as expected, it might be helpful to use a profiling tool to identify specific areas where memory is not being released.

  4. Database Connections: If you're using database connections in your threads, consider using a connection pool with a fixed number of connections that threads can check out and return. This can help manage database resources more efficiently and prevent issues with cursors being unexpectedly deleted.

  5. Error Handling: Implement robust error handling within each thread to manage and log exceptions effectively. This can help prevent the entire program from crashing when an issue occurs in a single thread.

If these suggestions don't resolve the issue, you might need to consider restructuring parts of your application for better isolation between threads or reducing the number of concurrent threads if possible.
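The suggestions above on thread-local data, a fixed number of workers, and per-thread error handling can be sketched with the standard library alone. The names get_resources, process_frame, and run_all are hypothetical, and factory and handler stand in for whatever creates a thread's detector, OCR engine, or database connection and whatever processes one frame; this is an outline under those assumptions and does not replace the application's own structure.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def get_resources(factory):
    """Create this thread's resources on first use and reuse them afterwards."""
    if not hasattr(_local, "resources"):
        _local.resources = factory()
    return _local.resources


def process_frame(frame, factory, handler):
    """Handle one frame; return the exception instead of raising it."""
    try:
        return handler(get_resources(factory), frame)
    except Exception as exc:  # one bad frame should not take down the worker
        return exc


def run_all(frames, factory, handler, workers=4):
    """Process frames on a fixed-size pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: process_frame(f, factory, handler), frames))


# Example with trivial stand-ins for the resources and the per-frame work
doubled = run_all([1, 2, 3], dict, lambda res, frame: frame * 2)
print(doubled)
```

Because each worker thread builds its own resources once, nothing such as a database cursor is shared across threads, and the pool size caps how many frames are in memory at the same time.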

Hope this helps! Let us know how it goes. 😊

github-actions[bot] commented 2 weeks ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐