nestauk / asf_floorplan_interpreter

Modelling to interpret floor plan images to extract or infer information about a property's layout.
MIT License

Utilise Solomon's sped up yolo_2_counts #19

Open lizgzil opened 10 months ago

lizgzil commented 10 months ago

For the API, @sqr00t found a computational improvement to the yolo_2_segments function in model_utils.py by doing the following:

import torch
from ultralytics.engine.results import Results
from typing import Any, Dict, List, Tuple

def _extract_results_properties(results: Results) -> Tuple[torch.Tensor, Dict[int, str], torch.Tensor, torch.Tensor]:
    # Return the useful properties from Results: xywh boxes, class names, class ids and confidences
    return (
        results[0].boxes.xywh,
        results[0].names,
        results[0].boxes.cls,
        results[0].boxes.conf,
    )

@torch.jit.script
def _xywh_to_tensor(xywh: torch.Tensor) -> torch.Tensor:
    # Calculate x_min, y_min, x_max, y_max using PyTorch operations
    x_min = xywh[:, 0] - (xywh[:, 2] / 2)
    y_min = xywh[:, 1] - (xywh[:, 3] / 2)
    x_max = xywh[:, 0] + (xywh[:, 2] / 2)
    y_max = xywh[:, 1] + (xywh[:, 3] / 2)

    # Create the segment_tensor using PyTorch operations
    segments_tensor = torch.stack((x_min, y_min, x_max, y_max), dim=-1)

    return segments_tensor

def yolo_to_segments(
    results: Results,
) -> List[Dict[str, Any]]:
    # Extract the useful properties from Results
    xywh_bbox, names, cls, conf = _extract_results_properties(results)

    # Extract segments Tensor
    segments_tensor = _xywh_to_tensor(xywh_bbox)

    # Create the output list using list comprehension and PyTorch operations
    segments = [
        {
            "label": names[label.item()],
            "points": segments_tensor[i].tolist(),
            "type": "polygon",
            "confidence": conf[i],
        }
        for i, label in enumerate(cls)
    ]

    return segments

rather than:

def yolo_2_segments(results):
    """
    Convert the YOLO model prediction output from bounding boxes to segmentation points format.
    Needed for labelling in Prodigy or for use in predict_floorplan.py

    The (x, y) coordinates of the bounding box represent the center of the box,
    while in the segmentation format, the coordinates represent the corners of the polygon.
    See https://github.com/ultralytics/ultralytics/issues/3592.
    """
    segments = []
    for (x, y, w, h), label, conf in zip(
        results[0].boxes.xywh, results[0].boxes.cls, results[0].boxes.conf.numpy()
    ):
        x_min = x.item() - (w.item() / 2)
        y_min = y.item() - (h.item() / 2)
        x_max = x.item() + (w.item() / 2)
        y_max = y.item() + (h.item() / 2)
        segment = [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]
        segments.append(
            {
                "label": results[0].names[label.item()],
                "points": segment,
                "type": "polygon",
                "confidence": round(conf, 3),
            }
        )
    return segments
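
One difference worth noting, since it anticipates the error below, is the shape of "points": the sped-up version returns a flat [x_min, y_min, x_max, y_max] list per box, whereas the original returns four [x, y] corner pairs. A minimal illustration with made-up numbers:

# Illustrative only: dummy values, not real model output
flat_points = [10.0, 20.0, 110.0, 220.0]  # yolo_to_segments: one flat xyxy list per box
corner_points = [[10.0, 20.0], [110.0, 20.0], [110.0, 220.0], [10.0, 220.0]]  # yolo_2_segments: four [x, y] pairs

x1, y1 = corner_points[0]  # works: each element is an [x, y] pair
x1, y1 = flat_points[0]    # TypeError: cannot unpack non-iterable float object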

For the time being, changing the code to this led to an error:

from asf_floorplan_interpreter.pipeline.predict_floorplan import FloorplanPredictor

img = 'outputs/figures/floorplan.png' # Local file path or a URL to an image file

fp = FloorplanPredictor(labels_to_predict = ["WINDOW", "DOOR","KITCHEN", "LIVING", "RESTROOM", "BEDROOM", "GARAGE"])
fp.load(local=False) # Set local=True if you have previously downloaded the models
labels, label_counts = fp.predict_labels(img, conf_threshold=0)
fp.plot(img, labels, "outputs/figures/floorplan_prediction.png", plot_label=False)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/elizabethgallagher/Code/asf_floorplan_interpreter/asf_floorplan_interpreter/pipeline/predict_floorplan.py", line 212, in plot
    visual_image = overlay_boundaries_plot(
  File "/Users/elizabethgallagher/Code/asf_floorplan_interpreter/asf_floorplan_interpreter/utils/visualise_image.py", line 95, in overlay_boundaries_plot
    x1, y1 = points[i]
TypeError: cannot unpack non-iterable float object

I expect the output needed for the API use case might be slightly different from the visualising use case. I imagine it's quickly fixed, but since there are still changes happening (and yolo_2_segments as it currently stands is also in a draft PR, #2), I will keep this as an issue for now. Also, speed in this area isn't a problem for the time being anyway.
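
If helpful, one possible quick fix (an untested sketch, not a confirmed change) would be to reshape each flat xyxy row back into the four corner pairs that overlay_boundaries_plot unpacks; xyxy_row_to_corners below is a hypothetical helper:

import torch

# Sketch only: turn one flat [x_min, y_min, x_max, y_max] row into the
# four [x, y] corner pairs expected downstream
def xyxy_row_to_corners(row: torch.Tensor) -> list:
    x_min, y_min, x_max, y_max = row.tolist()
    return [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]

# e.g. inside yolo_to_segments:
#   "points": xyxy_row_to_corners(segments_tensor[i]),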

sqr00t commented 10 months ago

Great way to include this :)

I could have a look at fp.plot() to see if the datatype needs conversion or something else. I hope it won't be too confusing having a similarly named function (yolo_to_segments as opposed to yolo_2_segments). I'll get back to this after I've written up docs for v1 interpreter API. Thanks for tagging me in!

sqr00t commented 9 months ago

New version of the code. Note that it takes a single Results object (rather than indexing results[0] internally) and uses the xyxy boxes directly, so each "points" entry is a list of [x, y] corner pairs:

from ultralytics.engine.results import Results
from typing import Any, Dict, List

def yolo_Results_to_segments(
    results_obj: Results,
) -> List[Dict[str, Any]]:
    # Extract the useful properties from Results
    xyxy, names, cls, conf = (
        results_obj.boxes.xyxy,  # Tensor of shape (n, 4); row i is box i as [xmin, ymin, xmax, ymax]
        results_obj.names,
        results_obj.boxes.cls,
        results_obj.boxes.conf,
    )

    # Build the output list with a list comprehension; row i of xyxy holds the corners of box i
    segments = [
        {
            "label": names[label.item()],
            "points": [
                [xyxy[i, 0], xyxy[i, 1]], # xmin, ymin [Tensor, Tensor]
                [xyxy[i, 2], xyxy[i, 1]], # xmax, ymin [Tensor, Tensor]
                [xyxy[i, 2], xyxy[i, 3]], # xmax, ymax [Tensor, Tensor]
                [xyxy[i, 0], xyxy[i, 3]] # xmin, ymax [Tensor, Tensor]
            ],
            "type": "polygon",
            "confidence": conf[i],
        }
        for i, label in enumerate(cls)
    ]

    return segments
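
For completeness, a possible usage sketch (my assumption about how this would be called; the weights path and image path are placeholders):

from ultralytics import YOLO

# Placeholder paths: substitute the real weights and floorplan image
model = YOLO("outputs/models/room_model.pt")
results = model("outputs/figures/floorplan.png")

# The function takes a single Results object, so index into the returned list
segments = yolo_Results_to_segments(results[0])

# Each "points" entry is now an [x, y] pair (of 0-d tensors), so unpacking works;
# call float() / .item() on the values if plain Python floats are needed for JSON
x1, y1 = segments[0]["points"][0]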