nestauk / asf_floorplan_interpreter

Modelling to interpret floor plan images to extract or infer information about a property's layout.
MIT License

Utilise Solomon's sped up yolo_2_counts #19

Open lizgzil opened 10 months ago

lizgzil commented 10 months ago

For the API, @sqr00t found a computational improvement to the yolo_2_segments function in model_utils.py by doing the following:

import torch
from ultralytics.engine.results import Results
from typing import Any, Dict, List, Tuple

def _extract_results_properties(results: Results) -> Tuple[torch.Tensor, Dict[int, str], torch.Tensor, torch.Tensor]:
    # Return the useful properties from Results: xywh boxes, class names, class ids and confidences
    return (
        results[0].boxes.xywh,
        results[0].names,
        results[0].boxes.cls,
        results[0].boxes.conf,
    )

@torch.jit.script
def _xywh_to_tensor(xywh: torch.Tensor) -> torch.Tensor:
    # Calculate x_min, y_min, x_max, y_max using PyTorch operations
    x_min = xywh[:, 0] - (xywh[:, 2] / 2)
    y_min = xywh[:, 1] - (xywh[:, 3] / 2)
    x_max = xywh[:, 0] + (xywh[:, 2] / 2)
    y_max = xywh[:, 1] + (xywh[:, 3] / 2)

    # Create the segment_tensor using PyTorch operations
    segments_tensor = torch.stack((x_min, y_min, x_max, y_max), dim=-1)

    return segments_tensor

def yolo_to_segments(
    results: Results,
) -> List[Dict[str, Any]]:
    # Extract the useful properties from Results
    xywh_bbox, names, cls, conf = _extract_results_properties(results)

    # Extract segments Tensor
    segments_tensor = _xywh_to_tensor(xywh_bbox)

    # Create the output list using list comprehension and PyTorch operations
    segments = [
        {
            "label": names[label.item()],
            "points": segments_tensor[i].tolist(),
            "type": "polygon",
            "confidence": conf[i],
        }
        for i, label in enumerate(cls)
    ]

    return segments

rather than:

def yolo_2_segments(results):
    """
    Convert the YOLO model prediction output from bounding boxes to segmentation points format.
    Needed for labelling in Prodigy or for use in predict_floorplan.py

    The (x, y) coordinates of the bounding box represent the center of the box,
    while in the segmentation format, the coordinates represent the corners of the polygon.
    See https://github.com/ultralytics/ultralytics/issues/3592.
    """
    segments = []
    for (x, y, w, h), label, conf in zip(
        results[0].boxes.xywh, results[0].boxes.cls, results[0].boxes.conf.numpy()
    ):
        x_min = x.item() - (w.item() / 2)
        y_min = y.item() - (h.item() / 2)
        x_max = x.item() + (w.item() / 2)
        y_max = y.item() + (h.item() / 2)
        segment = [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]
        segments.append(
            {
                "label": results[0].names[label.item()],
                "points": segment,
                "type": "polygon",
                "confidence": round(conf, 3),
            }
        )
    return segments
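
One difference worth noting, since it anticipates the error below, is the shape of "points": the sped-up version returns a flat [x_min, y_min, x_max, y_max] list per box, whereas the original returns four [x, y] corner pairs. A minimal illustration with made-up numbers:

# Illustrative only: dummy values, not real model output
flat_points = [10.0, 20.0, 110.0, 220.0]  # yolo_to_segments: one flat xyxy list per box
corner_points = [[10.0, 20.0], [110.0, 20.0], [110.0, 220.0], [10.0, 220.0]]  # yolo_2_segments: four [x, y] pairs

x1, y1 = corner_points[0]  # works: each element is an [x, y] pair
x1, y1 = flat_points[0]    # TypeError: cannot unpack non-iterable float object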

For the time being, changing the code to this led to an error:

from asf_floorplan_interpreter.pipeline.predict_floorplan import FloorplanPredictor

img = 'outputs/figures/floorplan.png' # Local file path or a URL to an image file

fp = FloorplanPredictor(labels_to_predict = ["WINDOW", "DOOR","KITCHEN", "LIVING", "RESTROOM", "BEDROOM", "GARAGE"])
fp.load(local=False) # Set local=True if you have previously downloaded the models
labels, label_counts = fp.predict_labels(img, conf_threshold=0)
fp.plot(img, labels, "outputs/figures/floorplan_prediction.png", plot_label=False)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/elizabethgallagher/Code/asf_floorplan_interpreter/asf_floorplan_interpreter/pipeline/predict_floorplan.py", line 212, in plot
    visual_image = overlay_boundaries_plot(
  File "/Users/elizabethgallagher/Code/asf_floorplan_interpreter/asf_floorplan_interpreter/utils/visualise_image.py", line 95, in overlay_boundaries_plot
    x1, y1 = points[i]
TypeError: cannot unpack non-iterable float object

I expect the output needed for the API use case might be slightly different from the visualising use case. I imagine it's quickly fixed, but since there are still changes happening (and yolo_2_segments as it currently stands is also in a draft PR, #2), I will keep this as an issue for now. Also, speed in this area isn't a problem for the time being anyway.
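
If helpful, one possible quick fix (an untested sketch, not a confirmed change) would be to reshape each flat xyxy row back into the four corner pairs that overlay_boundaries_plot unpacks; xyxy_row_to_corners below is a hypothetical helper:

import torch

# Sketch only: turn one flat [x_min, y_min, x_max, y_max] row into the
# four [x, y] corner pairs expected downstream
def xyxy_row_to_corners(row: torch.Tensor) -> list:
    x_min, y_min, x_max, y_max = row.tolist()
    return [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]

# e.g. inside yolo_to_segments:
#   "points": xyxy_row_to_corners(segments_tensor[i]),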

sqr00t commented 10 months ago

Great way to include this :)

I could have a look at fp.plot() to see if the datatype needs conversion or something else. I hope it won't be too confusing having a similarly named function (yolo_to_segments as opposed to yolo_2_segments). I'll get back to this after I've written up docs for v1 interpreter API. Thanks for tagging me in!

sqr00t commented 9 months ago

New version of the code. Note that it takes a single Results object (rather than indexing results[0] internally) and uses the xyxy boxes directly, so each "points" entry is a list of [x, y] corner pairs:

from ultralytics.engine.results import Results
from typing import Any, Dict, List

def yolo_Results_to_segments(
    results_obj: Results,
) -> List[Dict[str, Any]]:
    # Extract the useful properties from Results
    xyxy, names, cls, conf = (
        results_obj.boxes.xyxy,  # Tensor of shape (n, 4); row i is box i as [xmin, ymin, xmax, ymax]
        results_obj.names,
        results_obj.boxes.cls,
        results_obj.boxes.conf,
    )

    # Build the output list with a list comprehension; row i of xyxy holds the corners of box i
    segments = [
        {
            "label": names[label.item()],
            "points": [
                [xyxy[i, 0], xyxy[i, 1]], # xmin, ymin [Tensor, Tensor]
                [xyxy[i, 2], xyxy[i, 1]], # xmax, ymin [Tensor, Tensor]
                [xyxy[i, 2], xyxy[i, 3]], # xmax, ymax [Tensor, Tensor]
                [xyxy[i, 0], xyxy[i, 3]] # xmin, ymax [Tensor, Tensor]
            ],
            "type": "polygon",
            "confidence": conf[i],
        }
        for i, label in enumerate(cls)
    ]

    return segments
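
For completeness, a possible usage sketch (my assumption about how this would be called; the weights path and image path are placeholders):

from ultralytics import YOLO

# Placeholder paths: substitute the real weights and floorplan image
model = YOLO("outputs/models/room_model.pt")
results = model("outputs/figures/floorplan.png")

# The function takes a single Results object, so index into the returned list
segments = yolo_Results_to_segments(results[0])

# Each "points" entry is now an [x, y] pair (of 0-d tensors), so unpacking works;
# call float() / .item() on the values if plain Python floats are needed for JSON
x1, y1 = segments[0]["points"][0]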