nestauk / asf_floorplan_interpreter

Modelling to interpret floor plan images to extract or infer information about a property's layout.
MIT License

Utilise Solomon's sped up yolo_2_counts #19

Open lizgzil opened 10 months ago

lizgzil commented 10 months ago

For the API, @sqr00t found a computational improvement to the yolo_2_segments function by doing:

import torch
from ultralytics.engine.results import Results
from typing import Any, Dict, List, Tuple

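def _extract_results_properties(results: Results) -> Tuple[Any, ...]:
    # Return useful properties from Results
    # (order matches the unpacking in yolo_to_segments below)
    return (
        results.boxes.xywh,
        results.names,
        results.boxes.cls,
        results.boxes.conf,
    )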

def _xywh_to_tensor(xywh: torch.Tensor) -> torch.Tensor:
    # Calculate x_min, y_min, x_max, y_max using PyTorch operations
    x_min = xywh[:, 0] - (xywh[:, 2] / 2)
    y_min = xywh[:, 1] - (xywh[:, 3] / 2)
    x_max = xywh[:, 0] + (xywh[:, 2] / 2)
    y_max = xywh[:, 1] + (xywh[:, 3] / 2)

    # Create the segment_tensor using PyTorch operations
    segments_tensor = torch.stack((x_min, y_min, x_max, y_max), dim=-1)

    return segments_tensor

def yolo_to_segments(
    results: Results,
) -> List[Dict[str, Any]]:
    # Extract the useful properties from Results
    xywh_bbox, names, cls, conf = _extract_results_properties(results)

    # Extract segments Tensor
    segments_tensor = _xywh_to_tensor(xywh_bbox)

    # Create the output list using list comprehension and PyTorch operations
    segments = [
        {
            "label": names[label.item()],
            "points": segments_tensor[i].tolist(),
            "type": "polygon",
            "confidence": conf[i],
        }
        for i, label in enumerate(cls)
    ]

    return segments

rather than:

def yolo_2_segments(results):
    """
    Convert the YOLO model prediction output from bounding boxes to segmentation points format.
    Needed for labelling in Prodigy or for use in

    The (x, y) coordinates of the bounding box represent the center of the box,
    while in the segmentation format, the coordinates represent the corners of the polygon.
    """
    segments = []
    for (x, y, w, h), label, conf in zip(
        results[0].boxes.xywh, results[0].boxes.cls, results[0].boxes.conf.numpy()
    ):
        x_min = x.item() - (w.item() / 2)
        y_min = y.item() - (h.item() / 2)
        x_max = x.item() + (w.item() / 2)
        y_max = y.item() + (h.item() / 2)
        segment = [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]
        segments.append(
            {
                "label": results[0].names[label.item()],
                "points": segment,
                "type": "polygon",
                "confidence": round(conf, 3),
            }
        )
    return segments

For the time being, changing the code to this led to an error:

from asf_floorplan_interpreter.pipeline.predict_floorplan import FloorplanPredictor

img = 'outputs/figures/floorplan.png' # Local directory or a URL to an image file

fp = FloorplanPredictor(labels_to_predict = ["WINDOW", "DOOR","KITCHEN", "LIVING", "RESTROOM", "BEDROOM", "GARAGE"])
fp.load(local=False) # Set local=True if you have previously downloaded the models
labels, label_counts = fp.predict_labels(img, conf_threshold=0)
fp.plot(img, labels, "outputs/figures/floorplan_prediction.png", plot_label=False)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/elizabethgallagher/Code/asf_floorplan_interpreter/asf_floorplan_interpreter/pipeline/", line 212, in plot
    visual_image = overlay_boundaries_plot(
  File "/Users/elizabethgallagher/Code/asf_floorplan_interpreter/asf_floorplan_interpreter/utils/", line 95, in overlay_boundaries_plot
    x1, y1 = points[i]
TypeError: cannot unpack non-iterable float object

I expect the output needed for the API use case might be slightly different from the one needed for the visualising use case: the traceback shows the plotting code unpacking each entry of "points" as an (x, y) pair, whereas yolo_to_segments returns "points" as a flat list of four numbers. I imagine it's quickly fixed, but since there are still changes happening (and yolo_2_segments as it currently stands is also in a draft PR, #2), I will keep this as an issue for now. Speed in this area isn't a problem for the time being anyway.
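To make the mismatch concrete, here is a minimal sketch in plain Python. It assumes, based only on the traceback, that overlay_boundaries_plot walks over "points" and unpacks each element as an (x, y) pair; the rest of that function isn't shown here. box_to_corners is a hypothetical helper, not something in the repo.

```python
from typing import List, Sequence


def box_to_corners(box: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: expand [x_min, y_min, x_max, y_max] into four [x, y] corners."""
    x_min, y_min, x_max, y_max = box
    return [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]


# Shape of "points" returned by the faster yolo_to_segments: one flat box.
flat_points = [10.0, 20.0, 110.0, 70.0]

# Shape of "points" returned by yolo_2_segments: four corner pairs.
corner_points = box_to_corners(flat_points)

# Unpacking an element of the flat list raises the same kind of TypeError
# as in the traceback, because each element is a single float.
try:
    x1, y1 = flat_points[0]
except TypeError as err:
    print(f"flat format fails: {err}")

# Each element of the corner list is an [x, y] pair, so this unpacks fine.
x1, y1 = corner_points[0]
```

If that assumption holds, either the plotting code or the faster function would need a conversion like this before the two can be used together.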

sqr00t commented 10 months ago

Great way to include this :)

I could have a look at fp.plot() to see if the datatype needs conversion or something else. I hope it won't be too confusing having a similarly named function (yolo_to_segments as opposed to yolo_2_segments). I'll get back to this after I've written up docs for v1 interpreter API. Thanks for tagging me in!

sqr00t commented 9 months ago

new version of the code, note that:

from ultralytics.engine.results import Results
from typing import Any, Dict, List

def yolo_Results_to_segments(
    results_obj: Results,
) -> List[Dict[str, Any]]:
    # Extract the useful properties from Results
    xyxy, names, cls, conf = (
        results_obj.boxes.xyxy,  # Tensor([ i , [xmin, ymin, xmax, ymax] ]) where each 'i' array/ 1D tensor is a result bbox in xyxy format
        results_obj.names,
        results_obj.boxes.cls,
        results_obj.boxes.conf,
    )

    # Create the output list using list comprehension, each i'th is an array in xyxy format
    segments = [
        {
            "label": names[label.item()],
            "points": [
                [xyxy[i, 0], xyxy[i, 1]],  # xmin, ymin [Tensor, Tensor]
                [xyxy[i, 2], xyxy[i, 1]],  # xmax, ymin [Tensor, Tensor]
                [xyxy[i, 2], xyxy[i, 3]],  # xmax, ymax [Tensor, Tensor]
                [xyxy[i, 0], xyxy[i, 3]],  # xmin, ymax [Tensor, Tensor]
            ],
            "type": "polygon",
            "confidence": conf[i],
        }
        for i, label in enumerate(cls)
    ]

    return segments
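
Since the corner coordinates and confidence above are still Tensors, a caller that needs plain numbers (for example to serialise the result as JSON, which is an assumption about the API use case) could convert them afterwards. A rough sketch, with hypothetical helper names to_plain and segments_to_json; it relies only on objects having a tolist() method, which torch.Tensor and numpy.ndarray both provide:

```python
import json
from typing import Any


def to_plain(value: Any) -> Any:
    """Hypothetical helper: recursively replace tensor-like objects with plain Python values."""
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    if hasattr(value, "tolist"):
        # For a 0-dim torch.Tensor, tolist() returns a Python number.
        return value.tolist()
    return value


def segments_to_json(segments: Any) -> str:
    """Hypothetical helper: serialise a list of segment dicts after conversion."""
    return json.dumps(to_plain(segments))
```

Duck typing on tolist() keeps the helper independent of torch, at the cost of also converting any other object that happens to expose that method.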