nestauk / asf_floorplan_interpreter

Modelling to interpret floor plan images to extract or infer information about a property's layout.
MIT License

Fix duplicate bug and retrain the room type model #18

Open lizgzil opened 10 months ago

lizgzil commented 10 months ago

There are duplicate labels per floorplan and room in the room type data.

i.e. 2 people have submitted labels for the same room.

This only occurs in 5 floorplans and every time the 2 people have selected the same room type.

However, the duplication may have confused the model. For the 5 floorplans in question, removing the duplicates takes the number of labels (all labels) from 18 to 11, 10 to 8, 13 to 7, 14 to 8, and 29 to 18.
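
As a quick check on those figures, the counts quoted above can be tallied to see how many duplicate labels each floorplan carried. This is only a sketch: the variable names are illustrative and not taken from the repository, and only the numbers come from this comment.

```python
# Label counts for the 5 affected floorplans, as quoted above
# (variable names are illustrative, not from the repository).
labels_before = [18, 10, 13, 14, 29]
labels_after = [11, 8, 7, 8, 18]

# Duplicate labels removed per floorplan, and in total
removed = [before - after for before, after in zip(labels_before, labels_after)]
total_removed = sum(removed)

print(removed)
print(total_removed)
```

That works out to 7, 2, 6, 6 and 11 duplicate labels respectively, 32 in total across the 5 floorplans.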

lizgzil commented 10 months ago

I've found a way to fix the issue, but I'm waiting on a PR before making the change, so as not to add to any confusion!

[Screenshot, 2023-11-29 at 18:15:41]

spans = {
    "label": [
        option["text"]
        for option in options
        if option["id"] in accept_options
    ][0],
    "points": item["spans"][0]["points"],  # Assuming there's always one span
    "type": item["spans"][0]["type"],
    "input_hash": item["_input_hash"],
    "task_id": item["_task_hash"],
}

and

# Remove duplicate labels, keep the first one only (i.e. in the same floorplan a room has been labelled twice)
task_ids = set()
new_spans = []
for span in prod_label['spans']:
    if span['task_id'] not in task_ids:
        new_spans.append(span)
        task_ids.add(span['task_id'])

prod_label['spans'] = new_spans
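
To see the de-duplication in isolation, here is a minimal, self-contained sketch of the same loop run on made-up data. The function name `drop_duplicate_spans` and the sample `prod_label` values are assumptions for illustration. Only the `spans` and `task_id` keys come from the snippets above.

```python
def drop_duplicate_spans(spans):
    """Keep the first span seen for each task_id, preserving order."""
    seen_task_ids = set()
    unique_spans = []
    for span in spans:
        if span["task_id"] not in seen_task_ids:
            unique_spans.append(span)
            seen_task_ids.add(span["task_id"])
    return unique_spans


# Hypothetical example: the same room (task_id 101) was labelled twice
prod_label = {
    "spans": [
        {"label": "KITCHEN", "task_id": 101},
        {"label": "BEDROOM", "task_id": 202},
        {"label": "KITCHEN", "task_id": 101},
    ]
}

prod_label["spans"] = drop_duplicate_spans(prod_label["spans"])
print([span["task_id"] for span in prod_label["spans"]])
```

Because the first occurrence is kept and both annotators chose the same room type in every duplicated case, it should not matter which of the two labels survives.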