Open lizgzil opened 10 months ago
I've found a way to fix the issue, but I'm waiting on a PR before making the change, to avoid adding to any confusion!
spans = {
    "label": [
        option["text"]
        for option in options
        if option["id"] in accept_options
    ][0],
    "points": item["spans"][0][
        "points"
    ],  # Assuming there's always one span
    "type": item["spans"][0]["type"],
    "input_hash": item["_input_hash"],
    "task_id": item["_task_hash"],
}
and
# Remove duplicate labels, keep the first one only (i.e. in the same floorplan a room has been labelled twice)
task_ids = set()
new_spans = []
for span in prod_label['spans']:
    if span['task_id'] not in task_ids:
        new_spans.append(span)
        task_ids.add(span['task_id'])
prod_label['spans'] = new_spans
There are duplicate labels per floorplan and room in the room type data, i.e. two people have submitted labels for the same room.
This only occurs in 5 floorplans, and in every case both people selected the same room type.
However, the duplication may have confused the model. Once the duplicates are removed, the number of labels (all labels) in the 5 floorplans in question drops from 18 to 11, 10 to 8, 13 to 7, 14 to 8, and 29 to 18.
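Before dropping duplicates, it may be worth checking programmatically that the duplicated labels really do agree. The following is only a rough sketch: it assumes each span is a dict with the `task_id` and `label` keys built above, and the function names and example data are hypothetical, not taken from `prodigy_to_yolo.py`.

```python
from collections import defaultdict


def find_conflicting_duplicates(spans):
    """Return {task_id: labels} for task_ids that carry more than one distinct label."""
    labels_by_task = defaultdict(set)
    for span in spans:
        labels_by_task[span["task_id"]].add(span["label"])
    return {
        task_id: labels
        for task_id, labels in labels_by_task.items()
        if len(labels) > 1
    }


def count_duplicates(spans):
    """Number of spans that would be dropped by keeping one span per task_id."""
    return len(spans) - len({span["task_id"] for span in spans})


# Hypothetical example data: task 1 was labelled twice with the same room type.
example_spans = [
    {"task_id": 1, "label": "KITCHEN"},
    {"task_id": 1, "label": "KITCHEN"},
    {"task_id": 2, "label": "BEDROOM"},
]

conflicts = find_conflicting_duplicates(example_spans)
n_dropped = count_duplicates(example_spans)
print(f"{n_dropped} duplicate span(s), {len(conflicts)} conflicting task_id(s)")
```

If `conflicts` is empty, keeping the first span per `task_id` loses no information. If it is not, the disagreeing labels would need resolving by hand rather than silently keeping the first one.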
prodigy_to_yolo.py
'data/annotation/prodigy_labelled/211123/yolo_formatted/room_type_yolo_formatted'
with new changes