roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License

DetectionDataset.as_coco having annotation id collision between splits #768

Open adbcode opened 8 months ago

adbcode commented 8 months ago

Search before asking

Bug

When exporting a YOLOv8-formatted dataset to COCO format (using the code in the Minimal Reproducible Example below), the JSON file generated for each split uses its own sequence for assigning annotation IDs.

This causes problems when importing the exported dataset with other libraries, which expect every annotation ID to be unique across all splits.
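A quick way to confirm the collision is to load every split's COCO file and look for annotation IDs that occur in more than one of them. The following is a minimal sketch, assuming each split's annotations.json has been parsed with json.load() into a standard COCO dict with an "annotations" list. find_colliding_annotation_ids is a hypothetical helper, not a supervision API, and the toy data is invented for illustration.

```python
from collections import defaultdict


def find_colliding_annotation_ids(splits):
    """Return the annotation IDs that appear in more than one split.

    `splits` maps a split name (e.g. "train") to a parsed COCO dict.
    Hypothetical helper for illustration only.
    """
    owners = defaultdict(set)  # annotation id -> names of the splits using it
    for split_name, coco in splits.items():
        for annotation in coco["annotations"]:
            owners[annotation["id"]].add(split_name)
    return {ann_id for ann_id, names in owners.items() if len(names) > 1}


# Invented toy data: both splits number their annotations from 0.
splits = {
    "train": {"annotations": [{"id": 0}, {"id": 1}, {"id": 2}]},
    "valid": {"annotations": [{"id": 0}, {"id": 1}]},
}
print(sorted(find_colliding_annotation_ids(splits)))  # [0, 1]
```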

Example from a dataset with train, valid, and test splits:

Please consider using a single, shared sequence for annotation IDs across all splits of a dataset.
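A shared sequence could look like the following minimal sketch, which assumes the per-split COCO files (for example, the train, valid, and test annotations.json files) have already been parsed into dicts. It walks the splits in a fixed order with one counter, so annotation order within and across splits is kept, and it returns a mapping from (split, old ID) to the new ID. renumber_annotation_ids is a hypothetical helper, not part of supervision; image IDs are left untouched.

```python
def renumber_annotation_ids(splits, start=1):
    """Give every annotation a dataset-wide unique ID from one shared counter.

    `splits` maps split names to parsed COCO dicts and is modified in place.
    Splits are visited in dict insertion order and annotations in file order,
    so the original ordering is preserved. Hypothetical helper.
    Returns a mapping (split_name, old_id) -> new_id for traceability.
    """
    id_map = {}
    next_id = start
    for split_name, coco in splits.items():
        for annotation in coco["annotations"]:
            id_map[(split_name, annotation["id"])] = next_id
            annotation["id"] = next_id
            next_id += 1
    return id_map


splits = {
    "train": {"annotations": [{"id": 0}, {"id": 1}]},
    "valid": {"annotations": [{"id": 0}]},
}
id_map = renumber_annotation_ids(splits)
print([a["id"] for a in splits["valid"]["annotations"]])  # [3]
```

After renumbering, each dict can be written back to its original file with json.dump().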

Environment

Supervision 0.16.0

Minimal Reproducible Example

import supervision as sv

dataset_root = "path/to/yolo/dataset"  # placeholder: set to your YOLO dataset directory
target = "path/to/coco/output"  # placeholder: set to your COCO output directory

yolo = sv.DetectionDataset.from_yolo(images_directory_path=f"{dataset_root}/images", annotations_directory_path=f"{dataset_root}/labels", data_yaml_path=f"{dataset_root}/data.yaml", force_masks=True)

yolo.as_coco(images_directory_path=f"{target}/images", annotations_path=f"{target}/annotations.json")

Are you willing to submit a PR?

SkalskiP commented 8 months ago

Hi @adbcode πŸ‘‹πŸ» Thanks a lot for your interest in supervision. How does that influence your workflow?

adbcode commented 8 months ago

Hi @adbcode πŸ‘‹πŸ» Thanks a lot for your interest in supervision. How does that influence your workflow?

Hello! When the exported dataset is used with other libraries, especially those that expect every annotation ID to be unique across the whole dataset, the dataset gets corrupted on import.

The current workaround is to regenerate the IDs during import, but that loses the original ordering.

SkalskiP commented 8 months ago

@adbcode, would something like this satisfy you:

  • Merging YOLO splits.

  • Converting YOLO to COCO.

  • Splitting COCO into subsets while preserving ID continuity.

adbcode commented 8 months ago

This will be fine as long as we can recreate the original split in the end.
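The proposed merge, convert, and split route can be approximated at the COCO level while still recovering the original split. This is a minimal sketch, assuming every split shares the same category list (as it does when all splits come from one data.yaml) and that each split has been parsed into a standard COCO dict. merge_coco_splits and split_coco are hypothetical helpers, not supervision APIs; the extra "split" key stored on each merged image is also an assumption, used only to rebuild the partition, and split_coco removes it again.

```python
import copy


def merge_coco_splits(splits):
    """Merge per-split COCO dicts into one dict with globally unique IDs.

    Hypothetical helper. Assumes all splits share the same categories.
    Each merged image is tagged with a "split" key recording its origin.
    """
    merged = {"images": [], "annotations": [], "categories": []}
    next_image_id = next_annotation_id = 1
    for split_name, coco in splits.items():
        if not merged["categories"]:
            merged["categories"] = copy.deepcopy(coco["categories"])
        image_id_map = {}  # this split's image id -> global image id
        for image in coco["images"]:
            image_id_map[image["id"]] = next_image_id
            merged["images"].append(dict(image, id=next_image_id, split=split_name))
            next_image_id += 1
        for annotation in coco["annotations"]:
            merged["annotations"].append(
                dict(
                    annotation,
                    id=next_annotation_id,
                    image_id=image_id_map[annotation["image_id"]],
                )
            )
            next_annotation_id += 1
    return merged


def split_coco(merged):
    """Rebuild per-split COCO dicts from a merged dict, keeping the global IDs."""
    splits = {}
    split_of_image = {}  # global image id -> split name
    for image in merged["images"]:
        split_name = image["split"]
        split_of_image[image["id"]] = split_name
        part = splits.setdefault(
            split_name,
            {"images": [], "annotations": [], "categories": copy.deepcopy(merged["categories"])},
        )
        # Drop the bookkeeping key so each split is plain COCO again.
        part["images"].append({k: v for k, v in image.items() if k != "split"})
    for annotation in merged["annotations"]:
        splits[split_of_image[annotation["image_id"]]]["annotations"].append(annotation)
    return splits
```

Because split_coco keeps the IDs assigned during the merge, the rebuilt train, valid, and test dicts share no image or annotation IDs, and each image lands back in the split it came from.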