voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0
8.92k stars 569 forks

[CVAT integration] Use pixelwise masks, not polygons, for instance segmentation #4483

Open auee028 opened 5 months ago

auee028 commented 5 months ago

Proposal Summary

Following FiftyOne's documentation, I found that FiftyOne's instance segmentation works with polygons rather than pixelwise masks. For the CVAT and FiftyOne integration, instance segmentation would work much better if annotated with pixelwise masks rather than polygons.

Motivation

I am trying to integrate CVAT with FiftyOne, whilst using SAM for segmentation in CVAT.

This has already been raised in https://github.com/voxel51/fiftyone/issues/3750. It seems that many users want pixelwise masks rather than polygons for segmentation annotations.

When annotating segmentation with SAM in CVAT, the results are masks, not polygons. The workflow I need is to load a dataset and its mask annotations into FiftyOne, send the annotated dataset to CVAT to add, edit, or delete annotations, and load the modified annotations back into FiftyOne. The expected annotation type throughout is masks, but instance segmentation in the CVAT integration seems to work only with polygons, which are not the pixelwise segments I expect.

When I try to load FiftyOne detections into CVAT, all annotations are lost and only the images are loaded in CVAT. Likewise, when I annotate segmentation using SAM in CVAT and try to load the annotated data into FiftyOne, the annotations disappear and only the images are shown in the FiftyOne App. When I set label_type='instance' as in the code below

dataset.annotate(
    anno_key,
    label_field='ground_truth',
    label_type='instance',
    url='http://localhost:8080/'
)

then the annotations are loaded, but as polygons, not as pixelwise masks.

What areas of FiftyOne does this feature affect?

Details

Expected usage for instance segmentation with pixelwise masks:

dataset.annotate(
    anno_key,
    label_field='detections',
    label_type='instances',
    url='http://localhost:8080/'
)

Expected usage for instance segmentation with polygons:

dataset.annotate(
    anno_key,
    label_field='detections',
    label_type='polygons',
    url='http://localhost:8080/'
)

Willingness to contribute

The FiftyOne Community welcomes contributions! Would you or another member of your organization be willing to contribute an implementation of this feature?

karl-joan commented 5 months ago

I'm having the exact same issue. I guess the current workaround is to export the masks from CVAT and add the masks to each image with a script.
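A minimal sketch of the core step such a script might need, assuming the exported CVAT masks are run-length encoded as alternating background/foreground counts (starting with background) in row-major order inside the mask's bounding box. That layout is an assumption here and should be checked against the actual export; `decode_cvat_rle` is a hypothetical helper, not part of CVAT or FiftyOne. Plain lists are used to keep the sketch dependency-free; a real script would likely use NumPy arrays.

```python
def decode_cvat_rle(counts, width, height):
    """Expand run lengths into a height x width grid of 0/1 values.

    Assumption: runs alternate background, foreground, background, ...
    starting with background, in row-major order.
    """
    flat = []
    value = 0  # first run is assumed to be background
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value  # toggle between background and foreground

    if len(flat) != width * height:
        raise ValueError("run lengths do not match the mask size")

    # Split the flat pixel list into rows
    return [flat[r * width:(r + 1) * width] for r in range(height)]


mask = decode_cvat_rle([1, 2, 2, 3, 1], width=3, height=3)
```

Each decoded mask could then be stored on a sample as the `mask` of a `fo.Detection`, together with a `bounding_box` in relative `[x, y, width, height]` coordinates.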

NicDionne commented 2 months ago

I am facing the same issue, and it becomes even more of a concern when the mask contains islands or holes, as CVAT does not support polygons with holes. This has become a bottleneck for us. I would also be willing to contribute to a solution with guidance from the FiftyOne team.
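To illustrate why holes are a problem, here is a small sketch under simple assumptions: a 3x3 ring-shaped mask is represented only by its outer outline, and that outline is rasterized back with a standard ray-casting test. The names `point_in_polygon`, `ring`, and `outline` are hypothetical and chosen for illustration; this is not how CVAT or FiftyOne convert masks internally.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Ring-shaped mask: the center pixel is a hole
ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]

# A single polygon without holes can only describe the outer outline
outline = [(0, 0), (3, 0), (3, 3), (0, 3)]

# Rasterize the outline by testing each pixel center
filled = [
    [int(point_in_polygon(c + 0.5, r + 0.5, outline)) for c in range(3)]
    for r in range(3)
]
```

The rasterized outline covers the center pixel even though the original mask does not, so a round trip through a single polygon fills in the hole.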

brimoor commented 2 months ago

Hi @NicDionne and @auee028 👋

Apologies for the delayed response and thanks for the feature request and your willingness to contribute a solution! I'd be happy to point you in the right direction 😄

I agree with the proposal above where this syntax:

dataset.annotate(anno_key, label_field='detections', label_type='instances')

should upload instance masks as pixelwise masks rather than as polygons.

For context, I believe the only reason that the current implementation uses polygons is that pixelwise masks weren't supported by CVAT when this integration was originally built!

Here's the code that needs to be updated:

NicDionne commented 1 month ago

Thanks, @brimoor for the guidance. I'll look into it.

geke-mir commented 1 month ago

Not to overload this issue, but maybe to seed more development on a similar task: what would be the best way to extend this to semantic labels that are uploaded with label_type="segmentation"?

I suppose an alternative would be to only allow users to use instances for CVAT annotation and you can fuse instances -> semantic labels when you need them later during a dataset export, etc.
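A rough sketch of that instances -> semantic fusion, assuming each instance is given as a label, the pixel offset of its bounding box, and a 0/1 mask inside that box. `fuse_instances` and the `class_ids` mapping are hypothetical names for illustration, and plain lists stand in for NumPy arrays to keep the sketch dependency-free. FiftyOne may already cover this (for example `objects_to_segmentations` in `fiftyone.utils.labels`); check the docs for the exact signature before relying on it.

```python
def fuse_instances(instances, width, height, class_ids):
    """Paint instance masks into one semantic label map.

    instances: list of (label, left, top, mask) tuples, where mask is a
    list of rows of 0/1 values positioned at pixel offset (left, top).
    Later instances overwrite earlier ones where they overlap.
    """
    semantic = [[0] * width for _ in range(height)]  # 0 = background
    for label, left, top, mask in instances:
        class_id = class_ids[label]
        for dy, row in enumerate(mask):
            for dx, on in enumerate(row):
                if on:
                    semantic[top + dy][left + dx] = class_id
    return semantic


instances = [
    ("cat", 0, 0, [[1, 1], [1, 0]]),
    ("dog", 1, 1, [[1, 1], [1, 1]]),
]
semantic = fuse_instances(
    instances, width=4, height=3, class_ids={"cat": 1, "dog": 2}
)
```

Because overlapping pixels take the class of the last instance painted, the ordering of instances is a design choice that would need to be decided (for example by area or by z-order).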