voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0

[BUG] Error when loading a dataset with CVAT 1.1 format and missing annotations after uploading back to CVAT #3332

Closed: SDMMG closed this issue 1 year ago

SDMMG commented 1 year ago

System information

Describe the problem

I manually configured a project in CVAT, uploaded a dataset to it (also manually), and labeled it. Then I manually downloaded a copy of the project using the 'Export dataset' option with the 'CVAT for images 1.1' format. Now I'd like to upload the full dataset to CVAT again. When I try to load the dataset with the Dataset.from_dir() function (as reproduced below), FiftyOne throws this error:

  File "<string>", line 1, in <module>
  File "venv\lib\site-packages\fiftyone\core\dataset.py", line 5315, in from_dir
    dataset.add_dir(
  File "venv\lib\site-packages\fiftyone\core\dataset.py", line 4049, in add_dir
    return self.add_importer(
  File "venv\lib\site-packages\fiftyone\core\dataset.py", line 4609, in add_importer
    return foud.import_samples(
  File "venv\lib\site-packages\fiftyone\utils\data\importers.py", line 142, in import_samples
    sample_ids = dataset.add_samples(
  File "venv\lib\site-packages\fiftyone\core\dataset.py", line 2481, in add_samples
    for batch in batcher:
  File "venv\lib\site-packages\fiftyone\core\utils.py", line 1139, in __next__
    batch.append(next(self._iter))
  File "venv\lib\site-packages\fiftyone\utils\cvat.py", line 567, in __next__
    labels = cvat_image.to_labels()
  File "venv\lib\site-packages\fiftyone\utils\cvat.py", line 1437, in to_labels
    detections = [b.to_detection(frame_size) for b in self.boxes]
  File "venv\lib\site-packages\fiftyone\utils\cvat.py", line 1437, in <listcomp>
    detections = [b.to_detection(frame_size) for b in self.boxes]
  File "venv\lib\site-packages\fiftyone\utils\cvat.py", line 1774, in to_detection
    return fol.Detection(
  File "venv\lib\site-packages\fiftyone\core\odm\embedded_document.py", line 49, in __init__
    super().__init__(*args, **kwargs)
  File "venv\lib\site-packages\mongoengine\document.py", line 90, in __init__
    super().__init__(*args, **kwargs)
  File "venv\lib\site-packages\mongoengine\base\document.py", line 127, in __init__
    value = field.to_python(value)
  File "venv\lib\site-packages\fiftyone\core\fields.py", line 1026, in to_python
    return fou.deserialize_numpy_array(value)
  File "venv\lib\site-packages\fiftyone\core\utils.py", line 1475, in deserialize_numpy_array
    with io.BytesIO(zlib.decompress(numpy_bytes)) as f:
TypeError: a bytes-like object is required, not 'str'

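I solved this by changing the Detection(...) call in fiftyone/utils/cvat.py to pass attributes=attributes instead of **attributes.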

With this change, I can load the dataset without any problems and view it in the FiftyOne App with all of its labels, bounding boxes, and segmentation masks. My other problem is that I can't upload the tags, bounding boxes, and segmentation masks back to CVAT together. My dataset has the following labels:

[Screenshot: the dataset's label fields]

I can upload the annotations separately in different jobs, e.g., job0 containing images with bounding boxes using label_field="detections", job1 containing segmentation masks using label_field="polylines", and so on. However, I need to upload all the images with their bounding boxes and segmentation masks together in the same job. What am I doing wrong?

Thanks in advance.

Code to reproduce issue

import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_dir="downloaded_project",
    dataset_type=fo.types.CVATImageDataset,
)

anno_key = "cvat_test"
dataset.compute_metadata(num_workers=1)

dataset.annotate(
    anno_key,
    backend="cvat",
    label_field='detections',
    label_type='detections',
    project_name='Backup_test',
    task_name='test',
    segment_size=1500
)

What areas of FiftyOne does this bug affect?

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?

brimoor commented 1 year ago

Hi @SDMMG 👋

The TypeError: a bytes-like object is required, not 'str' that you are seeing appears to occur because your exported CVAT objects have an attribute called mask that contains string values. However, mask is a reserved attribute of the Detection class that, if provided, must be a numpy array containing an instance mask.
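The last frame of the traceback can be reproduced with the standard library alone: per the traceback, deserialize_numpy_array passes its input straight to zlib.decompress, which only accepts bytes-like objects. A minimal sketch, assuming the CVAT mask attribute arrives as a plain Python string; the helper name and the attribute value are made up for illustration:

```python
import io
import zlib


def deserialize_like_fiftyone(value):
    """Hypothetical stand-in for the failing line: decompress, then wrap in BytesIO."""
    with io.BytesIO(zlib.decompress(value)) as f:
        return f.read()


# A string-valued CVAT attribute named "mask" (made-up value)
cvat_mask_attribute = "value1"

try:
    deserialize_like_fiftyone(cvat_mask_attribute)
    error_message = None
except TypeError as e:
    # zlib rejects str input before any decompression happens
    error_message = str(e)

print(error_message)

# Actual compressed bytes pass through without error
roundtrip = deserialize_like_fiftyone(zlib.compress(b"abc"))
```

The same string would be accepted without complaint under any attribute name that is not a built-in Detection field, which is why renaming it resolves the error.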

Is there any way you can rename your string-valued mask attribute to something else in CVAT? If not, as a workaround, you could manually update this function to rename your mask attribute while importing: https://github.com/voxel51/fiftyone/blob/ab947f50c5998609aeef99204295e8ffcb93f599/fiftyone/utils/cvat.py#L2356
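A minimal sketch of the rename-while-importing workaround, written as a standalone helper rather than as a patch to fiftyone/utils/cvat.py. The helper name rename_reserved_attributes, the "_attr" suffix, and the RESERVED_FIELDS set are assumptions for illustration; check the Detection field definitions in fiftyone/core/labels.py for the authoritative list of built-in names.

```python
# Assumed set of built-in Detection field names; verify against fiftyone/core/labels.py
RESERVED_FIELDS = {"label", "bounding_box", "mask", "confidence", "index"}


def rename_reserved_attributes(attributes, suffix="_attr"):
    """Return a copy of attributes whose keys don't collide with built-in fields.

    Hypothetical helper: a colliding key such as "mask" becomes "mask_attr".
    """
    renamed = {}
    for key, value in attributes.items():
        new_key = key + suffix if key in RESERVED_FIELDS else key
        # Refuse to silently overwrite an attribute that already uses the new name
        if new_key != key and new_key in attributes:
            raise ValueError(f"cannot rename {key!r}: {new_key!r} already exists")
        renamed[new_key] = value
    return renamed


# Example attributes as they might be parsed from a CVAT export (made-up values)
cvat_attributes = {"mask": "value1", "occluded": False}
safe_attributes = rename_reserved_attributes(cvat_attributes)
print(safe_attributes)  # {'mask_attr': 'value1', 'occluded': False}

# A patched importer could then keep the dynamic-attribute style, e.g.:
# fol.Detection(label=label, bounding_box=bounding_box, **safe_attributes)
```

Keeping **safe_attributes (rather than attributes=...) preserves the dynamic-property storage that the next paragraph recommends.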

Your proposed change:

Detection(label=label, bounding_box=bounding_box, **attributes)  # current
Detection(label=label, bounding_box=bounding_box, attributes=attributes)  # proposed

is not desirable because the attributes keyword is a deprecated feature of Label classes that we'd like to remove in the future. The recommended best practice is to store custom attributes as dynamic properties, as is done by the current approach above. The limitation of the current approach, of course, is that you cannot store differently-typed data in the builtin attribute names: https://github.com/voxel51/fiftyone/blob/ab947f50c5998609aeef99204295e8ffcb93f599/fiftyone/core/labels.py#L405-L409

brimoor commented 1 year ago

Regarding your second question about uploading multiple label fields, FiftyOne's CVAT integration does support including multiple fields in a label schema. Does this work for you? https://docs.voxel51.com/integrations/cvat.html#annotating-multiple-fields
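For reference, a sketch of a single label_schema covering both fields mentioned in the issue (detections and polylines), so that both are uploaded to one task with one annotate() call. The class and attribute values are placeholders modeled on the schema posted later in this thread, and check_label_schema is a hypothetical helper that only verifies each entry declares a type and classes. The annotate() call is left as a comment because it needs a loaded dataset and a running CVAT server.

```python
label_schema = {
    "detections": {
        "type": "detections",
        "classes": ["class1"],  # placeholder class list
    },
    "polylines": {
        "type": "polygons",
        "classes": ["class1"],  # placeholder class list
        "attributes": {
            "attribute1": {
                "type": "select",
                "values": ["value1", "value2", "value3"],
                "default": "value1",
            }
        },
    },
}


def check_label_schema(schema):
    """Hypothetical sanity check: every field entry must declare a type and classes."""
    for field_name, spec in schema.items():
        missing = {"type", "classes"} - set(spec)
        if missing:
            raise ValueError(f"{field_name!r} is missing {sorted(missing)}")
    return sorted(schema)


print(check_label_schema(label_schema))  # ['detections', 'polylines']

# With the dataset loaded as in the issue, both fields go into the same task:
# dataset.annotate(
#     "cvat_test",
#     backend="cvat",
#     label_schema=label_schema,
#     project_name="Backup_test",
#     task_name="test",
#     segment_size=1500,
# )
```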

SDMMG commented 1 year ago

Hi @brimoor,

Thanks for the quick response. I renamed the mask attribute and it worked. I believe I can rename it in the other cases too, so that will no longer be a problem.

Regarding the second question, I had tried label_schema before but couldn't get it to work. I tried again just now and was finally able to upload the annotations for a small project. I still need to check whether it works for the bigger one.

Thank you very much for your time and help 😊.

SDMMG commented 1 year ago

Hi @brimoor,

I've been using label_schema and I can upload the annotations correctly, but I have one problem. I use this schema to define the polygons:

label_schema = {
    "polylines": {
        "type": "polygons",
        "classes": ["class1"],
        "attributes": {
            "attribute1": {
                "type": "select",
                "values": ["value1", "value2", "value3", "value4", "value5"],
                "default": "value1",
            }
        },
    }
}

With this, the project labels are generated correctly, except for the label type: I want it to be polygon, but CVAT always shows the type any. How can I change that via FiftyOne?

Thanks!

SDMMG commented 1 year ago

@brimoor correct me if I'm wrong, but I suspect this happens because the create_project function in fiftyone.utils.cvat (line 3917) builds the label JSON without taking the type parameter into account. If the following change were made at line 3931, this could be solved:

labels = [
    {"name": name, "attributes": list(attributes.values())}
    for name, attributes in schema.items()
]  # current

labels = [
    {"name": name, "type": info["type"], "attributes": list(info["attributes"].values())}
    for name, info in schema.items()
]  # proposed: assumes each schema entry holds "type" and "attributes" keys

This would work if each schema entry had a type key, which I believe could be added in the _convert_cvat_schema function at line 4951.
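A standalone sketch of the idea in the comment above: carry a per-label type through the schema and map FiftyOne label types onto CVAT label types when building the project's labels payload. The mapping FO_TO_CVAT_TYPE, the helper build_cvat_labels, and the schema layout are assumptions for illustration, not FiftyOne's internals. The CVAT type strings are assumed to be the lowercase values CVAT's API uses for label types (check against your CVAT version), and unmapped types fall back to "any", matching the behavior reported in this thread.

```python
# Assumed mapping from FiftyOne label_schema types to CVAT label types
FO_TO_CVAT_TYPE = {
    "detections": "rectangle",
    "polygons": "polygon",
    "polylines": "polyline",
    "keypoints": "points",
    "classification": "tag",
    "classifications": "tag",
}


def build_cvat_labels(schema):
    """Build a CVAT-style labels list from {label_name: {"type": ..., "attributes": {...}}}.

    Hypothetical helper; a missing or unknown type falls back to CVAT's "any".
    Attribute specs are passed through unchanged.
    """
    return [
        {
            "name": name,
            "type": FO_TO_CVAT_TYPE.get(info.get("type"), "any"),
            "attributes": list(info.get("attributes", {}).values()),
        }
        for name, info in schema.items()
    ]


schema = {
    "class1": {
        "type": "polygons",
        "attributes": {"attribute1": {"name": "attribute1", "input_type": "select"}},
    },
    "class2": {"attributes": {}},  # no type given
}

for label in build_cvat_labels(schema):
    print(label["name"], label["type"])
# class1 polygon
# class2 any
```

Iterating with "for name, info in schema.items()" is deliberate: dict.items() yields (key, value) pairs, so the type has to be read from the value rather than unpacked as a third loop variable.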

brimoor commented 1 year ago

I'm not too familiar with the internals of the CVAT integration, but you could very well be right that CVAT project creation can be improved. @ehofesmann could you help advise here?

@SDMMG in the meantime, we have CVAT tests at the link below; if you make a PR with the suggested change and the tests still pass, then we're probably looking good: https://github.com/voxel51/fiftyone/blob/develop/tests/intensive/cvat_tests.py

SDMMG commented 1 year ago

Well, this change isn't that simple; other modifications are needed as well. But if I'm not busy, I'll give it a try and open a PR with the changes in the future. Thanks for your time, @brimoor 😊