v7labs / darwin-py

Library and commandline tool for managing datasets on darwin.v7labs.com
MIT License

Type error in convert_polygons_to_sequences() (Conversion to COCO) #462

Closed SchernHe closed 1 year ago

SchernHe commented 1 year ago

Hey,

I am using darwin-py==0.7.25 and it seems like there is a bug in convert_polygons_to_sequences() when calling:

  import os
  from glob import glob

  from darwin.dataset.release import Release
  from darwin.exporter import export_annotations, get_exporter

  # Create release
  dataset.export(name=release_name, annotation_class_ids=class_ids)

  # Download release
  release: Release = dataset.get_release(release_name)
  dataset.pull(release=release, multi_threaded=False)

  # Convert to COCO
  dataset_path = os.path.join(DARWIN_PATH, PROJECT, v7_dataset_name, "releases", release_name)
  # Collect every annotation JSON file below the release directory
  annotation_paths = glob(f"{dataset_path}/**/*.json", recursive=True)
  parser = get_exporter("coco")
  export_annotations(parser, annotation_paths, dataset_dir)

Traceback:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 171, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 189, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/airflow/auvisus-ml-airflow/auvisus_airflow/handler/darwin/download.py", line 63, in download_from_darwin
    export_annotations(parser, annotation_paths, dataset_dir)
  File "/home/airflow/.local/lib/python3.8/site-packages/darwin/exporter/exporter.py", line 53, in export_annotations
    exporter(darwin_to_dt_gen(file_paths), Path(output_directory))
  File "/home/airflow/.local/lib/python3.8/site-packages/darwin/exporter/formats/coco.py", line 35, in export
    output = _build_json(list(annotation_files))
  File "/home/airflow/.local/lib/python3.8/site-packages/darwin/exporter/formats/coco.py", line 324, in _build_json
    "annotations": list(_build_annotations(annotation_files, categories)),
  File "/home/airflow/.local/lib/python3.8/site-packages/darwin/exporter/formats/coco.py", line 402, in _build_annotations
    annotation_data = _build_annotation(annotation_file, annotation_id, annotation, categories)
  File "/home/airflow/.local/lib/python3.8/site-packages/darwin/exporter/formats/coco.py", line 436, in _build_annotation
    sequences = convert_polygons_to_sequences(annotation.data["paths"])
  File "/home/airflow/.local/lib/python3.8/site-packages/darwin/utils.py", line 721, in convert_polygons_to_sequences
    x = max(min(point["x"], width - 1) if width else point["x"], 0)
TypeError: list indices must be integers or slices, not str

I found that one element of the polygon iterated over in the loop is itself a list rather than a point dict, and I guess this case is not handled properly. Checking the corresponding image, I could see that a single object there has two disjoint paths.
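To illustrate what I mean, here is a minimal sketch of a conversion that accepts both shapes: a single path (a list of {"x": ..., "y": ...} points) and several disjoint paths (a list of such lists). The function name polygons_to_sequences is hypothetical, and this is not the implementation in darwin/utils.py. It only shows the kind of check that would avoid the TypeError above, and it leaves out the clamping to width and height that the real function does.

```python
from typing import Dict, List, Union

Point = Dict[str, float]
Path = List[Point]


def polygons_to_sequences(
    polygons: Union[Path, List[Path]],
) -> List[List[float]]:
    """Flatten one path or several paths into [x1, y1, x2, y2, ...] lists.

    Hypothetical helper for illustration only, not part of darwin-py.
    """
    if not polygons:
        raise ValueError("No polygons provided")

    # A single path is a list of point dicts; wrap it so that both input
    # shapes go through the same loop below.
    paths = [polygons] if isinstance(polygons[0], dict) else polygons

    sequences = []
    for path in paths:
        sequence = []
        for point in path:
            sequence.extend([point["x"], point["y"]])
        sequences.append(sequence)
    return sequences
```

With this, an annotation with two disjoint paths yields two sequences instead of failing on point["x"].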

Edit: As a quick fix, I deleted the corresponding images in the UI. However, when a release is created automatically through the SDK, those images and their annotations are still present in the release.

Edit 2: I worked around that by manually removing the files, and the conversion export_annotations(parser, annotation_paths, dataset_dir) now runs through. However, the resulting COCO JSON is corrupted: all images have the same image ID (0).
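For anyone who wants to check their own export for the same problem, a small sketch like the following should do. It assumes the standard COCO layout with a top-level "images" list whose entries carry an "id" field. The helper name duplicate_image_ids is my own and not part of darwin-py.

```python
from collections import Counter
from typing import Any, Dict, List


def duplicate_image_ids(coco: Dict[str, Any]) -> List[int]:
    """Return the image IDs that occur more than once in a COCO dict.

    Hypothetical helper for illustration only, not part of darwin-py.
    """
    counts = Counter(image["id"] for image in coco.get("images", []))
    return sorted(image_id for image_id, n in counts.items() if n > 1)
```

Load the exported file with json.load() and pass the resulting dict in. An empty list means every image ID is unique.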

owencjones commented 1 year ago

Closing, as this relates to an old version now. @SchernHe, please let me know if you're having issues with current versions.