openvinotoolkit / datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
https://openvinotoolkit.github.io/datumaro/
MIT License

Empty lines in VOC subset lists are not ignored #583

Closed: EdjeElectronics closed this issue 2 years ago.

EdjeElectronics commented 2 years ago

I labeled a small object detection dataset using LabelImg, a popular open-source labeling tool. LabelImg creates an XML file in Pascal VOC annotation format for each image in the dataset. I arranged these files in the Pascal VOC folder structure described in the Datumaro documentation and imported the dataset into Datumaro. The import succeeded, but running the datum export command to export the dataset in COCO format failed with an error.
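The subset list for this layout can be generated rather than written by hand. The following is a minimal sketch, assuming LabelImg's XML files are in `Annotations/`, each item ID is the XML file name without its extension, and all items go into a single `default` subset. `write_subset_list` is a hypothetical helper, not part of Datumaro. It writes exactly one ID per line with no blank lines, which avoids the empty-line problem this issue is about.

```python
from pathlib import Path


def write_subset_list(voc_root, subset="default"):
    """Write ImageSets/Main/<subset>.txt listing every annotation file's stem.

    Hypothetical helper: assumes LabelImg's per-image XML files are in
    <voc_root>/Annotations and that each item ID is the XML file name
    without its extension.
    """
    root = Path(voc_root)
    ids = sorted(p.stem for p in (root / "Annotations").glob("*.xml"))
    list_path = root / "ImageSets" / "Main" / f"{subset}.txt"
    list_path.parent.mkdir(parents=True, exist_ok=True)
    # One ID per line, each terminated by a single newline: no blank lines.
    list_path.write_text("".join(f"{item_id}\n" for item_id in ids), encoding="utf-8")
    return ids
```

For example, `write_subset_list("pet-toys-voc-dataset")` would create `pet-toys-voc-dataset/ImageSets/Main/default.txt`.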

Here is a link to the Pascal VOC dataset I'm working with: https://s3.us-west-1.amazonaws.com/evanjuras.com/code/pet-toys-voc-dataset.zip

Steps to reproduce issue:

cd ~
wget https://s3.us-west-1.amazonaws.com/evanjuras.com/code/pet-toys-voc-dataset.zip
unzip pet-toys-voc-dataset.zip -d pet-toys-voc-dataset
datum create
datum import --format voc_detection ~/pet-toys-voc-dataset
datum export -f coco_instances -o ~/pet-toys-coco-dataset -- --save-images

Error log:

datum export -f coco_instances -o custom-pet-toys1-coco -- --save-images
2021-12-15 12:11:12,016 INFO: Loading the project...
2021-12-15 12:11:12,018 INFO: Exporting...
2021-12-15 12:11:12,027 ERROR: 
Traceback (most recent call last):
  File "/home/stanley1/datumaro/bin/datum", line 8, in <module>
    sys.exit(main())
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/cli/__main__.py", line 162, in main
    retcode = args.command(args)
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/util/scope.py", line 133, in wrapped_func
    ret_val = func(*args, **kwargs)
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/cli/contexts/project/__init__.py", line 203, in export_command
    dataset.export(save_dir=dst_dir, format=converter, **extra_args)
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/util/scope.py", line 133, in wrapped_func
    ret_val = func(*args, **kwargs)
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/components/dataset.py", line 847, in export
    converter.convert(self, save_dir=save_dir, **kwargs)
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/components/converter.py", line 37, in convert
    return converter.apply()
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/plugins/coco_format/converter.py", line 661, in apply
    for item in subset:
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/components/dataset.py", line 221, in __iter__
    yield from self.parent._data.get_subset(self.name)
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/components/dataset.py", line 221, in __iter__
    yield from self.parent._data.get_subset(self.name)
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/plugins/voc_format/extractor.py", line 145, in __iter__
    yield DatasetItem(id=item_id, subset=self._subset,
  File "<attrs generated init datumaro.components.extractor.DatasetItem>", line 16, in __init__
  File "/home/stanley1/datumaro/lib/python3.8/site-packages/datumaro/util/attrs_util.py", line 11, in not_empty
    assert len(x) != 0, x
AssertionError
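The last frames show where it fails: the VOC extractor builds a DatasetItem from an item ID read from the subset list, and the not_empty validator in attrs_util.py rejects a zero-length value. A plausible reading (an assumption, not verified against Datumaro's source) is that an empty line in the subset list was read as an item whose ID is an empty string. The sketch below reproduces that failure mode with a deliberately naive reader; `naive_read_ids` and `Item` are hypothetical stand-ins, not Datumaro code.

```python
from dataclasses import dataclass


def naive_read_ids(text):
    """Hypothetical reader: one item ID per line, blank lines NOT skipped."""
    return [line.strip() for line in text.splitlines()]


@dataclass
class Item:
    """Hypothetical stand-in for an item whose ID must be non-empty."""
    id: str

    def __post_init__(self):
        # Mirrors the check seen in the traceback: assert len(x) != 0, x
        assert len(self.id) != 0, self.id


# A subset list ending with an extra empty line.
subset_list = "image_001\nimage_002\n\n"
ids = naive_read_ids(subset_list)
print(ids)  # ['image_001', 'image_002', '']

try:
    items = [Item(id=item_id) for item_id in ids]
except AssertionError:
    print("AssertionError on an empty item ID")
```

This is consistent with the bare "AssertionError" line in the log: the assertion's message is the offending value itself, and an empty string prints as nothing.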

Can you please help me figure out why I'm getting this error?

zhiltsov-max commented 2 years ago

Hi, thank you for reporting the problem. As a workaround, try removing the empty line at the end of ImageSets\Main\default.txt.
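If several subset lists are affected, the workaround can be scripted. The following is a minimal sketch, assuming the standard VOC layout with lists under `ImageSets/Main/`. `strip_blank_lines` is a hypothetical helper. It rewrites each `.txt` file in place with empty or whitespace-only lines removed, so keep a backup of the dataset before running it.

```python
from pathlib import Path


def strip_blank_lines(voc_root):
    """Remove empty or whitespace-only lines from ImageSets/Main/*.txt.

    Hypothetical helper. Returns the paths of the files that were changed.
    """
    changed = []
    for list_path in sorted(Path(voc_root, "ImageSets", "Main").glob("*.txt")):
        original = list_path.read_text(encoding="utf-8")
        # Keep only lines with visible content; end every kept line with "\n".
        kept = [line for line in original.splitlines() if line.strip()]
        cleaned = "".join(f"{line}\n" for line in kept)
        if cleaned != original:
            list_path.write_text(cleaned, encoding="utf-8")
            changed.append(list_path)
    return changed
```

For example, `strip_blank_lines("pet-toys-voc-dataset")` cleans the lists in that dataset, after which the datum export command can be retried.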

EdjeElectronics commented 2 years ago

@zhiltsov-max, that worked! Thank you very much. I was able to export the dataset to COCO format successfully.

zhiltsov-max commented 2 years ago

I'll keep this issue open until it's fixed.
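On the library side, the fix implied by the issue title is for the subset-list reader to skip empty lines instead of turning them into items. Below is a minimal sketch of such a reader, assuming one item ID per line. `read_subset_list` is hypothetical and is not Datumaro's actual implementation.

```python
def read_subset_list(text):
    """Return item IDs from a subset list, one ID per line.

    Hypothetical sketch: surrounding whitespace is trimmed, and lines that
    are empty or whitespace-only are skipped rather than yielding empty IDs.
    """
    ids = []
    for line in text.splitlines():
        item_id = line.strip()
        if not item_id:
            continue  # ignore blank lines, e.g. a trailing empty line
        ids.append(item_id)
    return ids


print(read_subset_list("image_001\nimage_002\n\n"))  # ['image_001', 'image_002']
```

Skipping blank lines at read time keeps the reader tolerant of files edited by hand or produced by other tools, while any non-empty line is still treated as an item ID.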