Closed QSmally closed 1 year ago
@QSmally Thank you for reporting. We'll look into it.
@vinnamkim Could you take a look? It seems that there is some issue while parsing the COCO dataset. Datumaro is regarding it as CelebA format, I suppose.
cc: @wonjuleee
Hi @QSmally,
openvinotoolkit/training_extensions uses Datumaro to import datasets. However, we currently do not support a COCO dataset with a directory structure like yours (yanked from Roboflow). We expect the COCO format to be structured as follows:
└─ Dataset/
    ├── dataset_meta.json # a list of custom labels (optional)
    ├── images/
    │   ├── train/
    │   │   ├── <image_name1.ext>
    │   │   ├── <image_name2.ext>
    │   │   └── ...
    │   └── val/
    │       ├── <image_name1.ext>
    │       ├── <image_name2.ext>
    │       └── ...
    └── annotations/
        ├── <task>_<subset_name>.json
        └── ...
You can also see an example of this structure here: https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/coco_dataset/coco_instances.
However, we noticed that 1) there is a demand for the directory structure in COCO format exported by Roboflow and 2) it currently provides ambiguous stack traces to the user. We will address these improvements in the next version.
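As a rough illustration of the expected layout described above, the following sketch checks a dataset root before handing it to a training command. It is not part of OTX or Datumaro; the function name `find_layout_problems` is hypothetical, and it assumes that subset names contain no underscore so the subset can be read from the end of the annotation file name.

```python
from __future__ import annotations

from pathlib import Path


def find_layout_problems(root: Path) -> list[str]:
    """Return human-readable layout problems for a COCO dataset root.

    Hypothetical helper: an empty list means the root has images/ and
    annotations/ and every annotation file has a matching images/<subset>/.
    """
    problems: list[str] = []
    images_dir = root / "images"
    annotations_dir = root / "annotations"

    if not images_dir.is_dir():
        problems.append(f"missing directory: {images_dir}")
    if not annotations_dir.is_dir():
        problems.append(f"missing directory: {annotations_dir}")
        return problems

    # Annotation files are expected to be named <task>_<subset_name>.json.
    annotation_files = sorted(annotations_dir.glob("*_*.json"))
    if not annotation_files:
        problems.append(f"no <task>_<subset_name>.json files in {annotations_dir}")

    for annotation_file in annotation_files:
        # Assumption: the subset name is the part after the last underscore.
        subset = annotation_file.stem.rsplit("_", 1)[1]
        if not (images_dir / subset).is_dir():
            problems.append(f"{annotation_file.name} has no matching {images_dir / subset}")

    return problems
```

Calling `find_layout_problems(Path("/mnt/shared"))` on a Roboflow-style export (with `train/`, `valid/`, `test/` directly under the root) would report the missing `images/` and `annotations/` directories instead of failing later with an unrelated stack trace.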
Ty @vinnamkim,
I did as you said and updated the paths from --train-data-roots /mnt/shared/train to /mnt/shared (and likewise for the other options), and the command functions correctly on the COCO dataset. Please note this exception when you input /mnt/shared/images/train to these options (could be caught into a user-friendlier message in the future):
Traceback (most recent call last):
  File "/usr/local/bin/otx", line 8, in <module>
    sys.exit(main())
  File "/training_extensions/otx/cli/tools/cli.py", line 77, in main
    results = globals()[f"otx_{name}"]()
  File "/training_extensions/otx/cli/tools/train.py", line 165, in main
    return train(exit_stack)
  File "/training_extensions/otx/cli/tools/train.py", line 186, in train
    dataset, label_schema = dataset_adapter.get_otx_dataset(), dataset_adapter.get_label_schema()
  File "/training_extensions/otx/core/data/adapter/detection_dataset_adapter.py", line 29, in get_otx_dataset
    label_information = self._prepare_label_information(self.dataset)
  File "/training_extensions/otx/core/data/adapter/base_dataset_adapter.py", line 191, in _prepare_label_information
    category_items = label_categories_list.items
AttributeError: 'NoneType' object has no attribute 'items'
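The trace ends in an attribute access on `None`, so one way to make the message friendlier is an explicit check before that access. The sketch below is only an illustration of that idea, not the actual OTX code: `require_label_categories` and its `data_root` parameter are hypothetical names, and the wording of the message is an assumption about what would help a user in this situation.

```python
from types import SimpleNamespace


def require_label_categories(label_categories, data_root: str):
    """Return the category items, or raise a descriptive error if none were loaded.

    Hypothetical guard: `label_categories` stands in for the object that was
    None in the traceback above.
    """
    if label_categories is None:
        raise ValueError(
            f"No label categories were found under '{data_root}'. "
            "Check that the path points to the dataset root (the directory "
            "containing images/ and annotations/), not to a subset folder "
            "such as images/train."
        )
    return label_categories.items


# Usage with a stand-in object that mimics an `.items` attribute.
categories = SimpleNamespace(items=["cat", "dog"])
print(require_label_categories(categories, "/mnt/shared"))
```

With a guard of this kind, passing a subset folder would surface a message about the path rather than an `AttributeError` from deep inside the adapter.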
Thanks for reporting,
Could you check with otx train [template] --train-data-roots /mnt/shared --val-data-roots /mnt/shared? OTX currently looks for the train, val, and test folders internally, so users don't need to input the subset folder itself.
I made a notebook example which can convert the dataset exported from Roboflow to the Datumaro importable COCO format: https://github.com/vinnamkim/training_extensions/blob/support-roboflow-coco-format/notebooks/roboflow_coco.ipynb. I ran this notebook in the latest develop (ff5be9fc2c95050a267ca690492d93da5b1dc88b).
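For readers who cannot open the notebook, the general shape of such a conversion might look like the sketch below. This is not the notebook's code. It assumes the Roboflow export keeps each subset in its own folder (`train/`, `valid/`, `test/`) with the images next to an `_annotations.coco.json` file, and it assumes `instances` is the right `<task>` prefix for detection. The function name `convert_roboflow_coco` and the `valid` to `val` renaming are illustrative choices, and whether the `file_name` entries inside the JSON need adjusting has not been verified here.

```python
import shutil
from pathlib import Path

# Assumed mapping from Roboflow subset folders to target subset names.
SUBSET_NAMES = {"train": "train", "valid": "val", "test": "test"}
ROBOFLOW_ANNOTATION_FILE = "_annotations.coco.json"


def convert_roboflow_coco(src: Path, dst: Path) -> None:
    """Copy a Roboflow-style COCO export into an images/ + annotations/ layout."""
    annotations_dir = dst / "annotations"
    annotations_dir.mkdir(parents=True, exist_ok=True)

    for src_subset, dst_subset in SUBSET_NAMES.items():
        subset_dir = src / src_subset
        annotation_file = subset_dir / ROBOFLOW_ANNOTATION_FILE
        if not annotation_file.is_file():
            continue  # Subset not present in this export.

        image_dir = dst / "images" / dst_subset
        image_dir.mkdir(parents=True, exist_ok=True)

        # Every file except the annotation JSON is treated as an image.
        for path in subset_dir.iterdir():
            if path.is_file() and path.name != ROBOFLOW_ANNOTATION_FILE:
                shutil.copy2(path, image_dir / path.name)

        # Rename to <task>_<subset_name>.json.
        shutil.copy2(annotation_file, annotations_dir / f"instances_{dst_subset}.json")
```

The files are copied rather than moved so the original export stays intact, which matters when the source is mounted read-only as in the `docker run` command from the report.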
I confirmed earlier that the suggested command (with /mnt/shared as the data root) functioned correctly; I only mentioned that the error could be improved to indicate the actual issue. Ty, however, @sungmanc
I've got openvinotoolkit/training_extensions successfully installed in an Ubuntu:20.04 container. I've tried it with Python virtual environments, but I couldn't get it working with the setups that I got going. Docker worked first-try after installing all the dependencies.
Describe the bug
Running otx train in a workspace, with a linked coco dataset (yanked from Roboflow), throws an error. It changes between a couple of exceptions each time I run the command. It always starts with the following trace:
However, it continues with "During handling of the above exception, another exception occurred:" out of these couple of errors (from running it four times, stack traces omitted):
  File "<attrs generated init datumaro.components.annotation.Label>", line 8, in __init__
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
  File "/usr/local/lib/python3.9/dist-packages/datumaro/plugins/data_formats/align_celeba.py", line 69, in _load_items
    raise DatasetImportError("File '%s': was not found" % labels_path)
datumaro.components.errors.DatasetImportError: File '/mnt/stage1/train/Anno/identity_CelebA.txt': was not found
Steps to Reproduce
I was following this starting guide to train a testing model.
$ docker build -t trainer . (directory with Dockerfile from below)
$ docker run -it --rm -v "$(pwd)/pathtodataset:/mnt/stage1:ro" trainer
trainer$ otx build Custom_Object_Detection_Gen3_ATSS --backbone mmdet.MobileNetV2 --task DETECTION --train-data-roots /mnt/stage1/train/ --val-data-roots /mnt/stage1/valid/ --test-data-roots /mnt/stage1/test/
trainer$ cd otx-workspace-DETECTION/
trainer$ otx train
I'm new to OpenVINO: maybe I missed indicating which dataset format it should parse /mnt/stage1 as?
Environment:
Dockerfile dependency snippet:
Dataset directory: