openvinotoolkit / training_extensions

Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
https://openvinotoolkit.github.io/training_extensions/
Apache License 2.0

'otx train' in workspace throws (dataset?) error #2042

Closed QSmally closed 1 year ago

QSmally commented 1 year ago

I've got openvinotoolkit/training_extensions successfully installed in an Ubuntu:20.04 container. I've tried it with Python virtual environments, but I couldn't get it working with the setups that I got going. Docker worked first-try after installing all the dependencies.

Describe the bug

Running otx train in a workspace with a linked COCO dataset (yanked from Roboflow) throws an error. The exception varies from run to run, but it always starts with the following trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/datumaro/components/dataset.py", line 1392, in import_from
    env.make_extractor(src_conf.format, src_conf.url, **extractor_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/datumaro/components/environment.py", line 227, in make_extractor
    return self.extractors.get(name)(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'ctx'

After that, the output continues with "During handling of the above exception, another exception occurred:", followed by one of a couple of different errors (observed over four runs, stack traces omitted):

Steps to Reproduce

I was following this starting guide to train a testing model.

  1. $ docker build -t trainer . (directory with Dockerfile from below)
  2. $ docker run -it --rm -v "$(pwd)/pathtodataset:/mnt/stage1:ro" trainer
  3. trainer$ otx build Custom_Object_Detection_Gen3_ATSS --backbone mmdet.MobileNetV2 --task DETECTION --train-data-roots /mnt/stage1/train/ --val-data-roots /mnt/stage1/valid/ --test-data-roots /mnt/stage1/test/
  4. trainer$ cd otx-workspace-DETECTION/
  5. trainer$ otx train

I'm new to OpenVINO: maybe I missed indicating which dataset format it should parse /mnt/stage1 as?

Environment:

Dockerfile dependency snippet:

RUN apt update && \
    apt install curl python3.9 python3.9-dev python3.9-distutils git g++ ffmpeg libsm6 libxext6 libgl1-mesa-glx -y && \
    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python3.9 get-pip.py && \
    rm get-pip.py
RUN git clone https://github.com/openvinotoolkit/training_extensions.git && \
    cd training_extensions && \
    pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu && \
    pip install -e .[full]

Dataset directory:

/mnt/stage1/
├── test/
│   ├── _annotations.coco.json
│   └── ... images
├── train/
│   ├── _annotations.coco.json
│   └── ... images
└── valid/
    ├── _annotations.coco.json
    └── ... images

goodsong81 commented 1 year ago

@QSmally Thank you for reporting. We'll look into it.

@vinnamkim Could you take a look? It seems that there is some issue while parsing the COCO dataset. Datumaro is regarding it as CelebA format, I suppose.

cc: @wonjuleee

vinnamkim commented 1 year ago

Hi @QSmally,

openvinotoolkit/training_extensions uses Datumaro to import datasets. However, we currently do not support a COCO dataset with a directory structure like yours (as exported by Roboflow). We expect a COCO dataset to be structured as follows:

└─ Dataset/
    ├── dataset_meta.json # a list of custom labels (optional)
    ├── images/
    │   ├── train/
    │   │   ├── <image_name1.ext>
    │   │   ├── <image_name2.ext>
    │   │   └── ...
    │   └── val/
    │       ├── <image_name1.ext>
    │       ├── <image_name2.ext>
    │       └── ...
    └── annotations/
        ├── <task>_<subset_name>.json
        └── ...

https://openvinotoolkit.github.io/datumaro/stable/docs/explanation/formats/coco.html#import-coco-dataset

You can also see an example of this structure here: https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/coco_dataset/coco_instances.

However, we noticed that 1) there is a demand for the directory structure in COCO format exported by Roboflow and 2) it currently provides ambiguous stack traces to the user. We will address these improvements in the next version.
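
Until that support lands, the two layouts in this thread can be bridged with a small script. The following is a minimal sketch, not the notebook or any OTX/Datumaro API: the function name `convert_roboflow_coco`, the `instances` task prefix (chosen to match the `<task>_<subset_name>.json` pattern and the `coco_instances` example above), and the list of image suffixes are all assumptions. It copies rather than moves files, since the source may be mounted read-only.

```python
import shutil
from pathlib import Path

# Roboflow calls the validation split "valid"; the expected layout uses "val".
SUBSET_NAMES = {"train": "train", "valid": "val", "test": "test"}
# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def convert_roboflow_coco(src_root, dst_root, task="instances"):
    """Copy a Roboflow COCO export into an images/ + annotations/ layout.

    Returns the names of the subsets that were converted.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    converted = []
    for src_name, subset in SUBSET_NAMES.items():
        src_dir = src_root / src_name
        ann_file = src_dir / "_annotations.coco.json"
        if not ann_file.is_file():
            continue  # this split is not part of the export

        image_dir = dst_root / "images" / subset
        ann_dir = dst_root / "annotations"
        image_dir.mkdir(parents=True, exist_ok=True)
        ann_dir.mkdir(parents=True, exist_ok=True)

        # Annotation file becomes <task>_<subset_name>.json,
        # e.g. instances_train.json.
        shutil.copy2(ann_file, ann_dir / f"{task}_{subset}.json")

        # Images sit next to the annotation file in the Roboflow export.
        for path in sorted(src_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                shutil.copy2(path, image_dir / path.name)
        converted.append(subset)
    return converted
```

For the dataset in this issue, a call such as `convert_roboflow_coco("/mnt/stage1", "/tmp/stage1_coco")` would produce a directory whose root (not its subfolders) is what gets passed to the data-root options; the destination path here is only an example.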

QSmally commented 1 year ago

Ty @vinnamkim,

I did as you said and updated the paths from --train-data-roots /mnt/shared/train to /mnt/shared (and likewise for the other options), and the command now works correctly on the COCO dataset. Note that passing /mnt/shared/images/train to these options raises the following exception (it could be turned into a more user-friendly message in the future):

Traceback (most recent call last):
  File "/usr/local/bin/otx", line 8, in <module>
    sys.exit(main())
  File "/training_extensions/otx/cli/tools/cli.py", line 77, in main
    results = globals()[f"otx_{name}"]()
  File "/training_extensions/otx/cli/tools/train.py", line 165, in main
    return train(exit_stack)
  File "/training_extensions/otx/cli/tools/train.py", line 186, in train
    dataset, label_schema = dataset_adapter.get_otx_dataset(), dataset_adapter.get_label_schema()
  File "/training_extensions/otx/core/data/adapter/detection_dataset_adapter.py", line 29, in get_otx_dataset
    label_information = self._prepare_label_information(self.dataset)
  File "/training_extensions/otx/core/data/adapter/base_dataset_adapter.py", line 191, in _prepare_label_information
    category_items = label_categories_list.items
AttributeError: 'NoneType' object has no attribute 'items'

sungmanc commented 1 year ago

Thanks for reporting. Could you check with otx train [template] --train-data-roots /mnt/shared --val-data-roots /mnt/shared? OTX currently locates the train, val, and test folders inside the given root on its own, so users don't need to pass the subset folder itself.
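
As an illustration of how the "pass the root, not the subset folder" rule could be surfaced to users before the AttributeError appears, here is a minimal sketch of a pre-flight check. It is not part of OTX or Datumaro: `check_coco_root` is a hypothetical helper, and it assumes a valid root is any directory that contains both `annotations/` and `images/`, as in the layout described earlier in this thread.

```python
from pathlib import Path


def check_coco_root(path):
    """Return the resolved path if it looks like a COCO dataset root.

    Raises ValueError with a hint when `path` is a folder inside a
    dataset (for example images/train) or is not a dataset at all.
    """
    start = Path(path).resolve()
    # Look at the given directory first, then walk up through its parents.
    for candidate in (start, *start.parents):
        has_layout = (candidate / "annotations").is_dir() and (
            candidate / "images"
        ).is_dir()
        if not has_layout:
            continue
        if candidate == start:
            return start
        # A parent has the expected layout, so a subfolder was passed.
        raise ValueError(
            f"{start} is a folder inside a COCO dataset; "
            f"pass the dataset root instead: {candidate}"
        )
    raise ValueError(
        f"{start} does not contain annotations/ and images/ directories, "
        "so it cannot be read as a COCO dataset root"
    )
```

Under these assumptions, `check_coco_root("/mnt/shared/images/train")` would raise a ValueError that names `/mnt/shared` as the path to use, while `check_coco_root("/mnt/shared")` would simply return the path.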

vinnamkim commented 1 year ago

I made a notebook example which can convert the dataset exported from Roboflow to the Datumaro importable COCO format: https://github.com/vinnamkim/training_extensions/blob/support-roboflow-coco-format/notebooks/roboflow_coco.ipynb. I ran this notebook in the latest develop (ff5be9fc2c95050a267ca690492d93da5b1dc88b).

QSmally commented 1 year ago

> Thanks for reporting. Could you check with otx train [template] --train-data-roots /mnt/shared --val-data-roots /mnt/shared? OTX currently locates the train, val, and test folders inside the given root on its own, so users don't need to pass the subset folder itself.

I confirmed earlier that the above works correctly; I only mentioned that the error message could be improved to indicate the actual problem. Thanks anyway, @sungmanc.

vinnamkim commented 1 year ago