openvinotoolkit / datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
https://openvinotoolkit.github.io/datumaro/
MIT License

Export in YOLO with custom subset name #565

Closed DP1701 closed 2 years ago

DP1701 commented 2 years ago

Hello Everyone, I noticed something. When I run the following command within a project:

datum transform -t split -- -t detection --subset train:.8 --subset val:.2 

then two image folders, train and val, are created. The dataset is in coco_instances format.

If I now export the whole project in YOLO format, the val data is not exported, because neither the image folder nor the annotation file has valid in its name.

The console displays the following information:

WARNING: Skipping subset export 'val'. If specified, the only valid names are 'train', 'valid'

For now I have worked around it by renaming the folder and the annotation file from val to valid. After that it works.

Is this the intended behavior?

zhiltsov-max commented 2 years ago

Hi. To my understanding, the only subset names supported in this format are these two. From our side, there are no problems with writing any reasonable names, but I'm not sure they can be used by the framework.

DP1701 commented 2 years ago

@zhiltsov-max
Seems to be the case. I was just surprised that the validation data was not exported after the Transform and Export command. I just wanted to note here that this case can occur.

Maybe the warning could be extended with a hint that the val image folder and the val annotation file should be renamed to valid:

images/val to images/valid
annotations/instances_val.json to annotations/instances_valid.json
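
For anyone hitting the same warning, the manual rename can be scripted. This is only a minimal sketch using the Python standard library, not part of Datumaro: the helper name `rename_subset` is hypothetical, and it assumes the dataset layout is exactly the one implied by the two paths above (an `images/<subset>` folder and an `annotations/instances_<subset>.json` file under one root).

```python
from pathlib import Path


def rename_subset(dataset_root, old="val", new="valid"):
    """Rename a subset's image folder and COCO annotation file on disk.

    Hypothetical helper; the layout below is an assumption taken from
    the paths quoted in this thread, not from Datumaro's documentation.
    """
    root = Path(dataset_root)
    pairs = [
        (root / "images" / old, root / "images" / new),
        (
            root / "annotations" / f"instances_{old}.json",
            root / "annotations" / f"instances_{new}.json",
        ),
    ]

    # Only touch entries that actually exist.
    present = [(src, dst) for src, dst in pairs if src.exists()]

    # Check every target first so a collision cannot leave the
    # dataset half-renamed.
    for _, dst in present:
        if dst.exists():
            raise FileExistsError(f"{dst} already exists")

    for src, dst in present:
        src.rename(dst)

    # Return the new paths so the caller can see what changed.
    return [dst for _, dst in present]
```

Calling `rename_subset("path/to/dataset")` before the YOLO export should give the same result as the manual rename described above; the path here is a placeholder.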

DP1701 commented 2 years ago

@zhiltsov-max Would it be possible in the future to extend the YOLO export so that the test subset is also exported? YOLOv5 (see the "Or manually prepare your dataset" section of its documentation) and YOLOR can use a test dataset.

zhiltsov-max commented 2 years ago

Yes, there are no difficulties with this from the Datumaro side. If you can, feel free to send a pull request.