ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Specifying Label Path in Customized Dataset #8246

Closed bryanbocao closed 2 years ago

bryanbocao commented 2 years ago

Question

Hello! I like the way this repo is organized. I am trying to run a kind of "grid search" to investigate the performance of YOLO. Specifically, I have a base COCO dataset, exactly the one downloaded by the script in coco.yaml, and I would like to vary it on two levels: (1) at the image level, I apply some image processing to produce different image sets, say images_v2, images_v3 and images_v4, with images as the base set; (2) at the label level, I have bounding-box variations such as changed label names, numbers of classes or category IDs, saved in separate label folders: labels_v2, labels_v3 and labels_v4.

Below is a brief structure of files in dataset/coco:

images
images_v2
images_v3
images_v4
labels
labels_v2
labels_v3
labels_v4
train2017.txt
val2017.txt
test-dev2017.txt

By "grid search" I mean one result for each pair of images* and labels*, giving 4 (images) x 4 (labels) = 16 experiments in total.

Q1: Is there any way to do that efficiently?

A straightforward way is to keep 16 copies of the COCO dataset, such as coco_1, coco_2, coco_3, each corresponding to one pair of images* and labels*. However, that requires 16 x 20.1 GB = 321.6 GB of space, which is too much for me.

When sweeping images*, it seems I can just change the image paths in train2017.txt and val2017.txt. However, the default label path is labels, and I don't see a way to specify it in https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml. Q2: Is there any way to do that?

Appreciate your help!

glenn-jocher commented 2 years ago

@bryanbo-cao 👋 Hello! Thanks for asking about YOLOv5 🚀 dataset formatting. You could use a single data.yaml and a bash script that renames your directories before each of the 16 training runs.
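As a non-destructive alternative to renaming directories, the same effect can be sketched with symbolic links: one small root per (images, labels) pair, each holding links named images and labels that point at the chosen variants, so no image data is copied. This is only a sketch under assumptions: the function name build_experiment_roots and the output layout are hypothetical, symlinks must be available on the filesystem, and it assumes the data loader does not resolve links to their real paths, which is worth verifying on a small subset first.

```python
import itertools
from pathlib import Path


def build_experiment_roots(dataset_root, out_root, image_dirs, label_dirs):
    """Create one lightweight root per (images, labels) pair using symlinks.

    Hypothetical helper, not part of YOLOv5. No image or label data is copied.
    """
    dataset_root = Path(dataset_root).resolve()
    out_root = Path(out_root).resolve()
    roots = []
    for img_dir, lbl_dir in itertools.product(image_dirs, label_dirs):
        exp_root = out_root / f"{img_dir}__{lbl_dir}"
        exp_root.mkdir(parents=True, exist_ok=True)
        # The links are always named "images" and "labels", so a loader that
        # swaps /images/ for /labels/ in each path still finds the labels.
        for link_name, target in (("images", img_dir), ("labels", lbl_dir)):
            link = exp_root / link_name
            if link.is_symlink():
                link.unlink()  # refresh a link left over from an earlier run
            link.symlink_to(dataset_root / target, target_is_directory=True)
        roots.append(exp_root)
    return roots
```

Calling it with the four images* and four labels* folder names would yield 16 roots; a single data.yaml could then be reused by changing only its path entry for each run. Any image-list files such as train2017.txt would need to reference paths inside the chosen root.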

For examples of using image directories instead of txt lists of images see other datasets like VOC.yaml: https://github.com/ultralytics/yolov5/blob/d6051382f1551455b88ca086b99275cfc8286131/data/VOC.yaml#L1-L21

To train correctly your data must be in YOLOv5 format. Please see our Train Custom Data tutorial for full documentation on dataset setup and all steps required to start training your first model. A few excerpts from the tutorial:

1.1 Create dataset.yaml

COCO128 is an example small tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths), 2) the number of classes nc and 3) a list of class names:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
nc: 80  # number of classes
names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush' ]  # class names

1.2 Create Labels

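After using a tool like Roboflow Annotate to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects in image, no *.txt file is required). The *.txt file specifications are: one row per object; each row is in class x_center y_center width height format; box coordinates must be in normalized xywh format (from 0 to 1), so if your boxes are in pixels, divide x_center and width by the image width, and y_center and height by the image height; class numbers are zero-indexed (start from 0).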

The label file for the tutorial's example image contains 2 persons (class 0) and a tie (class 27).
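To make the row format concrete, here is a minimal sketch that converts a pixel-space box into one YOLO label row (class x_center y_center width height, normalized to the 0 to 1 range). The function name to_yolo_row and the corner-style input are assumptions for illustration; they do not come from the tutorial.

```python
def to_yolo_row(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into a normalized YOLO label row.

    Hypothetical helper: inputs are pixel corners plus the image size.
    """
    x_center = (x_min + x_max) / 2 / img_w  # box center, as a fraction of width
    y_center = (y_min + y_max) / 2 / img_h  # box center, as a fraction of height
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 px box of class 0 in a 640x480 image:
print(to_yolo_row(0, 100, 200, 300, 400, 640, 480))
# prints "0 0.312500 0.625000 0.312500 0.416667"
```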

1.3 Organize Directories

Organize your train and val images and labels according to the example below. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
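The substitution rule can be sketched in a few lines. This is modelled on my understanding of the img2label_paths helper in YOLOv5's utils/dataloaders.py; the exact code in any given version may differ, so treat it as an illustration rather than the library's implementation.

```python
import os


def img2label_paths(img_paths):
    """Map image paths to label paths by swapping the last /images/ for /labels/."""
    sa = f"{os.sep}images{os.sep}"  # substring to find
    sb = f"{os.sep}labels{os.sep}"  # substring to put in its place
    # rsplit(..., 1) touches only the last occurrence; the extension becomes .txt
    return [sb.join(p.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt" for p in img_paths]
```

Under this sketch, a path containing a directory named images_v2 has no /images/ component, so nothing is substituted and the label would be looked up next to the image. That is why variant folder names such as images_v2 or labels_v2 are not picked up by default.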

Good luck 🍀 and let us know if you have any other questions!

bryanbocao commented 2 years ago

@glenn-jocher, thanks for pointing to it again. I have read this document and have successfully trained on different custom datasets many times, but I am afraid it doesn't answer my specific question. The document covers a single dataset, whereas I am asking about N variants of one dataset that share the same dataset root dir, without duplicating the data.

bryanbocao commented 2 years ago

In the above example,

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label

The folder name labels seems to be fixed by default. This document does not specify how to change it if I have labels_v2 or labels_v3 in the same folder:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
../datasets/coco128/labels_v2/im0.txt  # label_v2
../datasets/coco128/labels_v3/im0.txt  # label_v3

Thanks!

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

GusevMihail commented 2 years ago

Hello, bryanbocao! Did you find a way to solve your problem? I have the same problem now. I would be grateful if you could share your experience.

chobits commented 2 years ago

I have the same problem: I want to specify the path of the labels directory. However, judging from the source code, this feature is not currently supported, because the labels directory is derived automatically from the xxx/images/xxx image path, as the official documentation says.

See https://github.com/ultralytics/yolov5/blob/8a19437690548a158b78ab27b7f5b463a268fa19/utils/dataloaders.py#L481

and

https://github.com/ultralytics/yolov5/blob/8a19437690548a158b78ab27b7f5b463a268fa19/utils/dataloaders.py#L426
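For anyone patching a local copy, one possible shape for such a change is to make the two directory names parameters of the path-mapping helper. The sketch below is hypothetical: the function name img2label_paths_custom and its images_dir and labels_dir parameters are not part of YOLOv5, and wiring them through to the dataset config is left out.

```python
import os


def img2label_paths_custom(img_paths, images_dir="images", labels_dir="labels"):
    """Map image paths to label paths with configurable directory names.

    Hypothetical local modification, not an official YOLOv5 API.
    """
    sa = f"{os.sep}{images_dir}{os.sep}"  # e.g. /images_v2/
    sb = f"{os.sep}{labels_dir}{os.sep}"  # e.g. /labels_v3/
    # Replace only the last occurrence, then swap the extension for .txt
    return [sb.join(p.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt" for p in img_paths]
```

With the defaults the behavior matches the documented /images/ to /labels/ rule; passing images_dir="images_v2" and labels_dir="labels_v3" would pair those two variants without moving any files.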

glenn-jocher commented 1 year ago

@chobits hello! Thank you for bringing that to our attention. Deriving the labels directory automatically from the images directory is indeed the current behavior. While specifying a separate path for labels isn't currently supported, your feedback has been noted and will be taken into account for future improvements.

Feel free to keep an eye on the release notes and documentation updates for any future changes. We appreciate your understanding and patience!

Akshaykushawaha commented 6 months ago

Any updates on this? It seems like a small fix that should have been made by now, since this is one of the most basic input features.

chobits commented 6 months ago

I fixed it at the time by modifying my local branch. It has been a long time, so I no longer remember the details. I haven't checked whether there has been an official update, but thank you for your work all the same.

glenn-jocher commented 6 months ago

Hello! Thanks for checking back on this. As of now, there hasn't been an official update to support specifying separate paths for the labels directory directly through the configuration. We understand the importance of this feature and appreciate your input, which helps in enhancing the functionality of YOLOv5.

If there are updates regarding this feature, they'll be included in the release notes and documentation. Thanks once again for your patience and for being a part of the YOLOv5 community! 🌟

chobits commented 6 months ago

Ok, I understood.

Cool! Looking forward to seeing the new features.

glenn-jocher commented 6 months ago

Hello! We're glad to hear your enthusiasm and appreciate your support! We'll definitely keep the community updated on any new features and enhancements. If you have any more questions or need further assistance in the meantime, don't hesitate to ask. Happy coding! 😊🚀