Closed Borda closed 4 years ago
Hello @Borda, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@Borda this already exists. If you look at coco128.yaml for example you can see the options for defining train and val datasets. You can specify a *.txt file of image paths, a directory, or a list of the first two options: https://github.com/ultralytics/yolov5/blob/93684531c6e71547667ee19df6ddb94af3c8c80d/data/coco128.yaml#L10-L13
@Borda if you mix datasets though, you should make sure that they have the same classes. To see an example of this, you can simply change L11 to this to use two train sets:
train:
  - ../coco128/images/train2017/  # 128 images
  - ../coco128/images/train2017/  # 128 images
val: ../coco128/images/train2017/  # 128 images
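To make the three accepted forms concrete (a directory, a *.txt file of image paths, or a list mixing the two), here is a minimal sketch of how such an entry could be expanded into one flat list of images. This is not the YOLOv5 loader itself: `collect_images` and `IMG_EXTS` are hypothetical names introduced for illustration, and details such as resolving relative paths inside a *.txt file are left out.

```python
from pathlib import Path
from typing import List, Union

# Assumed subset of image extensions, for illustration only.
IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_images(entry: Union[str, List[str]]) -> List[str]:
    """Expand a train/val entry (str or list of str) into a sorted list of image paths."""
    # A single string is treated as a one-element list.
    entries = entry if isinstance(entry, (list, tuple)) else [entry]
    files: List[str] = []
    for e in entries:
        p = Path(e)
        if p.is_dir():
            # Directory: pick up every image file below it.
            files += [str(f) for f in p.rglob("*") if f.suffix.lower() in IMG_EXTS]
        elif p.is_file():
            # *.txt file: one image path per line, blank lines ignored.
            files += [ln.strip() for ln in p.read_text().splitlines() if ln.strip()]
        else:
            raise FileNotFoundError(f"{p} does not exist")
    return sorted(files)
```

With this shape, every item of a list contributes images, so two directories of 128 images each would yield 256 entries.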
cool, so maybe update it on the wiki page...
Yeah that's not a bad idea.
Hey, following the solution above, when I try to pass a list, it only picks the last folder for training. Any idea for a fix?
@Dhirajdgandhi that's incorrect. All items in the list are utilized.
I see. You are right. I rechecked and it is working. However, the logs mention only one of the directories ("Scanning ......"), and that made me think it was picking only one of them.
@Dhirajdgandhi yes, a *.cache file is created for all labels and by default it is named after the directory (i.e. dir.cache), or in the case of multiple directories it is assigned the name of the last entry (i.e. dir3.cache).
Thank you for your reply.
Hi Glenn, when I try to configure Comet with multiple paths in train/val, I get the following error:
  File "/home/ubuntu/multi/yolov5_snpe_conversion/utils/loggers/comet/__init__.py", line 342, in upload_dataset_artifact
    metadata[key] = split_path.replace(path, "")
                    ^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'replace'
path: /home/ubuntu/data/V7_tagged/yolo_db,
split_path: ['/home/ubuntu/data/V7_tagged/yolo_db/images_with_bb/7471_low_lux_frames_with_leds_keypoints_train', '/home/ubuntu/data/V7_tagged/yolo_db/images_no_keyboard/20240221No_KeyboardFrames_eaTrain_LowLight500/images']
This solved the problem, in utils/loggers/comet/__init__.py:
def upload_dataset_artifact(self):
    dataset_name = self.data_dict.get("dataset_name", "yolov5-dataset")
    path = str((ROOT / Path(self.data_dict["path"])).resolve())
    metadata = self.data_dict.copy()
    for key in ["train", "val", "test"]:
        split_path = metadata.get(key)
        if split_path is not None:
            if isinstance(split_path, list):
                # Multiple dataset paths: strip the dataset root from each entry
                metadata[key] = [p.replace(path, "") for p in split_path]
            else:
                # Single dataset path (str)
                metadata[key] = split_path.replace(path, "")
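The same idea can be tried outside the logger. The following is a small standalone sketch of the str-or-list handling in the fix above; `strip_root` and the sample paths are hypothetical and are not part of the Comet integration.

```python
from typing import List, Optional, Union

PathEntry = Union[str, List[str]]


def strip_root(split_path: Optional[PathEntry], root: str) -> Optional[PathEntry]:
    """Remove the dataset root from a split entry that may be a str, a list, or None."""
    if split_path is None:
        return None
    if isinstance(split_path, (list, tuple)):
        # List of paths: apply the same replacement to each entry.
        return [str(p).replace(root, "") for p in split_path]
    # Single path.
    return str(split_path).replace(root, "")


root = "/data/yolo_db"  # made-up dataset root
print(strip_root("/data/yolo_db/images/train", root))
# -> /images/train
print(strip_root(["/data/yolo_db/a/train", "/data/yolo_db/b/train"], root))
# -> ['/a/train', '/b/train']
```

Keeping the branch on `isinstance` in one helper means the single-path case behaves exactly as before, while list entries no longer raise the `AttributeError` shown in the traceback.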
Hello everyone, I have a question about this.
train: /u01/dunghd/train.txt
val: /u01/dunghd/test.txt
# Classes
names:
  0: UTDD
My YAML file looks like this. The *.txt files are there to select specific images from a large dataset, but the labels are not in the same directory. Is it necessary to manually copy the data?
@hdd0510 hello! It's not necessary to manually copy the data as long as your .txt files (for train and val) list the image paths, one per line. YOLOv5 does not read label paths from these files; it derives each label path from the image path by replacing the last /images/ directory with /labels/ and the image extension with .txt. If your labels follow that layout, YOLOv5 will be able to use them directly without the need to duplicate data. Keep up the good work! 🚀
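For reference, here is a minimal sketch of the image-to-label path convention that YOLOv5 uses, to the best of my understanding: the last /images/ component is swapped for /labels/ and the extension becomes .txt. The sample paths are made up, and the sketch hard-codes "/" as the separator for simplicity.

```python
from typing import List


def img2label_paths(img_paths: List[str]) -> List[str]:
    """Map image paths to label paths: last '/images/' -> '/labels/', extension -> '.txt'."""
    sa, sb = "/images/", "/labels/"
    return [sb.join(p.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt" for p in img_paths]


print(img2label_paths(["/data/images/train/a.jpg"]))
# -> ['/data/labels/train/a.txt']
print(img2label_paths(["/data/frames/a.jpg"]))
# -> ['/data/frames/a.txt']
```

Under this convention, a path with no /images/ component maps to a label file sitting next to the image, so only the label files need to be placed accordingly; the images themselves can stay where they are.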
@glenn-jocher , Thank you for this help, I am done with this training!
@hdd0510 you're welcome! I'm glad to hear your training went well. If you have any more questions or need further assistance, feel free to ask. Happy detecting! 🚀
🚀 Feature
According to the wiki, we can currently train a model from only one source/folder. I would be interested in having an option to define multiple folders...
Motivation
My case is that I have several datasets for the same problem (v1, v2, v3, etc.) and I am interested in which datasets, or which combination of them, give what results, e.g. training a model on v1 only or on v2+v3. I do not want to create a copy of all these combinations, as that is quite a waste of space...
Pitch
The solution should be quite simple: just allow multiple folders in the training/dataset config and, if needed, update the dataloader...