ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

train model on multiple folders/sources #662

Closed: Borda closed this issue 4 years ago

Borda commented 4 years ago

🚀 Feature

According to the wiki, a model can currently be trained from only one source/folder. I would be interested in an option to define multiple folders...

Motivation

My case is that I have several datasets for the same problem (v1, v2, v3, etc.), and I want to find out which datasets or combinations give which results, e.g. training a model on v1 only or on v2+v3. However, I do not want to create copies of all these combinations, as that would waste a lot of space...

Pitch

The solution should be quite simple: just allow multiple folders in the training/dataset config and, if needed, update the dataloader...

github-actions[bot] commented 4 years ago

Hello @Borda, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

glenn-jocher commented 4 years ago

@Borda this already exists. If you look at coco128.yaml, for example, you can see the options for defining train and val datasets. You can specify a *.txt file of image paths, a directory, or a list combining the first two options: https://github.com/ultralytics/yolov5/blob/93684531c6e71547667ee19df6ddb94af3c8c80d/data/coco128.yaml#L10-L13
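
To illustrate the three forms, here is a minimal sketch (not YOLOv5's actual dataloader code) of how a `train:`/`val:` entry could be expanded into a flat image list. The names `expand_sources` and `IMG_SUFFIXES` are hypothetical, and the *.txt handling (one image path per line, relative paths resolved against the list file's folder) is an assumption about the format.

```python
from pathlib import Path

# Hypothetical subset of image extensions; adjust to your data.
IMG_SUFFIXES = {".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp"}


def expand_sources(spec):
    """Expand a train/val entry (a str or a list of str) into a sorted list of image paths.

    Each entry may be a directory (searched recursively for image files) or a
    *.txt file listing one image path per line.
    """
    sources = spec if isinstance(spec, list) else [spec]
    files = []
    for src in sources:
        p = Path(src)
        if p.is_dir():
            # Directory: collect every image file below it.
            files += [f for f in p.rglob("*") if f.is_file() and f.suffix.lower() in IMG_SUFFIXES]
        elif p.is_file():
            # *.txt list: skip blank lines, resolve relative entries against the list's folder.
            for line in p.read_text().splitlines():
                line = line.strip()
                if line:
                    f = Path(line)
                    files.append(f if f.is_absolute() else p.parent / f)
        else:
            raise FileNotFoundError(f"{p} does not exist")
    # Duplicates are kept, so listing the same folder twice doubles its share of the data.
    return sorted(str(f) for f in files)
```

Keeping duplicates means the two-identical-folders example below yields 256 entries in this sketch; whether YOLOv5 itself deduplicates is not something this sketch asserts.
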

glenn-jocher commented 4 years ago

@Borda if you mix datasets, though, you should make sure that they have the same classes. For example, to use two train sets you can simply change L11 to:

    train:
      - ../coco128/images/train2017/  # 128 images
      - ../coco128/images/train2017/  # 128 images
    val: ../coco128/images/train2017/  # 128 images

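
For the v1/v2/v3 use case in the original request, one way to avoid copying data is to generate one small data YAML per combination, each simply listing the original folders. The sketch below assumes hypothetical folder paths in `DATASETS` and a single shared `names` mapping, since all combined datasets must use the same classes. `combo_configs` and `to_yaml` are illustrative helpers, and `to_yaml` only handles the plain strings, lists, and one-level dicts used here.

```python
from itertools import combinations

# Hypothetical dataset versions; replace with your own folders.
DATASETS = {
    "v1": "../datasets/v1/images/train",
    "v2": "../datasets/v2/images/train",
    "v3": "../datasets/v3/images/train",
}


def combo_configs(datasets, val, names):
    """Yield (tag, config) for every non-empty combination of dataset versions."""
    keys = sorted(datasets)
    for r in range(1, len(keys) + 1):
        for combo in combinations(keys, r):
            yield "+".join(combo), {
                "train": [datasets[k] for k in combo],  # list form: all folders are used
                "val": val,
                "names": names,  # shared: combined datasets must have the same classes
            }


def to_yaml(cfg):
    """Render a flat config (str values, lists, one-level dicts) as YAML text."""
    lines = []
    for key, value in cfg.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines += [f"  - {v}" for v in value]
        elif isinstance(value, dict):
            lines.append(f"{key}:")
            lines += [f"  {k}: {v}" for k, v in value.items()]
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```

Each rendered config could then be saved as, say, `data_v2+v3.yaml` and passed to training with `python train.py --data data_v2+v3.yaml`. Writing the YAML by hand keeps the sketch dependency-free; in a YOLOv5 environment, PyYAML's `yaml.safe_dump` would work just as well.
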
Borda commented 4 years ago

Cool, so maybe document this on the wiki page...

glenn-jocher commented 4 years ago

Yeah that's not a bad idea.

Dhirajdgandhi commented 3 years ago

Hey, following the solution above, when I specify a list, only the last folder is picked for training. Any idea how to fix this?

glenn-jocher commented 3 years ago

@Dhirajdgandhi that's incorrect. All items in the list are utilized.

Dhirajdgandhi commented 3 years ago

I see. You are right, I rechecked and it is working. However, the "Scanning ......" line in the logs only mentions one of the directories, which made me think only one of them was being used.

glenn-jocher commented 3 years ago

@Dhirajdgandhi yes, a *.cache file is created for all labels. By default it is named after the directory (e.g. dir.cache), or, in the case of multiple directories, after the last entry (e.g. dir3.cache).
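
The naming rule described in this comment can be sketched as follows. This only illustrates the rule as stated here, not the dataloader's own code, which may differ between YOLOv5 versions; `cache_path_for` is a hypothetical name.

```python
from pathlib import Path


def cache_path_for(sources):
    """Return the label-cache path implied by the rule above.

    A single source 'dir' maps to 'dir.cache'; for a list, the last entry names the cache.
    """
    entries = sources if isinstance(sources, list) else [sources]
    # Path() drops a trailing slash, and with_suffix swaps any extension (e.g. .txt) for .cache.
    return Path(entries[-1]).with_suffix(".cache")
```

Since a single cache file covers every listed source, this is presumably also why the "Scanning" log line names only one path even though all entries are loaded.
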

Dhirajdgandhi commented 3 years ago

Thank you for your reply.

aleshem commented 8 months ago

Hi Glenn, when I try to configure Comet with multiple paths in train/val, I get the following error:

  File "/home/ubuntu/multi/yolov5_snpe_conversion/utils/loggers/comet/__init__.py", line 342, in upload_dataset_artifact
    metadata[key] = split_path.replace(path, "")
                    ^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'replace'
path: /home/ubuntu/data/V7_tagged/yolo_db, 
split_path: ['/home/ubuntu/data/V7_tagged/yolo_db/images_with_bb/7471_low_lux_frames_with_leds_keypoints_train', '/home/ubuntu/data/V7_tagged/yolo_db/images_no_keyboard/20240221No_KeyboardFrames_eaTrain_LowLight500/images']

This change to utils/loggers/comet/__init__.py solved the problem:

    def upload_dataset_artifact(self):
        dataset_name = self.data_dict.get("dataset_name", "yolov5-dataset")
        path = str((ROOT / Path(self.data_dict["path"])).resolve())

        metadata = self.data_dict.copy()
        for key in ["train", "val", "test"]:
            split_path = metadata.get(key)
            if split_path is not None:
                if isinstance(split_path, list):
                    # Multiple folders/files: strip the dataset root from each entry
                    metadata[key] = [p.replace(path, "") for p in split_path]
                else:
                    # Single path (str): strip the dataset root directly
                    metadata[key] = split_path.replace(path, "")
        # ... rest of the original method continues unchanged

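
The same idea as a standalone, testable sketch: `strip_root` and `relative_splits` are hypothetical helpers, not part of YOLOv5's Comet logger, and the example values in the test follow the shape of the `path`/`split_path` values printed above.

```python
def strip_root(split_path, root):
    """Remove the dataset root from a split entry, which may be a str or a list of str."""
    if isinstance(split_path, list):
        # Apply str.replace per element; calling it on the list itself raises AttributeError.
        return [p.replace(root, "") for p in split_path]
    return split_path.replace(root, "")


def relative_splits(data_dict, root):
    """Return a copy of data_dict with train/val/test made relative to root."""
    metadata = dict(data_dict)
    for key in ("train", "val", "test"):
        if metadata.get(key) is not None:
            metadata[key] = strip_root(metadata[key], root)
    return metadata
```

Like the fix above, `str.replace` removes every occurrence of the root string, not only a leading one; on Python 3.9+ `str.removeprefix` would be a stricter choice.
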
hdd0510 commented 7 months ago

Hello everyone, I have a question about this.

    train: /u01/dunghd/train.txt
    val: /u01/dunghd/test.txt
    # Classes
    names:
      0: UTDD

My YAML file looks like this. The *.txt files select specific images from a large dataset, but the labels are not in the same directory. Is it necessary to copy the data manually?

glenn-jocher commented 7 months ago

@hdd0510 hello! It's not necessary to manually copy the data if your .txt files correctly point to the image paths and corresponding label paths. Ensure your dataset .txt files (for train and val) list the images' paths, with each image path followed by its corresponding label path. If your labels are in a different directory but are correctly referenced by the paths in your .txt files, YOLOv5 will be able to use them directly without the need to duplicate data. Keep up the good work! 🚀
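
For reference, in YOLOv5 the train/val *.txt lists hold image paths only, one per line; as far as I know, the dataloader (a helper named `img2label_paths`) then derives each label path from its image path by swapping the last `/images/` directory for `/labels/` and the file extension for `.txt`. The sketch below reproduces that mapping so you can check where each label is expected; `img2label_path` is a hypothetical single-path variant, and the `sep` parameter is added only to make it testable across platforms.

```python
import os


def img2label_path(img_path, sep=os.sep):
    """Map an image path to its expected label path via the /images/ -> /labels/ convention."""
    sa, sb = f"{sep}images{sep}", f"{sep}labels{sep}"
    # Swap only the last '/images/' component, then replace the extension with .txt.
    return sb.join(img_path.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt"
```

If an image path contains no `/images/` component, the label is expected next to the image, so labels stored elsewhere would need to be placed (or linked) where this mapping points.
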

hdd0510 commented 7 months ago

@glenn-jocher, thank you for the help, I'm done with this training!

glenn-jocher commented 7 months ago

@hdd0510 you're welcome! I'm glad to hear your training went well. If you have any more questions or need further assistance, feel free to ask. Happy detecting! 🚀