Closed: suzuki-sken closed this issue 2 years ago
👋 Hello @suzuki-sken, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@suzuki-sken 👋 Hello! Thanks for asking about YOLOv5 🚀 dataset formatting. To train correctly, your data must be in YOLOv5 format. If you specify the same path for train and val, then the same data will be used for both, which is not recommended because you will be validating on your training data.
Please see our Train Custom Data tutorial for full documentation on dataset setup and all steps required to start training your first model. A few excerpts from the tutorial:
COCO128 is a small example tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths), 2) the number of classes nc, and 3) a list of class names:
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128 # dataset root dir
train: images/train2017 # train images (relative to 'path') 128 images
val: images/train2017 # val images (relative to 'path') 128 images
test: # test images (optional)
# Classes
nc: 80 # number of classes
names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush' ] # class names
After using a tool like Roboflow Annotate to label your images, export your labels to YOLO format, with one *.txt file per image (if there are no objects in an image, no *.txt file is required). The *.txt file specifications are:
- One row per object
- Each row is in class x_center y_center width height format
- Box coordinates must be normalized to 0-1: divide x_center and width by the image width, and y_center and height by the image height
- Class numbers are zero-indexed (start from 0)
The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):
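The normalization rule above can be sketched in a few lines of Python. The helper name, box values, and image size below are my own illustrative choices, not part of YOLOv5:

```python
def to_yolo_line(cls, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w   # divide x_center by image width
    y_center = (y_min + y_max) / 2 / img_h   # divide y_center by image height
    width = (x_max - x_min) / img_w          # divide width by image width
    height = (y_max - y_min) / img_h         # divide height by image height
    return f"{cls} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# e.g. a person (class 0) in a 640x480 image
print(to_yolo_line(0, 100, 120, 300, 440, 640, 480))
# 0 0.312500 0.583333 0.312500 0.666667
```

Each such line becomes one row in the image's *.txt label file.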
Organize your train and val images and labels according to the example below. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:
../datasets/coco128/images/im0.jpg # image
../datasets/coco128/labels/im0.txt # label
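That path substitution can be mimicked like this; the function name is my own, and this is a sketch of the convention rather than YOLOv5's actual code:

```python
def img2label_path(img_path):
    """Replace the last '/images/' in an image path with '/labels/' and swap
    the extension for .txt, mirroring how YOLOv5 locates label files."""
    head, sep, tail = img_path.rpartition("/images/")
    stem = tail.rsplit(".", 1)[0]  # drop the image extension
    return head + "/labels/" + stem + ".txt"

print(img2label_path("../datasets/coco128/images/im0.jpg"))
# ../datasets/coco128/labels/im0.txt
```

Using rpartition ensures only the last /images/ segment is replaced, so an image stored at a path that happens to contain /images/ twice still maps correctly.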
Good luck 🍀 and let us know if you have any other questions!
glenn-jocher, thank you for your prompt reply.
I understood that if I specify the same path for train and val, the same data will be used for each.
I noticed there is a method called autosplit in utils/dataloaders.py — does it implement a mechanism to automatically split the data?
Thanks for your help and further development of YOLOv5!
@suzuki-sken yes you can run the autosplit function manually on a dataset and then update your yaml accordingly.
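As I understand it, autosplit scans an images directory and writes per-split *.txt files of image paths next to it. A rough stdlib-only sketch of that idea follows — the file names, default weights, and behavior here reflect my reading of the function, so treat the details as assumptions and check utils/dataloaders.py for the real implementation:

```python
import random
from pathlib import Path

def autosplit_sketch(img_dir, weights=(0.9, 0.1, 0.0), seed=0):
    """Randomly assign each image to train/val/test and write one *.txt of
    image paths per split next to the images directory (a sketch, not
    YOLOv5's actual autosplit)."""
    img_dir = Path(img_dir)
    files = sorted(p for p in img_dir.rglob("*")
                   if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    txts = ["autosplit_train.txt", "autosplit_val.txt", "autosplit_test.txt"]
    for txt in txts:  # start from a clean slate
        (img_dir.parent / txt).unlink(missing_ok=True)
    random.seed(seed)
    indices = random.choices(range(3), weights=weights, k=len(files))
    for i, f in zip(indices, files):
        with open(img_dir.parent / txts[i], "a") as fh:
            # paths are written relative to the dataset root, as the yaml expects
            fh.write(f"./{f.relative_to(img_dir.parent).as_posix()}\n")
```

After running it you would point train/val in your yaml at the generated autosplit_train.txt and autosplit_val.txt.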
Hey @suzuki-sken, keep in mind that in the yaml file the train and val entries can also point to *.txt files that list image paths. For example, my_dataset.yaml:
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/training_data # dataset root dir
train: ./training.txt # train images (relative to 'path')
val: ./val.txt # val images (relative to 'path')
test: # test images (optional)
[...]
./training.txt
./images/1524.png
./images/1670.png
./images/724.png
./images/1692.png
./images/1629.png
./images/887.png
[...]
./val.txt
./images/572.png
./images/234.png
./images/1387.png
./images/299.png
./images/77.png
./images/1735.png
[...]
You can load all the file paths, apply train_test_split() to them, and then write the results to training.txt and val.txt. Or do a k-fold cross-validation on them and, again, write the results each time.
This eliminates the need to move the files (or labels) when splitting the dataset.
Keep in mind that the paths in the .txt files are relative to the dataset root dir path from the yaml.
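The split-and-write step can be sketched without moving any files. This is a stdlib-only version of the idea (sklearn's train_test_split would do the shuffling for you); the function name and fraction are my own:

```python
import random

def write_split(image_names, train_txt, val_txt, val_fraction=0.2, seed=42):
    """Shuffle image names and write './images/<name>' lines into the
    train/val txt files referenced by the dataset yaml. No images are
    moved; only the two path lists are rewritten."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_fraction)
    splits = {val_txt: names[:n_val], train_txt: names[n_val:]}
    for path, subset in splits.items():
        with open(path, "w") as fh:
            fh.writelines(f"./images/{n}\n" for n in subset)
```

Re-running with a different seed (or with fold indices from KFold) produces a new split while the image files stay where they are.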
Thanks @Irikos for your reply.
I understand exactly how the autosplit function works and how to define the yaml file.
I would like to generate a yaml file using the autosplit function and verify the generalization performance with k-Fold Cross Validation.
Thanks again @Irikos and @glenn-jocher!
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
Hi, I want to know how I can do k-fold cross-validation with the data.yaml file.
hey @gjr2019, I can show you how I did it. There might be better ways, though.
Basically, I loaded the images folder and the annotations folder, stored their paths in an array, did k-fold on that array, and each time wrote the dataset .txt and annotation .txt files and ran YOLO in that configuration. You could also write directly to dataset.yaml. I also kept an index to know which iteration of the k-fold it is, and parsed the txt output to get the performance of each iteration.
Here's part of the actual code I used, but keep in mind that I had only one class, you will need to adapt it to multiple classes if that is the case.
import time
import numpy as np
from sklearn.model_selection import KFold

# read_images / read_annotations_files are my own helpers that list the
# file names in DATASET_FOLDER and ANNOTATIONS_FOLDER (defined elsewhere)
X = np.array(read_images(DATASET_FOLDER))
y = read_annotations_files(ANNOTATIONS_FOLDER)

kf = KFold(n_splits=10)

DATASET_PATH = "../datasets/my_dataset/training_data"
TRAIN_FILE_PATH = "../datasets/my_dataset/training_data/training.txt"
VALIDATION_FILE_PATH = "../datasets/my_dataset/training_data/validation.txt"
DATASET_YAML_PATH = "../datasets/my_dataset/training_data/my_data.yaml"
TEST_FILE_PATH = ""
classes = ["my_class"]

i = 0
for train_index, validation_index in kf.split(X):
    print("Starting fold no: " + str(i))
    start_time = time.time()
    X_train, X_validation = X[train_index], X[validation_index]

    # write the image paths for this fold (relative to the dataset root)
    with open(TRAIN_FILE_PATH, "w+") as file:
        for item in X_train:
            file.write("./images/" + item + "\n")
    with open(VALIDATION_FILE_PATH, "w+") as file:
        for item in X_validation:
            file.write("./images/" + item + "\n")

    write_yolo_dataset_yaml(classes, DATASET_YAML_PATH, DATASET_PATH)

    current_iteration = "y5s_pre_K_" + str(i)
    # notebook cell magic: train YOLOv5 on this fold's split
    !python ./train.py --img 640 --cfg ./models/yolov5s.yaml --hyp ./data/hyps/hyp.scratch-high.yaml --batch 128 --epochs 2000 --data ../datasets/my_dataset/training_data/my_data.yaml --weights "yolov5s.pt" --workers 24 --name $current_iteration

    e = int(time.time() - start_time)
    print('{:02d}:{:02d}:{:02d}'.format(e // 3600, (e % 3600 // 60), e % 60))
    print("Ended fold no: " + str(i) + "\n")
    i += 1
### WRITE THE YOLO CUSTOM DATASET.YAML. Wrote with k-fold validation in mind, but useful in general. ###
# INPUT:
#   classes: array of class names to be written in the dataset yaml
#   yaml_path: dataset.yaml path
#   dataset_path: path to the dataset folder
#   train_file: path to file with training image paths (relative to dataset folder)
#   validation_file: path to file with validation image paths (relative to dataset folder)
#   test_file: path to file with test image paths (relative to dataset folder). Can be empty.
def write_yolo_dataset_yaml(classes, yaml_path, dataset_path, train_file="./training.txt", validation_file="./validation.txt", test_file=""):
    with open(yaml_path, "w+") as file:
        file.write("path: " + dataset_path + "\n")
        file.write("train: " + train_file + "\n")  # relative to dataset_path
        file.write("val: " + validation_file + "\n")
        file.write("test: " + test_file + "\n")  # not used yet
        file.write("nc: " + str(len(classes)) + "\n")
        file.write("names: " + str(classes) + "\n")
Question
I am very grateful for your help with YOLOv5. I have a question about the training process. In a previous document, I read that the train and val variables could be set to the same directory. If the same directory is specified for both train and val, will the training and validation data automatically be split properly by the data loader?