Closed Mingfeng-Wang closed 3 years ago
Hello @ItsHelloWod, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@ItsHelloWod best practice here is to always transfer your data to a local medium before anything else, rather than relying on mounted drives or buckets. Otherwise every file access becomes a network request, which is extremely slow when you have many small files, and many small files is the typical ML use case.
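The advice above can be sketched in Python. This is a minimal sketch, not part of YOLOv5: `stage_dataset` is a hypothetical helper, and it assumes the dataset is stored on the mounted drive as a single archive (for example a `.zip`), so that only one large file crosses the network and the many small image files are created on local disk.

```python
import shutil
from pathlib import Path


def stage_dataset(archive: Path, local_root: Path) -> Path:
    """Copy one archive from slow storage to local disk and unpack it there.

    `archive` and `local_root` are hypothetical paths chosen by the caller.
    Returns the local directory that holds the unpacked dataset.
    """
    local_root.mkdir(parents=True, exist_ok=True)

    # One large sequential transfer instead of one request per image.
    local_archive = local_root / archive.name
    shutil.copy2(archive, local_archive)

    # Unpacking creates the many small files, but on the local filesystem.
    # Note: Path.stem strips only the last suffix ("data.zip" -> "data").
    target = local_root / archive.stem
    shutil.unpack_archive(str(local_archive), str(target))

    # The local copy of the archive is no longer needed.
    local_archive.unlink()
    return target


# Example call (both paths are assumptions, adjust to your setup):
# stage_dataset(Path("/content/drive/MyDrive/dataset.zip"), Path("/content/datasets"))
```

After staging, the paths in the dataset YAML should point at the returned local directory rather than at the mounted drive.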
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
How can I disable image scanning before training?
@IssaIssa1 you don't. Scanning is performed on every new dataset and on every dataset modification.
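To illustrate why a dataset modification leads to a new scan, here is a minimal sketch of how a scan result could be tied to a fingerprint of the file list. It is an illustration only, not YOLOv5's actual implementation: `dataset_fingerprint` and `needs_rescan` are hypothetical names, and the choice of hashing paths plus file sizes is an assumption.

```python
import hashlib
import os
from typing import Iterable


def dataset_fingerprint(paths: Iterable[str]) -> str:
    """Hash the sorted file paths together with each file's size in bytes."""
    h = hashlib.sha256()
    for p in sorted(paths):
        # A missing file gets size -1 so that deletions change the hash too.
        size = os.path.getsize(p) if os.path.exists(p) else -1
        h.update(f"{p}:{size}\n".encode())
    return h.hexdigest()


def needs_rescan(paths: Iterable[str], cached_fingerprint: str) -> bool:
    """True when the file list or any file size differs from the cached state."""
    return dataset_fingerprint(paths) != cached_fingerprint
```

With a scheme like this, an unchanged dataset can reuse its earlier scan, while adding, removing, or resizing any file produces a different fingerprint and forces a rescan.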
@Mingfeng-Wang Maybe it's a formatting problem with the names of the images in the dataset. I tried random naming, and scanning images was really slow. However, when I renamed the images to the following format, scanning got faster:
000001.jpg
000002.jpg
...
009999.jpg
@glenn-jocher The difference in speed between the earlier and later runs is not caused by this.
@Olalaye @IssaIssa1 scanning the 120k COCO images takes about 60 seconds typically, or 2000 images per second.
If your images are larger, your machine is slower than normal, or your hard drive suffers from slow read speeds, you may see slower scanning speeds.
I see. I guess it is taking a long time (1.32s/it) because I am working on Colab, and maybe because I am using a shortcut for the data files.
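To put the two figures from this exchange side by side, here is a short worked calculation. It uses only the numbers quoted above (120k images in about 60 seconds, and 1.32 s per image); the function names are illustrative.

```python
def images_per_second(n_images: int, seconds: float) -> float:
    """Average scan throughput in images per second."""
    return n_images / seconds


def projected_scan_hours(n_images: int, seconds_per_image: float) -> float:
    """Hours needed to scan n_images at a fixed cost per image."""
    return n_images * seconds_per_image / 3600


coco_rate = images_per_second(120_000, 60)  # reference figure for COCO
slow_rate = 1 / 1.32                        # 1.32 s/it expressed as images per second

print(f"reference: {coco_rate:.0f} img/s")
print(f"observed:  {slow_rate:.2f} img/s")
print(f"slowdown:  {coco_rate / slow_rate:.0f}x")
print(f"120k images at 1.32 s/it: {projected_scan_hours(120_000, 1.32):.0f} h")
```

At 1.32 s per image the scan runs at under one image per second, roughly 2,640 times slower than the reference rate, and a COCO-sized dataset would need about 44 hours. A gap of that size points to file access latency rather than to image size or CPU speed.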
@IssaIssa1 the official Google Colab notebook displays scanning speeds of 2000 images per second for COCO128. I'd strongly advise you to use the official notebook as a starting point for your own training runs.
The official notebook is linked in many places in the repo: https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb
@ItsHelloWod best practice here is to always transfer your data to a local medium before anything else, rather than relying on mounted drives or buckets. Otherwise every file access becomes a network request, which is extremely slow when you have many small files, and many small files is the typical ML use case.
Thanks, that helped! Instead of mounting GDrive, I downloaded my dataset from RoboFlow and forked the repo directly.
And indeed, scanning took less than a minute for 25k images.
I'm having the same issue. The custom dataset is on a local drive on Colab, but the scanning speed is only ~4-5 it/s. My initial guess was that the input images have to be resized and that this is what makes scanning take so long. However, when I checked the COCO dataset, it went through the same process (since its images are not all the same size) and was a lot faster.
train: weights=yolov5l.pt, cfg=, data=/content/yolov5/data1.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-193-g485da42 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=201
from n params module arguments
0 -1 1 7040 models.common.Conv [3, 64, 6, 2, 2]
1 -1 1 73984 models.common.Conv [64, 128, 3, 2]
2 -1 3 156928 models.common.C3 [128, 128, 3]
3 -1 1 295424 models.common.Conv [128, 256, 3, 2]
4 -1 6 1118208 models.common.C3 [256, 256, 6]
5 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
6 -1 9 6433792 models.common.C3 [512, 512, 9]
7 -1 1 4720640 models.common.Conv [512, 1024, 3, 2]
8 -1 3 9971712 models.common.C3 [1024, 1024, 3]
9 -1 1 2624512 models.common.SPPF [1024, 1024, 5]
10 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 3 2757632 models.common.C3 [1024, 512, 3, False]
14 -1 1 131584 models.common.Conv [512, 256, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 3 690688 models.common.C3 [512, 256, 3, False]
18 -1 1 590336 models.common.Conv [256, 256, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 3 2495488 models.common.C3 [512, 512, 3, False]
21 -1 1 2360320 models.common.Conv [512, 512, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 3 9971712 models.common.C3 [1024, 1024, 3, False]
24 [17, 20, 23] 1 1109310 models.yolo.Detect [201, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
Model summary: 368 layers, 47215294 parameters, 47215294 gradients, 111.7 GFLOPs
Transferred 607/613 items from yolov5l.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 101 weight(decay=0.0), 104 weight(decay=0.0005), 104 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning /content/dataset/labels/train... 26258 images, 0 backgrounds, 0 corrupt: 49% 26258/53739 [1:31:03<1:58:09, 3.88it/s]
@Khalilolgod thanks for bringing up this issue. The scanning speed during the training process can vary depending on several factors such as the size of the images, the speed of the storage medium, and the machine's performance.
In your case, it seems that the custom dataset on the local drive in Colab is experiencing a slower scanning speed of around 4-5 iterations per second. The initial assumption of image resizing affecting the scanning speed might not be the primary reason, as the COCO dataset also goes through a similar process but at a much faster rate.
Storage location matters most. When files sit on mounted drives or buckets, Colab needs a network request for each file access, which results in slow scanning, especially with a large number of small image files. If your dataset directory is a copy on the Colab VM's own disk rather than a link or shortcut to a mounted drive, this should not apply, and the size of the dataset and of the individual files becomes the more likely factor.
One best practice to improve scanning performance is to transfer your dataset to a local medium before training, rather than relying on mounted drives or buckets. This helps avoid the network requests for each file access, thereby significantly improving scanning speed, particularly when dealing with a large number of small files.
Another user in this thread reported that downloading the dataset from RoboFlow and forking the repo directly, instead of mounting Google Drive, led to a much-improved scanning speed: less than a minute for 25k images.
I hope this explanation clarifies the reason behind the scanning speed differences and provides you with some insights on how to optimize the process. If you have any further questions, please feel free to ask.
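One way to check whether storage is the bottleneck is to time raw file reads on the dataset directory, independent of YOLOv5. The sketch below is a hypothetical diagnostic (`measure_read_rate` is not a YOLOv5 function) and it only reads bytes, without decoding images, so it gives an upper bound on what scanning could achieve. Repeated runs may look faster because the operating system caches recently read files.

```python
import time
from pathlib import Path


def measure_read_rate(folder: str, limit: int = 200, pattern: str = "*.jpg") -> dict:
    """Read up to `limit` files under `folder` and report raw read throughput."""
    files = sorted(Path(folder).rglob(pattern))[:limit]
    if not files:
        raise ValueError(f"no files matching {pattern!r} under {folder}")

    start = time.perf_counter()
    total_bytes = 0
    for f in files:
        total_bytes += len(f.read_bytes())  # plain read, no image decoding
    elapsed = time.perf_counter() - start

    return {
        "files": len(files),
        "bytes": total_bytes,
        "files_per_second": len(files) / elapsed if elapsed > 0 else float("inf"),
    }


# Example call (the path is an assumption based on the log above):
# print(measure_read_rate("/content/dataset/images/train"))
```

If the measured rate is close to the scanning rate in the training log, the storage is the limiting factor; if raw reads are much faster, the time is being spent elsewhere, for example in decoding very large images.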
I have uploaded the dataset to Google Drive, but when I train the YOLO model on Colab, the scanning speed is too slow and the progress always stays at 2%. Before this run, I used a smaller dataset to train the same model, and the scanning was very fast. I'm not sure if this is related to the fact that I am now using a larger dataset, but even so, why does the scan progress always stay at 2%? Thanks in advance.
@Ares-cz the slow scanning speed and the progress stalling at 2% could be due to a few reasons:
To address this issue, try the following steps:
Remember, training on Colab with large datasets directly from Google Drive is not recommended due to the potential for slow I/O speeds. Always aim to use the local filesystem of the environment where you're training your models.
❔Question
Scanning images is very slow.
Additional context
I'm using Google Colab and reading the images from Google Drive.
Transferred 362/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Scanning images: 62% 801/1293 [18:31<11:15, 1.37s/it]
It takes 1.37 s to scan a single image. How is that possible?