ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Scanning images is very slow. #1145

Closed Mingfeng-Wang closed 3 years ago

Mingfeng-Wang commented 3 years ago

❔Question

Scanning images is very slow.

Additional context

I'm using Google Colab, reading images from Google Drive.

Transferred 362/370 items from yolov5s.pt Optimizer groups: 62 .bias, 70 conv.weight, 59 other Scanning images: 62% 801/1293 [18:31<11:15, 1.37s/it]

It takes 1.37 s to scan one image; how is that possible?

github-actions[bot] commented 3 years ago

Hello @ItsHelloWod, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

For more information please visit https://www.ultralytics.com.

glenn-jocher commented 3 years ago

@ItsHelloWod best practice here is to always transfer your data to a local medium before anything else, rather than relying on mounted drives or buckets. Otherwise you have a network request each time you want to access a file, which is extremely slow when you have many small files, which is the ML use case.
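
For reference, a minimal sketch of doing this on Colab, assuming the dataset lives as a single zip archive on the mounted Drive (all paths below are placeholders):

import shutil
import zipfile
from pathlib import Path

# Hypothetical paths; adjust to your own Drive layout.
drive_zip = Path("/content/drive/MyDrive/datasets/my_dataset.zip")  # one archive instead of many small files
local_dir = Path("/content/datasets")                               # Colab VM local disk
local_dir.mkdir(parents=True, exist_ok=True)

# Copy the single archive over the network once, then extract it locally.
local_zip = local_dir / drive_zip.name
shutil.copy2(drive_zip, local_zip)
with zipfile.ZipFile(local_zip) as zf:
    zf.extractall(local_dir)

# Then point the dataset paths in data.yaml at /content/datasets/... so training reads from local disk.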

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

IssaIssa1 commented 3 years ago

How can I disable image scanning before training?

glenn-jocher commented 3 years ago

@IssaIssa1 you don't. Scanning is performed on every new dataset and on every dataset modification.
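
For what it's worth, the scan results are cached (as far as I understand, in a *.cache file written next to the labels, e.g. train.cache), so a full rescan should only happen on the first run or after the dataset changes. Roughly, the decision to rescan hinges on a hash over the image/label paths and sizes; a simplified sketch of that idea (not the actual YOLOv5 code, paths are placeholders):

import hashlib
from pathlib import Path

def dataset_hash(paths):
    # Hash file paths and sizes; if this value changes between runs, a rescan is needed.
    h = hashlib.md5()
    for p in sorted(paths):
        h.update(str(p).encode())
        h.update(str(p.stat().st_size).encode())
    return h.hexdigest()

label_dir = Path("/content/datasets/my_dataset/labels/train")  # hypothetical path
print(dataset_hash(list(label_dir.glob("*.txt"))))  # unchanged hash -> the cached scan can be reused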

Olalaye commented 3 years ago

@Mingfeng-Wang Maybe it's a problem with the format of the image names in the dataset. I tried random naming and scanning images was really slow. However, when I renamed the images to a sequential format, it got faster.

000001.jpg
000002.jpg
...
...
009999.jpg

@glenn-jocher The difference in speed between the beginning and the end of the scan is not caused by this, though.
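
If anyone wants to try the sequential naming above, here is a quick sketch that renames image/label pairs together (paths are placeholders; back up the dataset first, and it assumes the new names don't collide with existing ones):

from pathlib import Path

img_dir = Path("/content/datasets/my_dataset/images/train")  # hypothetical paths
lbl_dir = Path("/content/datasets/my_dataset/labels/train")

for i, img in enumerate(sorted(img_dir.glob("*.jpg")), start=1):
    new_stem = f"{i:06d}"                     # 000001, 000002, ...
    lbl = lbl_dir / f"{img.stem}.txt"
    img.rename(img.with_name(f"{new_stem}{img.suffix}"))
    if lbl.exists():                          # keep the YOLO label file in sync with the image
        lbl.rename(lbl.with_name(f"{new_stem}.txt"))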

glenn-jocher commented 3 years ago

@Olalaye @IssaIssa1 scanning the 120k COCO images takes about 60 seconds typically, or 2000 images per second.

If your images are larger, your machine is slower than normal, or your hard drive suffers from slow read speeds, you may see slower scanning.

IssaIssa1 commented 3 years ago

@Olalaye @IssaIssa1 scanning the 120k COCO images takes about 60 seconds typically, or 2000 images per second.

If your images are larger, your machine is slower than normal, or your hard drive suffers from slow read speeds, you may see slower scanning.

I see. I guess that because I am working on Colab, and maybe because I am using a shortcut to the data files, it is taking a long time (1.32 s/it).

glenn-jocher commented 3 years ago

@IssaIssa1 the Google Colab notebook displays scanning speeds of 2000 images per second for COCO128. I'd advise you to use the official notebook as a starting point for your own trainings.

[Screenshot: Colab notebook scanning output, 2021-01-07]

glenn-jocher commented 3 years ago

The official notebook is linked in many places in the repo: https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb

shukkkur commented 1 year ago

@ItsHelloWod best practice here is to always transfer your data to a local medium before anything else, rather than relying on mounted drives or buckets. Otherwise you have a network request each time you want to access a file, which is extremely slow when you have many small files, which is the ML use case.

Thanks, that helped! Instead of mounting Google Drive, I downloaded my dataset from Roboflow and forked the repo directly.


and indeed it took less than a minute for 25k images.
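
For anyone trying the same approach, the Roboflow download straight to the Colab VM looks roughly like this (the API key, workspace and project names are placeholders):

# pip install roboflow
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                    # placeholder key
project = rf.workspace("your-workspace").project("your-project")
dataset = project.version(1).download("yolov5")          # extracts onto the local Colab disk
print(dataset.location)                                  # path to reference in data.yaml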

Khalilolgod commented 1 year ago

I'm having the same issue. The custom dataset is on a local drive on Colab, yet the scanning speed is only ~4-5 it/s. My initial guess was that it has to resize the input images and that's what makes scanning so slow, but the COCO dataset goes through the same process (since its images are not all the same size) and it was a lot faster.


train: weights=yolov5l.pt, cfg=, data=/content/yolov5/data1.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-193-g485da42 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=201

                 from  n    params  module                                  arguments                     
  0                -1  1      7040  models.common.Conv                      [3, 64, 6, 2, 2]              
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  2                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  4                -1  6   1118208  models.common.C3                        [256, 256, 6]                 
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  6                -1  9   6433792  models.common.C3                        [512, 512, 9]                 
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
  8                -1  3   9971712  models.common.C3                        [1024, 1024, 3]               
  9                -1  1   2624512  models.common.SPPF                      [1024, 1024, 5]               
 10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  3   2757632  models.common.C3                        [1024, 512, 3, False]         
 14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  3    690688  models.common.C3                        [512, 256, 3, False]          
 18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  3   2495488  models.common.C3                        [512, 512, 3, False]          
 21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  3   9971712  models.common.C3                        [1024, 1024, 3, False]        
 24      [17, 20, 23]  1   1109310  models.yolo.Detect                      [201, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
Model summary: 368 layers, 47215294 parameters, 47215294 gradients, 111.7 GFLOPs

Transferred 607/613 items from yolov5l.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 101 weight(decay=0.0), 104 weight(decay=0.0005), 104 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning /content/dataset/labels/train... 26258 images, 0 backgrounds, 0 corrupt:  49% 26258/53739 [1:31:03<1:58:09,  3.88it/s]
glenn-jocher commented 1 year ago

@Khalilolgod thanks for bringing up this issue. The scanning speed during the training process can vary depending on several factors such as the size of the images, the speed of the storage medium, and the machine's performance.

In your case, it seems that the custom dataset on the local drive in Colab is experiencing a slower scanning speed of around 4-5 iterations per second. The initial assumption of image resizing affecting the scanning speed might not be the primary reason, as the COCO dataset also goes through a similar process but at a much faster rate.

However, it's important to note that scanning speed is affected by various factors like the size of the dataset and the performance of the machine. Since Colab relies on network requests to access files in mounted drives or buckets, it can result in slower scanning speeds, especially when dealing with a large number of small image files.

One best practice to improve scanning performance is to transfer your dataset to a local medium before training, rather than relying on mounted drives or buckets. This helps avoid the network requests for each file access, thereby significantly improving scanning speed, particularly when dealing with a large number of small files.

Based on your feedback, it seems that downloading the dataset from RoboFlow and forking the repo directly led to a much-improved scanning speed, taking less than a minute for 25k images.

I hope this explanation clarifies the reason behind the scanning speed differences and provides you with some insights on how to optimize the process. If you have any further questions, please feel free to ask.
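
If you want to confirm that storage I/O is the bottleneck, a quick timing comparison of raw reads from the mounted Drive versus the Colab VM's local disk can help (a rough sketch; paths are placeholders):

import time
from pathlib import Path

def time_reads(folder, n=200):
    # Read the first n images and report throughput in files per second.
    files = sorted(Path(folder).glob("**/*.jpg"))[:n]
    t0 = time.time()
    for f in files:
        f.read_bytes()
    dt = max(time.time() - t0, 1e-9)
    print(f"{folder}: {len(files)} files in {dt:.1f}s ({len(files) / dt:.1f} files/s)")

time_reads("/content/drive/MyDrive/my_dataset/images/train")  # mounted Drive (network-bound)
time_reads("/content/datasets/my_dataset/images/train")       # Colab local disk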

Ares-cz commented 7 months ago

I have uploaded the dataset to Google Drive, but when I train the YOLO model on Colab the scanning speed is very slow and the progress always stays at 2%. Before this training I used a smaller dataset to train the same model, and scanning was very fast. I'm not sure whether this is related to the larger dataset, but even so, why does the scan progress always stay at 2%? Thanks in advance.

glenn-jocher commented 7 months ago

@Ares-cz the slow scanning speed and the progress stalling at 2% could be due to a few reasons:

  1. Large Dataset: A larger dataset will naturally take longer to scan, especially if the images are high-resolution or if there are many annotations per image.
  2. Google Drive Latency: Accessing data directly from Google Drive can introduce significant latency due to network overhead. It's much faster to copy the dataset to the Colab VM's local storage before starting the training process.
  3. Corrupt Files: There might be corrupt or unreadable files in your dataset that could cause the scanning process to hang. Ensure all files are accessible and in the correct format.

To address this issue, try copying the dataset to the Colab VM's local storage first and checking that all files are readable before restarting training.

Remember, training on Colab with large datasets directly from Google Drive is not recommended due to the potential for slow I/O speeds. Always aim to use the local filesystem of the environment where you're training your models.
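
To rule out the corrupt-file case mentioned above, a minimal pre-training check with Pillow (the directory path is a placeholder) might look like this:

from pathlib import Path
from PIL import Image

img_dir = Path("/content/datasets/my_dataset/images/train")  # hypothetical path
bad = []
for f in sorted(img_dir.glob("*")):
    if f.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
        continue
    try:
        with Image.open(f) as im:
            im.verify()                      # raises if the file is truncated or corrupt
    except Exception as e:
        bad.append((f, e))

print(f"{len(bad)} corrupt images found")
for f, e in bad:
    print(f, e)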