ultralytics / yolov5

Slow dataset scanning #11863

Closed Khalilolgod closed 1 year ago

Khalilolgod commented 1 year ago

Question

Hey. I'm having issues with the speed of dataset scanning on Colab. I have a dataset of ~54K training images, and the scanning rate is around 4-5 it/s, so the scan alone takes about three hours. I saw the previous related issues, which suggested keeping the dataset in local storage; in my case, it already is. Is there anything I can do to speed up this process? It is supposed to be ~500 times faster.

Some info on the dataset: [image]

And a sample label (which was transformed from the COCO format to the YOLO format):

92 0.493981 0.510504 0.114842 0.263596
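
For readers unfamiliar with the two formats, here is a minimal sketch of the conversion that produces a label line like the one above. It assumes the usual conventions: a COCO box is `[x_min, y_min, width, height]` in pixels, and a YOLO label line is `class x_center y_center width height` with coordinates normalized to the image size. The function names and the example numbers are hypothetical, not taken from this dataset, and any remapping of COCO category ids to zero-based class indices is left out.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] (pixels) to
    YOLO (x_center, y_center, width, height), normalized to the 0-1 range."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,  # box center x as a fraction of image width
        (y_min + h / 2) / img_h,  # box center y as a fraction of image height
        w / img_w,
        h / img_h,
    )


def format_label(class_id, bbox, img_w, img_h):
    """Build one line of a YOLO label file with six decimal places."""
    xc, yc, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # Hypothetical box in a 640x480 image, not a value from the dataset.
    print(format_label(92, [100, 50, 200, 100], 640, 480))
```

Because the YOLO coordinates are fractions of the image size, they stay valid if the image is later resized without changing its aspect ratio.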

and finally the training process.

train: weights=yolov5l.pt, cfg=, data=/content/yolov5/data1.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
        github: up to date with https://github.com/ultralytics/yolov5 ✅
        YOLOv5 🚀 v7.0-193-g485da42 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)

        hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
        Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
        TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
        Overriding model.yaml nc=80 with nc=201

                         from  n    params  module                                  arguments                     
          0                -1  1      7040  models.common.Conv                      [3, 64, 6, 2, 2]              
          1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
          2                -1  3    156928  models.common.C3                        [128, 128, 3]                 
          3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
          4                -1  6   1118208  models.common.C3                        [256, 256, 6]                 
          5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
          6                -1  9   6433792  models.common.C3                        [512, 512, 9]                 
          7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
          8                -1  3   9971712  models.common.C3                        [1024, 1024, 3]               
          9                -1  1   2624512  models.common.SPPF                      [1024, 1024, 5]               
         10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
         11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
         12           [-1, 6]  1         0  models.common.Concat                    [1]                           
         13                -1  3   2757632  models.common.C3                        [1024, 512, 3, False]         
         14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
         15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
         16           [-1, 4]  1         0  models.common.Concat                    [1]                           
         17                -1  3    690688  models.common.C3                        [512, 256, 3, False]          
         18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
         19          [-1, 14]  1         0  models.common.Concat                    [1]                           
         20                -1  3   2495488  models.common.C3                        [512, 512, 3, False]          
         21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
         22          [-1, 10]  1         0  models.common.Concat                    [1]                           
         23                -1  3   9971712  models.common.C3                        [1024, 1024, 3, False]        
         24      [17, 20, 23]  1   1109310  models.yolo.Detect                      [201, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
        Model summary: 368 layers, 47215294 parameters, 47215294 gradients, 111.7 GFLOPs

        Transferred 607/613 items from yolov5l.pt
        AMP: checks passed ✅
        optimizer: SGD(lr=0.01) with parameter groups 101 weight(decay=0.0), 104 weight(decay=0.0005), 104 bias
        albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
        train: Scanning /content/dataset/labels/train... 16602 images, 0 backgrounds, 0 corrupt:  31% 16602/53739 [58:45<2:02:49,  5.04it/s]

glenn-jocher commented 1 year ago

@Khalilolgod hi,

Thank you for reaching out. I understand that you are experiencing slow dataset scanning on Colab, even though your dataset is stored locally. Scanning ~54K training images at a rate of 4-5 iterations per second is taking a long time.

To speed up the dataset scanning process, you can try the following steps:

  1. Ensure that your dataset is properly organized and structured. Make sure that the images and their corresponding labels are in the expected format.

  2. Check the hardware resources available to Colab. Sometimes limited resources can impact the scanning speed. Consider using a GPU runtime with higher specifications for faster processing.

  3. Verify that the dataset path you provided is correct and accessible by the program. Any issues with the dataset path can slow down the scanning process.

I hope these suggestions help improve the scanning speed of your dataset. If you have any further questions or concerns, feel free to ask.

Khalilolgod commented 1 year ago

Thanks for the fast response. I will address each suggestion in turn.

  1. As for the first suggestion, I did check the labels, and here are a couple of results showing they are fine: [image] [image]

  2. Better GPUs are expensive and we don't have much budget for them :') . (Also, there is no guarantee we would get any improvement in the scanning phase.)

  3. The dataset path was OK too. (Also, if it weren't, wouldn't it raise an exception?)

Could it be that the pictures are too big and need to be downsampled?
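
One rough way to test the "files are too big" hypothesis before resizing anything is to time how fast a sample of the image files can simply be read from disk. The sketch below uses only the standard library. The helper name `read_rate` and the sample size are hypothetical, and the images directory is an assumption (the log only shows the labels directory). The number is a proxy, not a measurement of the scanner itself, and a second run may look faster because the operating system caches files.

```python
import time
from pathlib import Path


def read_rate(paths):
    """Read each file fully; return (files_per_second, megabytes_per_second)."""
    start = time.perf_counter()
    total_bytes = 0
    for p in paths:
        total_bytes += len(Path(p).read_bytes())
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return len(paths) / elapsed, total_bytes / 1e6 / elapsed


if __name__ == "__main__":
    # Assumed location of the training images; adjust to the real dataset layout.
    image_dir = Path("/content/dataset/images/train")
    sample = sorted(p for p in image_dir.glob("*") if p.is_file())[:200]
    if sample:
        files_per_s, mb_per_s = read_rate(sample)
        print(f"{files_per_s:.1f} files/s, {mb_per_s:.1f} MB/s over {len(sample)} files")
    else:
        print(f"no files found in {image_dir}")
```

If the files-per-second figure is in the same low range as the scan rate, file size and disk throughput are the likely bottleneck.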

glenn-jocher commented 1 year ago

@Khalilolgod it's great to hear that you've checked the labels and they seem to be fine.

Regarding the suggestion about hardware resources, I understand that budget constraints can limit your options. You are right that a better GPU is unlikely to help at this stage: the scanning phase reads and verifies the image and label files on the CPU, so it is limited by CPU and disk speed rather than by the GPU.

As for the dataset path, you're correct that an exception would be raised if there were any issues. Since you haven't encountered any exceptions, it suggests that the path is indeed correct.

Considering the size of the pictures, it's possible that larger images are causing the slow scanning process. Downsampling the images to a smaller size could potentially improve the speed. You can try resizing the images to a lower resolution and see if it makes a difference.

I hope these suggestions help. Let me know if you have any further questions or concerns.
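
The downsampling idea can be sketched with Pillow as follows. This is only an illustration under stated assumptions: the function names, the `*.jpg` pattern, the JPEG quality, and the 640-pixel target (chosen to match `imgsz=640` in the log) are all assumptions, not settings confirmed in this thread. YOLO labels are normalized, so they do not need to change as long as the aspect ratio is preserved.

```python
from pathlib import Path


def target_size(width, height, max_side=640):
    """Return (w, h) scaled so the longer side is at most max_side,
    keeping the aspect ratio. Images that are already small are left alone."""
    scale = max_side / max(width, height)
    if scale >= 1:
        return width, height
    return max(1, round(width * scale)), max(1, round(height * scale))


def downsample_folder(src_dir, dst_dir, max_side=640):
    """Write resized copies of every *.jpg in src_dir to dst_dir; return the count."""
    # Pillow is imported here so that target_size has no third-party dependency.
    from PIL import Image, ImageOps

    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.jpg")):
        with Image.open(path) as im:
            # Bake any EXIF rotation into the pixels, since the resaved file drops EXIF.
            im = ImageOps.exif_transpose(im)
            new_size = target_size(im.width, im.height, max_side)
            if new_size != im.size:
                im = im.resize(new_size, Image.LANCZOS)
            im.convert("RGB").save(dst / path.name, quality=90)
        count += 1
    return count
```

Writing to a separate output folder keeps the originals intact, which makes it easy to compare scan times and detection quality before committing to the smaller images.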

Khalilolgod commented 1 year ago

With all due respect, sir, you sound a lot like ChatGPT XD. Doesn't YOLO resize the pictures itself? I'm guessing that if I resize them, the result won't be any different. I might be wrong...

glenn-jocher commented 1 year ago

@Khalilolgod hi,

Thank you for your response. I apologize if my previous message resembled a chatbot response. The intention was to provide helpful suggestions for improving the dataset scanning speed in YOLOv5.

In YOLOv5, the input image size affects the training process. YOLOv5 does resize images to the size given by --img-size (imgsz=640 in your log) as it loads them for training, and the same parameter sets the size used for validation and inference. The dataset scan runs before any of that resizing, though: it opens each original image file to verify it, so large files take longer to scan.

In your case, if the dataset images are already at a lower resolution, resizing them further might not yield significant improvement in the scanning speed. However, it's worth trying to downsample the images and assess if there is any noticeable difference in the scanning time.

Please let me know if there's anything else I can assist you with.

Khalilolgod commented 1 year ago

thanks that helped: i tried making them 16 times smaller and got the following result:

train: weights=yolov5l.pt, cfg=, data=/content/yolov5/data1.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-193-g485da42 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
100% 755k/755k [00:00<00:00, 47.0MB/s]
Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5l.pt to yolov5l.pt...
100% 89.3M/89.3M [00:02<00:00, 40.0MB/s]

Overriding model.yaml nc=80 with nc=201

                 from  n    params  module                                  arguments                     
  0                -1  1      7040  models.common.Conv                      [3, 64, 6, 2, 2]              
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  2                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  4                -1  6   1118208  models.common.C3                        [256, 256, 6]                 
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  6                -1  9   6433792  models.common.C3                        [512, 512, 9]                 
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
  8                -1  3   9971712  models.common.C3                        [1024, 1024, 3]               
  9                -1  1   2624512  models.common.SPPF                      [1024, 1024, 5]               
 10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  3   2757632  models.common.C3                        [1024, 512, 3, False]         
 14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  3    690688  models.common.C3                        [512, 256, 3, False]          
 18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  3   2495488  models.common.C3                        [512, 512, 3, False]          
 21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  3   9971712  models.common.C3                        [1024, 1024, 3, False]        
 24      [17, 20, 23]  1   1109310  models.yolo.Detect                      [201, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
Model summary: 368 layers, 47215294 parameters, 47215294 gradients, 111.7 GFLOPs

Transferred 607/613 items from yolov5l.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 101 weight(decay=0.0), 104 weight(decay=0.0005), 104 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning /content/dataset/labels/train... 53739 images, 0 backgrounds, 0 corrupt: 100% 53739/53739 [02:26<00:00, 366.52it/s]

The scanning process is now ~70 times faster, though still well short of the speed we get for the COCO dataset, which was 1709.36it/s. But I'm fine with 2-3 minutes of scanning.
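
As a quick sanity check on these numbers, the arithmetic can be reproduced from the rates printed in the two logs and the COCO rate quoted above. The rates fluctuate during a scan, so the estimated times only approximate the progress-bar readings. The helper names are hypothetical.

```python
TOTAL_IMAGES = 53739   # from the "Scanning" lines in both logs
RATE_BEFORE = 5.04     # it/s with the original images
RATE_AFTER = 366.52    # it/s after downsampling
RATE_COCO = 1709.36    # it/s quoted for the COCO dataset


def scan_seconds(n_images, rate):
    """Estimated scan time in seconds at a constant rate in images per second."""
    return n_images / rate


def as_hms(seconds):
    """Format a duration in seconds as h:mm:ss."""
    total = int(round(seconds))
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


speedup = RATE_AFTER / RATE_BEFORE
coco_gap = RATE_COCO / RATE_AFTER

if __name__ == "__main__":
    print("before:", as_hms(scan_seconds(TOTAL_IMAGES, RATE_BEFORE)))
    print("after: ", as_hms(scan_seconds(TOTAL_IMAGES, RATE_AFTER)))
    print(f"speedup: {speedup:.1f}x, COCO is still {coco_gap:.1f}x faster")
```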

glenn-jocher commented 1 year ago

@Khalilolgod great to hear that downsampling the images sped up the scan! Going from about 5 it/s to about 367 it/s is roughly a 70x improvement. It is still slower than the COCO dataset, but a 2-3 minute scan is much more manageable for your needs.

If you have any further questions or need assistance with anything else, feel free to ask.