ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

OSError: [WinError 1455] paging file too small #7657

Closed GiaZ90 closed 2 years ago

GiaZ90 commented 2 years ago


Question

I'm training a YOLOv5 network on the coco128 dataset with these specs: i7 4770K, 16 GB RAM, GTX 1080 8 GB.

python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt

I'm facing the common "paging file is too small" issue, so I searched around and, intending to lower the number of workers, changed this line in the source code from

nw = max(round(hyp['warmup_epochs'] * nb), 100)  # number of warmup iterations, max(3 epochs, 100 iterations)

to

nw = 1

but I still get the same error. I also tried changing the page-file allocation setting to "size managed by the system". Below is my last attempt to run it.

P.S. I only hit this issue after setting everything up to run on the GPU to speed things up (with the CPU alone it worked fine, but it is very slow).
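Note that the line edited above, nw = max(round(hyp['warmup_epochs'] * nb), 100), controls the number of warmup iterations, not the number of DataLoader workers; the worker count is set by the --workers flag (it appears as workers=8 in the arguments printed below). On Windows, each DataLoader worker is a separate process started with the spawn method, so every worker re-imports torch and maps the large CUDA/cuDNN DLLs, which is what drives up the commit charge against RAM plus the page file. A minimal, generic PyTorch sketch (not YOLOv5's actual training code) of the pieces involved:

```python
# Minimal, generic PyTorch sketch (not YOLOv5's code): on Windows each DataLoader
# worker is a spawned process that re-imports torch and maps the CUDA/cuDNN DLLs,
# so more workers means a larger commit charge against RAM + page file.
import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Tiny dummy dataset just to make the example runnable.
    data = TensorDataset(torch.randn(64, 3, 64, 64), torch.zeros(64, dtype=torch.long))
    loader = DataLoader(
        data,
        batch_size=8,     # a smaller batch also reduces GPU memory pressure
        num_workers=1,    # fewer workers -> fewer spawned processes -> less memory committed
        pin_memory=True,
    )
    for images, labels in loader:
        pass  # a real training step would go here


if __name__ == "__main__":
    # Required on Windows: spawned workers re-import this module, and this guard
    # keeps them from re-running the training entry point.
    main()
```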

PS C:\Users\AdminZ\Desktop\Tirocinio\Yolo\yolov5> python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
train: weights=yolov5s.pt, cfg=, data=coco128.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=3, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
"git" is not recognized as an internal or external command, operable program or batch file.
Command 'git fetch && git config --get remote.origin.url' returned non-zero exit status 1.
"git" is not recognized as an internal or external command, operable program or batch file.
YOLOv5 2022-4-22 torch 1.11.0+cu113 CUDA:0 (NVIDIA GeForce GTX 1080, 8192MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/
wandb: Tracking run with wandb version 0.12.15
wandb: Run data is saved locally in C:\Users\AdminZ\Desktop\Tirocinio\Yolo\yolov5\wandb\run-20220501_113706-11msfiny
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run earthy-bird-19
wandb: View project at https://wandb.ai/giateam/train
wandb: View run at https://wandb.ai/giateam/train/runs/11msfiny
YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected.

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model summary: 270 layers, 7235389 parameters, 7235389 gradients

Transferred 349/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight (no decay), 60 weight, 60 bias
train: Scanning 'C:\Users\AdminZ\Desktop\Tirocinio\Yolo\datasets\coco128\labels\train2017.cache' images and labels... 1
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
val: Scanning 'C:\Users\AdminZ\Desktop\Tirocinio\Yolo\datasets\coco128\labels\train2017.cache' images and labels... 128
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 269, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\AdminZ\Desktop\Tirocinio\Yolo\yolov5\train.py", line 26, in <module>
    import torch
  File "C:\Users\AdminZ\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\__init__.py", line 126, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\AdminZ\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.
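A plausible reading of this traceback: the crash happens while a spawned DataLoader worker re-imports torch, and Windows refuses to map cudnn_cnn_infer64_8.dll because the commit limit (physical RAM plus page file) is exhausted. With eight workers plus the main process each mapping those DLLs, 16 GB of RAM and a small page file can run out quickly; lowering --workers / --batch or enlarging the page file both address it. A small diagnostic sketch, assuming the third-party psutil package is installed, to check how much RAM and page file (swap) the machine currently reports:

```python
# Diagnostic sketch (assumes the third-party `psutil` package is installed):
# print physical RAM and page-file (swap) totals, which together bound the
# Windows commit limit that WinError 1455 complains about.
import psutil


def gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes for readable output."""
    return n_bytes / (1024 ** 3)


vm = psutil.virtual_memory()  # physical RAM
sm = psutil.swap_memory()     # the page file on Windows

print(f"RAM       total: {gib(vm.total):5.1f} GiB  available: {gib(vm.available):5.1f} GiB")
print(f"Page file total: {gib(sm.total):5.1f} GiB  used:      {gib(sm.used):5.1f} GiB")
```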

Additional

No response

GiaZ90 commented 2 years ago

Looks like it's working a bit better now... I changed the paging-file allocation to cover all disks, and it started to compute.

wmcnally commented 2 years ago

@GiaZ90 Can you explain your solution in more detail? I'm facing the same issue. Thanks.

ashmalvayani commented 2 years ago

Pass --workers 1 as an argument when you run train.py, and also try a batch size of 8 instead of the default 16.
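For the command in this issue, that would look roughly like python train.py --img 640 --batch 8 --epochs 3 --data coco128.yaml --weights yolov5s.pt --workers 1 (the original command plus the two suggested changes); passing --workers is the supported way to lower the DataLoader worker count, rather than editing train.py.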

yuenherny commented 1 year ago

For me, it started to compute after I reduced the batch size. I encountered two different error messages:

ImportError: DLL load failed: The paging file is too small for this operation to complete.

and

OSError: [WinError 1455] The paging file is too small for this operation to complete.

but both are essentially the same in nature.
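As far as I can tell, both messages come from the same underlying condition: Windows error 1455 (the commit limit, i.e. physical RAM plus the page file, is exhausted) raised while loading the CUDA/cuDNN DLLs; depending on where the load fails, Python surfaces it either as an ImportError or as an OSError.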

anay-p commented 1 year ago

@GiaZ90 or anyone else, please explain what is meant by 'changing memory allocation to all disk'. How does one do that?

wrxhhh commented 1 year ago

@anay-p Here is a guide: https://www.thewindowsclub.com/increase-page-file-size-virtual-memory-windows
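In short (the dialog names may differ slightly between Windows versions): open System Properties → Advanced → Performance Settings → Advanced → Virtual memory → Change, then either tick "Automatically manage paging file size for all drives" or set a larger custom size on a drive with free space, and reboot for the change to take effect.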