GiaZ90 closed this issue 2 years ago.
Looks like it's working a bit better now... I've changed the memory allocation to ALL disks and it starts to compute.
@GiaZ90 Can you explain your solution in more detail? I'm facing the same issue. Thanks.
Pass --workers 1 as an argument when you run train.py, and also try a batch size of 8 instead of the default 16.
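For example (just a sketch based on the command already used in this issue), the full call would look something like:

python train.py --img 640 --batch 8 --epochs 3 --data coco128.yaml --weights yolov5s.pt --workers 1

Fewer dataloader workers means fewer spawned processes that each have to import torch and load the CUDA DLLs, and a smaller batch lowers memory use, which seems to be what triggers the paging-file error here.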
For me, it started to compute after I reduced the batch size. I encountered two different error messages:
ImportError: DLL load failed: The paging file is too small for this operation to complete.
and
OSError: [WinError 1455] The paging file is too small for this operation to complete.
but both are essentially the same in nature.
@GiaZ90 or anyone else, please explain what is meant by 'changing memory allocation to all disk'. How does one do that?
@anay-p Here is a guide: https://www.thewindowsclub.com/increase-page-file-size-virtual-memory-windows
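In short (paraphrasing the linked guide; details may vary by Windows version): open System Properties, e.g. by running sysdm.cpl, then go to Advanced > Performance > Settings > Advanced > Virtual memory > Change, untick "Automatically manage paging file size for all drives", select "System managed size" or set a larger custom size for the drive, click OK and reboot.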
Search before asking
Question
I'm running a YOLOv5 neural network on the coco128 dataset, and I have these specs: i7 4770K, 16 GB RAM, GTX 1080 8 GB.
python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt
I'm facing the common issue of the paging file being too small, so I searched and made this change to the source code, from nw = max(round(hyp['warmup_epochs'] * nb), 100)  # number of warmup iterations, max(3 epochs, 100 iterations) to nw = 1 to lower the number of workers, and it still gives the same issue. Then I also tried changing the page allocation setting to "system managed size". Below is my last try to run it. P.S. I'm facing this issue after setting everything up to run on the GPU to speed up the operations (with only the CPU it worked fine, but it is so slow...).

PS C:\Users\AdminZ\Desktop\Tirocinio\Yolo\yolov5> python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
train: weights=yolov5s.pt, cfg=, data=coco128.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=3, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
"git" is not recognized as an internal or external command, operable program or batch file.
Command 'git fetch && git config --get remote.origin.url' returned non-zero exit status 1.
"git" is not recognized as an internal or external command, operable program or batch file.
YOLOv5 2022-4-22 torch 1.11.0+cu113 CUDA:0 (NVIDIA GeForce GTX 1080, 8192MiB)
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/
wandb: Tracking run with wandb version 0.12.15
wandb: Run data is saved locally in C:\Users\AdminZ\Desktop\Tirocinio\Yolo\yolov5\wandb\run-20220501_113706-11msfiny
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run earthy-bird-19
wandb: View project at https://wandb.ai/giateam/train
wandb: View run at https://wandb.ai/giateam/train/runs/11msfiny
YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected.
0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model summary: 270 layers, 7235389 parameters, 7235389 gradients
Transferred 349/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight (no decay), 60 weight, 60 bias
train: Scanning 'C:\Users\AdminZ\Desktop\Tirocinio\Yolo\datasets\coco128\labels\train2017.cache' images and labels... 1
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
File "<string>", line 1, in <module>
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64qbz5n2kfra8p0\lib\runpy.py", line 269, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64qbz5n2kfra8p0\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\AdminZ\Desktop\Tirocinio\Yolo\yolov5\train.py", line 26, in
import torch
File "C:\Users\AdminZ\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\ init__.py", line 126, in
raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\AdminZ\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
val: Scanning 'C:\Users\AdminZ\Desktop\Tirocinio\Yolo\datasets\coco128\labels\train2017.cache' images and labels... 128
wandb: Currently logged in as: giateam (use `wandb login --relogin` to force relogin)
Traceback (most recent call last):
  File "

Additional
No response