about train error - Githubissues

zcswdt commented 2 years ago

Search before asking

[X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

train: weights='yolov5l.pt', cfg=./models/yolov5l.yaml, data=./data/slide.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=200, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 2022-10-21 Python-3.8.13 torch-1.12.0+cu116 CUDA:0 (NVIDIA GeForce RTX 2070, 8192MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/
Traceback (most recent call last):
  File "train.py", line 630, in <module>
    main(opt)
  File "train.py", line 524, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 116, in train
    check_suffix(weights, '.pt') # check weights
  File "F:\tongji\project\yolov5-master\utils\general.py", line 417, in check_suffix
    assert s in suffix, f"{msg}{f} acceptable suffix is {suffix}"
AssertionError: 'yolov5l.pt' acceptable suffix is ['.pt']
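A likely explanation for this first AssertionError: the log shows weights='yolov5l.pt' with the quote characters included. That is what happens when single quotes are typed in Windows cmd.exe, which (unlike POSIX shells) does not strip them, so the file suffix becomes .pt' instead of .pt. The minimal sketch below shows this with Python's pathlib; has_allowed_suffix is a hypothetical, simplified stand-in for YOLOv5's check_suffix, not its actual code.

```python
from pathlib import Path


def has_allowed_suffix(filename: str, allowed=(".pt",)) -> bool:
    # Hypothetical simplified check: compare the lowercased file suffix
    # against the allowed suffixes (YOLOv5's real check_suffix does more).
    return Path(filename).suffix.lower() in allowed


# Unquoted argument, as in the second run: the suffix is ".pt"
print(Path("yolov5l.pt").suffix, has_allowed_suffix("yolov5l.pt"))      # .pt True
# Literal single quotes kept by cmd.exe: the suffix is ".pt'"
print(Path("'yolov5l.pt'").suffix, has_allowed_suffix("'yolov5l.pt'"))  # .pt' False
```

Passing the path unquoted, as in the second run, or wrapping it in double quotes (which are removed when Windows splits the command line into arguments) avoids this particular error.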

(yolo_py38) F:\tongji\project\yolov5-master>python train.py --data ./data/slide.yaml --cfg ./models/yolov5l.yaml --weights yolov5l.pt --batch-size 16
train: weights=yolov5l.pt, cfg=./models/yolov5l.yaml, data=./data/slide.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=200, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 2022-10-21 Python-3.8.13 torch-1.12.0+cu116 CUDA:0 (NVIDIA GeForce RTX 2070, 8192MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/

                 from  n    params  module                                  arguments
  0                -1  1      7040  models.common.Conv                      [3, 64, 6, 2, 2]
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  2                -1  3    156928  models.common.C3                        [128, 128, 3]
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  4                -1  6   1118208  models.common.C3                        [256, 256, 6]
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  6                -1  9   6433792  models.common.C3                        [512, 512, 9]
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]
  8                -1  3   9971712  models.common.C3                        [1024, 1024, 3]
  9                -1  1   2624512  models.common.SPPF                      [1024, 1024, 5]
 10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  3   2757632  models.common.C3                        [1024, 512, 3, False]
 14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  3    690688  models.common.C3                        [512, 256, 3, False]
 18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  3   2495488  models.common.C3                        [512, 512, 3, False]
 21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  3   9971712  models.common.C3                        [1024, 1024, 3, False]
 24      [17, 20, 23]  1     32310  models.yolo.Detect                      [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
YOLOv5l summary: 368 layers, 46138294 parameters, 46138294 gradients, 108.2 GFLOPs

Transferred 606/613 items from yolov5l.pt
AMP: checks passed
optimizer: SGD(lr=0.01) with parameter groups 101 weight(decay=0.0), 104 weight(decay=0.0005), 104 bias
train: Scanning 'F:\tongji\project\yolov5-master\dataset\train\labels.cache' images and labels... 322 found, 0 missing,
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "F:\tongji\project\yolov5-master\train.py", line 29, in <module>
    import torch
  File "D:\software\anaconda\exe\envs\yolo_py38\lib\site-packages\torch\__init__.py", line 129, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\software\anaconda\exe\envs\yolo_py38\lib\site-packages\torch\lib\cudnn_adv_train64_8.dll" or one of its dependencies.

Additional

No response

glenn-jocher commented 2 years ago

👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀. We've created a few short guidelines below to help users provide what we need in order to start investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

✅ Minimal – Use as little code as possible to produce the problem
✅ Complete – Provide all parts someone else needs to reproduce the problem
✅ Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

✅ Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
✅ Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

zcswdt commented 2 years ago

I am training with the latest code on Windows 11, and the environment was installed according to the repository's requirements. The following error was reported during training, and I don't know the cause. Please take a look. Thank you.

OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\software\anaconda\exe\envs\yolo_py38\lib\site-packages\torch\lib\cudnn_adv_train64_8.dll" or one of its dependencies.

github-actions[bot] commented 1 year ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Wiki – https://github.com/ultralytics/yolov5/wiki
Tutorials – https://docs.ultralytics.com/yolov5
Docs – https://docs.ultralytics.com

Access additional Ultralytics ⚡ resources:

Ultralytics HUB – https://ultralytics.com/hub
Vision API – https://ultralytics.com/yolov5
About Us – https://ultralytics.com/about
Join Our Team – https://ultralytics.com/work
Contact Us – https://ultralytics.com/contact

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcome!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

glenn-jocher commented 1 year ago

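@zcswdt it looks like you're encountering an issue during training on a Windows 11 system. The error "OSError: [WinError 1455] The paging file is too small for this operation to complete." is a Windows virtual memory error: the system could not commit enough memory (physical RAM plus page file) to load the cuDNN library cudnn_adv_train64_8.dll. The traceback shows this happens inside a child process started by Python's multiprocessing, which re-imports torch and loads its CUDA and cuDNN DLLs again, so the available virtual memory is most likely insufficient.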
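To see whether the commit limit is the bottleneck, before and after resizing the page file, a small diagnostic like the following may help. It is a minimal sketch, not part of YOLOv5, that assumes Windows and calls the Win32 function GlobalMemoryStatusEx through ctypes. In the MEMORYSTATUSEX structure, ullTotalPageFile is the current commit limit (roughly physical RAM plus page file, not the page file alone) and ullAvailPageFile is how much of it the current process can still commit. On other platforms the hypothetical helper commit_status_gib simply returns None.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    # Layout of the Win32 MEMORYSTATUSEX structure (DWORD = 32 bit, DWORDLONG = 64 bit)
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def commit_status_gib():
    """Return (total_ram, commit_limit, commit_available) in GiB, or None off Windows."""
    if sys.platform != "win32":
        return None
    stat = MEMORYSTATUSEX()
    stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)  # required before the call
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat)):
        raise ctypes.WinError()
    gib = 1024 ** 3
    return (stat.ullTotalPhys / gib, stat.ullTotalPageFile / gib, stat.ullAvailPageFile / gib)


status = commit_status_gib()
if status is not None:
    ram, limit, avail = status
    print(f"RAM {ram:.1f} GiB, commit limit {limit:.1f} GiB, commit available {avail:.1f} GiB")
```

If the commit limit is barely above installed RAM, the page file is small, and each process that imports torch with CUDA consumes part of that limit.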

To address this issue, you can try increasing the size of the page file in your Windows 11 system settings. Here's a general guide on increasing the page file size:

1. Search for "Advanced system settings" in your Windows search bar and open it.
2. In the System Properties window, go to the "Advanced" tab and click on "Settings" under the "Performance" section.
3. In the Performance Options window, go to the "Advanced" tab and click on "Change" under the "Virtual memory" section.
4. Uncheck the "Automatically manage paging file size for all drives" checkbox.
5. Select the drive on which you have installed the required environment, select "Custom size", and then set the initial size and maximum size of the page file.
6. Click "Set" and then "OK" to save the changes.
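After clicking "Set" and "OK", one way to double-check what Windows stored is to read the PagingFiles value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management with Python's standard winreg module. The sketch assumes the commonly seen entry format "path initial_MB maximum_MB" (with "?:\pagefile.sys" meaning automatic management for all drives and "0 0" meaning a system-managed size); treat that format as an assumption, not a documented contract. read_paging_files and parse_entry are hypothetical helpers, and reading the registry only works on Windows.

```python
import sys

MM_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"


def parse_entry(entry: str):
    """Split an entry such as 'D:\\pagefile.sys 16384 32768' into (path, initial_mb, max_mb).

    Assumed format: a path, then optional initial and maximum sizes in MB.
    Missing sizes are returned as None (e.g. for '?:\\pagefile.sys').
    """
    parts = entry.split()
    sizes = [int(p) for p in parts[1:3]]
    initial = sizes[0] if len(sizes) > 0 else None
    maximum = sizes[1] if len(sizes) > 1 else None
    return parts[0], initial, maximum


def read_paging_files():
    """Return the parsed PagingFiles entries from the registry, or None off Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MM_KEY) as key:
        entries, _value_type = winreg.QueryValueEx(key, "PagingFiles")  # REG_MULTI_SZ -> list of str
    return [parse_entry(e) for e in entries if e.strip()]


print(read_paging_files())
```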

After adjusting the page file size, try running the YOLOv5 training again to see if the issue is resolved.

If you encounter any further issues, feel free to provide as much detail as possible in a new bug report, including the steps you followed and any changes you made to the environment or code.

Please let me know if this resolves the issue for you. Good luck!

ultralytics / yolov5

about train error #9885
