Closed omumbare7 closed 11 months ago
@omumbare7 I am interested to understand why you are getting stuck resuming on completion? Did your previous training fail to upload the final weights?
Can you share the code in trial.py?
@omumbare7 When I did the last fix I wrote this notebook, it should offer a fix for now, but we haven't tested this across all models:
@omumbare7 I would like to resolve this permanently, maybe we can work together on this? You can reach me at:
kalen.michael@ultralytics.com
@omumbare7 I am interested to understand why you are getting stuck resuming on completion? Did your previous training fail to upload the final weights?
Can you share the code in trial.py?
I actually trained it on Colab, so I don't think I have access to trial.py.
@omumbare7 When I did the last fix I wrote this notebook, it should offer a fix for now, but we haven't tested this across all models:
I will try this notebook and let you know whether the model works afterwards, but it definitely works with the optimizer-stripped checkpoint I made previously.
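For readers unfamiliar with the term, stripping the optimizer means clearing the training-only state from a checkpoint so that only the weights needed for inference remain. The following is a minimal sketch of that idea on a plain dictionary. The key names (`optimizer`, `ema`, `updates`, `best_fitness`, `epoch`) and the function name `strip_training_state` are assumptions made for illustration, not a confirmed description of the HUB checkpoint layout, and a real checkpoint would be loaded and saved with PyTorch rather than built by hand.

```python
from typing import Any, Dict

# Keys assumed to hold training-only state (hypothetical names for illustration).
TRAINING_ONLY_KEYS = ("optimizer", "best_fitness", "ema", "updates")


def strip_training_state(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a checkpoint dict with training-only state cleared."""
    stripped = dict(ckpt)  # shallow copy so the caller's dict is left untouched
    # Prefer the EMA weights when present (an assumption about the layout).
    if stripped.get("ema") is not None:
        stripped["model"] = stripped["ema"]
    for key in TRAINING_ONLY_KEYS:
        stripped[key] = None
    # Mark the checkpoint as finished so it is not treated as resumable.
    stripped["epoch"] = -1
    return stripped


if __name__ == "__main__":
    demo = {"epoch": 99, "model": "weights", "ema": "ema-weights", "optimizer": {"lr": 0.01}}
    print(strip_training_state(demo))
```

Clearing the optimizer state is typically what accounts for most of the size difference between a mid-training checkpoint and a stripped one.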
@omumbare7 I would like to resolve this permanently, maybe we can work together on this? You can reach me at:
I would like to contribute to this, but I don't have the knowledge or expertise in this topic. I am a student and a beginner in this domain and have only just started to learn. My apologies.
@omumbare7 that's perfectly okay! I completely understand. Learning is a continuous process and we all start somewhere. Your curiosity and willingness to explore is a great start in this domain. If you have any further questions or issues, don't hesitate to ask. We're here to help. Happy coding and learning!
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
Search before asking
HUB Component
No response
Bug
I have already had this issue with one of the models I trained recently. It trained up to epoch 99 (out of 100) and then gave the following error:
Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM 🚀
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/models/58IFoVk7ISnpulKrxrQM/epoch-99.pt to 'epoch-99.pt'...
100%|██████████| 521M/521M [00:27<00:00, 20.2MB/s]
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
Ultralytics YOLOv8.0.194 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=epoch-99.pt, data=https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/sM0ItvxDPp9ahuaNVVRP/weed.v1i.yolov8.zip, epochs=100, patience=100, batch=9, imgsz=640, save=True, save_period=-1, cache=ram, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.0, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=0.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/sM0ItvxDPp9ahuaNVVRP/weed.v1i.yolov8.zip to 'weed.v1i.yolov8.zip'...
100%|██████████| 1.64G/1.64G [01:17<00:00, 22.7MB/s]
Unzipping weed.v1i.yolov8.zip to /content/datasets/weed.v1i.yolov8...: 100%|██████████| 35660/35660 [00:13<00:00, 2705.23file/s]
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100%|██████████| 755k/755k [00:00<00:00, 14.4MB/s]
TensorBoard: Start with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/
0 -1 1 2320 ultralytics.nn.modules.conv.Conv [3, 80, 3, 2]
1 -1 1 115520 ultralytics.nn.modules.conv.Conv [80, 160, 3, 2]
2 -1 3 436800 ultralytics.nn.modules.block.C2f [160, 160, 3, True]
3 -1 1 461440 ultralytics.nn.modules.conv.Conv [160, 320, 3, 2]
4 -1 6 3281920 ultralytics.nn.modules.block.C2f [320, 320, 6, True]
5 -1 1 1844480 ultralytics.nn.modules.conv.Conv [320, 640, 3, 2]
6 -1 6 13117440 ultralytics.nn.modules.block.C2f [640, 640, 6, True]
7 -1 1 3687680 ultralytics.nn.modules.conv.Conv [640, 640, 3, 2]
8 -1 3 6969600 ultralytics.nn.modules.block.C2f [640, 640, 3, True]
9 -1 1 1025920 ultralytics.nn.modules.block.SPPF [640, 640, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 3 7379200 ultralytics.nn.modules.block.C2f [1280, 640, 3]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 3 1948800 ultralytics.nn.modules.block.C2f [960, 320, 3]
16 -1 1 922240 ultralytics.nn.modules.conv.Conv [320, 320, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 3 7174400 ultralytics.nn.modules.block.C2f [960, 640, 3]
19 -1 1 3687680 ultralytics.nn.modules.conv.Conv [640, 640, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 7379200 ultralytics.nn.modules.block.C2f [1280, 640, 3]
22 [15, 18, 21] 1 8718931 ultralytics.nn.modules.head.Detect [1, [320, 640, 640]]
Model summary: 365 layers, 68153571 parameters, 68153555 gradients, 258.1 GFLOPs
Transferred 595/595 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.23M/6.23M [00:00<00:00, 76.9MB/s]
AMP: checks passed ✅
train: Scanning /content/datasets/weed.v1i.yolov8/train/labels... 15585 images, 1364 backgrounds, 0 corrupt: 100%|██████████| 15585/15585 [00:08<00:00, 1920.07it/s]
train: New cache created: /content/datasets/weed.v1i.yolov8/train/labels.cache
train: 26.8GB RAM required to cache images with 50% safety margin but only 7.8/12.7GB available, not caching images ⚠️
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/datasets/weed.v1i.yolov8/valid/labels... 1483 images, 127 backgrounds, 0 corrupt: 100%|██████████| 1483/1483 [00:01<00:00, 938.41it/s]
val: New cache created: /content/datasets/weed.v1i.yolov8/valid/labels.cache
val: Caching images (1.7GB ram): 100%|██████████| 1483/1483 [00:08<00:00, 169.14it/s]
Plotting labels to runs/detect/train/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0004921875), 103 bias(decay=0.0)
Resuming training from epoch-99.pt from epoch 101 to 100 total epochs
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Ultralytics HUB: View model at https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM 🚀
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train
Starting training for 100 epochs...
1 epochs completed in 0.001 hours.
AssertionError Traceback (most recent call last) in <cell line: 4>()
2
3 model = YOLO('https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM')
----> 4 model.train()
5 frames
/usr/local/lib/python3.10/dist-packages/ultralytics/utils/plotting.py in plot_results(file, dir, segment, pose, classify, on_plot)
    534     ax = ax.ravel()
    535     files = list(save_dir.glob('results*.csv'))
--> 536     assert len(files), f'No results.csv files found in {save_dir.resolve()}, nothing to plot.'
    537     for f in files:
    538         try:
AssertionError: No results.csv files found in /content/runs/detect/train, nothing to plot.
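One possible reading of the log above, offered as a hypothesis rather than a confirmed diagnosis: the line "Resuming training from epoch-99.pt from epoch 101 to 100 total epochs" suggests the resume point is already past the configured total, so no epoch actually runs, no results.csv is written, and the plotting step then fails its assertion. The sketch below illustrates that arithmetic only. It assumes the trainer iterates over a zero-based `range(start_epoch, total_epochs)` and that the log prints a one-based epoch number; the function name `epochs_left` is hypothetical and this is not the trainer's actual code.

```python
from typing import List


def epochs_left(ckpt_epoch: int, total_epochs: int) -> List[int]:
    """List the zero-based epochs still to run after resuming from a checkpoint.

    Assumes the checkpoint stores the index of the last finished epoch and
    that training continues from the next index up to total_epochs.
    """
    start_epoch = ckpt_epoch + 1
    return list(range(start_epoch, total_epochs))


if __name__ == "__main__":
    # Checkpoint saved after the final epoch of a 100-epoch run: nothing is left.
    print(epochs_left(99, 100))
    # Checkpoint saved one epoch earlier: exactly one epoch remains.
    print(epochs_left(98, 100))
```

Under these assumptions an empty list means a resume would finish immediately without producing any new metrics, which is consistent with the "nothing to plot" message.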
I then downloaded epoch-99 and stripped the optimizer myself, and that worked. I am writing this here because I have hit the same error with another model I trained, so it may be a bug at the moment and could affect other training runs as well. I stripped the optimizer for that model too. @kalenmike also fixed the previous model for me via Ultralytics HUB and I was able to download it from there, but the model I downloaded doesn't work and gives errors like:
Confidence ---> 0.85
Traceback (most recent call last):
File "c:\Users\om\Desktop\ugv proto\codes\yolo\yolov8\trial.py", line 77, in
print("Class name -->", classNames[cls])
IndexError: list index out of range
Exception ignored in: <generator object BasePredictor.stream_inference at 0x0000023FB8C19310>
Traceback (most recent call last):
File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 52, in generator_context
File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 300, in clone
File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 286, in __init__
AttributeError: 'NoneType' object has no attribute 'is_scripting'
This error only occurs with the fixed model I downloaded from the HUB, which was 56 MB. The same model before the fix (epoch 99), from which I stripped the optimizer myself, is 136 MB and works really well without any errors. I am writing here only to report the bug; I don't have any outstanding issues with my models at the moment.
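The `IndexError` at `classNames[cls]` suggests that the hard-coded list in trial.py has fewer entries than the class index the downloaded model predicted. A defensive lookup makes that mismatch visible instead of crashing. The sketch below is illustrative only: `class_label` is a hypothetical helper, and it accepts either a list such as `classNames` or an index-to-name dictionary like the `names` mapping that a loaded Ultralytics model exposes.

```python
from collections.abc import Mapping
from typing import Dict, Sequence, Union


def class_label(names: Union[Sequence[str], Dict[int, str]], cls: int) -> str:
    """Look up a class name without raising when the index is unknown."""
    if isinstance(names, Mapping):
        return names.get(cls, f"unknown({cls})")
    if 0 <= cls < len(names):
        return names[cls]
    return f"unknown({cls})"


if __name__ == "__main__":
    class_names = ["weed"]  # hypothetical one-class list, as in a local script
    print("Class name -->", class_label(class_names, 0))
    print("Class name -->", class_label(class_names, 3))
```

If the fallback label shows up in practice, comparing the length of the local list with the model's own class mapping is a reasonable first check.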
Environment
No response
Minimal Reproducible Example
No response
Additional
No response