ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0

problems for strip optimizer at last epoch #428

Closed omumbare7 closed 11 months ago

omumbare7 commented 1 year ago

Search before asking

HUB Component

No response

Bug

I already had this issue with one of the models I trained recently. It trained through epoch 99 (out of 100) and then gave the following error:

Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM 🚀
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/models/58IFoVk7ISnpulKrxrQM/epoch-99.pt to 'epoch-99.pt'...
100%|██████████| 521M/521M [00:27<00:00, 20.2MB/s]
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
Ultralytics YOLOv8.0.194 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=epoch-99.pt, data=https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/sM0ItvxDPp9ahuaNVVRP/weed.v1i.yolov8.zip, epochs=100, patience=100, batch=9, imgsz=640, save=True, save_period=-1, cache=ram, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.0, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=0.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/sM0ItvxDPp9ahuaNVVRP/weed.v1i.yolov8.zip to 'weed.v1i.yolov8.zip'...
100%|██████████| 1.64G/1.64G [01:17<00:00, 22.7MB/s]
Unzipping weed.v1i.yolov8.zip to /content/datasets/weed.v1i.yolov8...: 100%|██████████| 35660/35660 [00:13<00:00, 2705.23file/s]
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100%|██████████| 755k/755k [00:00<00:00, 14.4MB/s]
TensorBoard: Start with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/

               from  n    params  module                                       arguments                     

0 -1 1 2320 ultralytics.nn.modules.conv.Conv [3, 80, 3, 2]
1 -1 1 115520 ultralytics.nn.modules.conv.Conv [80, 160, 3, 2]
2 -1 3 436800 ultralytics.nn.modules.block.C2f [160, 160, 3, True]
3 -1 1 461440 ultralytics.nn.modules.conv.Conv [160, 320, 3, 2]
4 -1 6 3281920 ultralytics.nn.modules.block.C2f [320, 320, 6, True]
5 -1 1 1844480 ultralytics.nn.modules.conv.Conv [320, 640, 3, 2]
6 -1 6 13117440 ultralytics.nn.modules.block.C2f [640, 640, 6, True]
7 -1 1 3687680 ultralytics.nn.modules.conv.Conv [640, 640, 3, 2]
8 -1 3 6969600 ultralytics.nn.modules.block.C2f [640, 640, 3, True]
9 -1 1 1025920 ultralytics.nn.modules.block.SPPF [640, 640, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 3 7379200 ultralytics.nn.modules.block.C2f [1280, 640, 3]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 3 1948800 ultralytics.nn.modules.block.C2f [960, 320, 3]
16 -1 1 922240 ultralytics.nn.modules.conv.Conv [320, 320, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 3 7174400 ultralytics.nn.modules.block.C2f [960, 640, 3]
19 -1 1 3687680 ultralytics.nn.modules.conv.Conv [640, 640, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 7379200 ultralytics.nn.modules.block.C2f [1280, 640, 3]
22 [15, 18, 21] 1 8718931 ultralytics.nn.modules.head.Detect [1, [320, 640, 640]]
Model summary: 365 layers, 68153571 parameters, 68153555 gradients, 258.1 GFLOPs

Transferred 595/595 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.23M/6.23M [00:00<00:00, 76.9MB/s]
AMP: checks passed ✅
train: Scanning /content/datasets/weed.v1i.yolov8/train/labels... 15585 images, 1364 backgrounds, 0 corrupt: 100%|██████████| 15585/15585 [00:08<00:00, 1920.07it/s]
train: New cache created: /content/datasets/weed.v1i.yolov8/train/labels.cache
train: 26.8GB RAM required to cache images with 50% safety margin but only 7.8/12.7GB available, not caching images ⚠️
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/datasets/weed.v1i.yolov8/valid/labels... 1483 images, 127 backgrounds, 0 corrupt: 100%|██████████| 1483/1483 [00:01<00:00, 938.41it/s]
val: New cache created: /content/datasets/weed.v1i.yolov8/valid/labels.cache
val: Caching images (1.7GB ram): 100%|██████████| 1483/1483 [00:08<00:00, 169.14it/s]
Plotting labels to runs/detect/train/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0004921875), 103 bias(decay=0.0)
Resuming training from epoch-99.pt from epoch 101 to 100 total epochs
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Ultralytics HUB: View model at https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM 🚀
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train
Starting training for 100 epochs...

1 epochs completed in 0.001 hours.

AssertionError                            Traceback (most recent call last)
in <cell line: 4>()
      2
      3 model = YOLO('https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM')
----> 4 model.train()

5 frames
/usr/local/lib/python3.10/dist-packages/ultralytics/utils/plotting.py in plot_results(file, dir, segment, pose, classify, on_plot)
    534     ax = ax.ravel()
    535     files = list(save_dir.glob('results*.csv'))
--> 536     assert len(files), f'No results.csv files found in {save_dir.resolve()}, nothing to plot.'
    537     for f in files:
    538         try:

AssertionError: No results.csv files found in /content/runs/detect/train, nothing to plot.
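
One possible reading of the log, offered as an assumption rather than a confirmed diagnosis: the line "Resuming training from epoch-99.pt from epoch 101 to 100 total epochs" suggests the resumed run starts past its final epoch, so the training loop has nothing left to do, no results.csv is written, and the plotting step then fails its assertion. The sketch below mimics that sequence in plain Python. `run_training` and `plot_results_safely` are hypothetical helpers written for illustration, not Ultralytics functions, and the zero-based start index of 100 is inferred from the log.

```python
import csv
import tempfile
from pathlib import Path


def run_training(save_dir: Path, start_epoch: int, epochs: int) -> int:
    """Hypothetical stand-in for a training loop: appends one CSV row per epoch."""
    completed = 0
    for epoch in range(start_epoch, epochs):  # empty when start_epoch >= epochs
        csv_path = save_dir / "results.csv"
        is_new = not csv_path.exists()
        with csv_path.open("a", newline="") as f:
            writer = csv.writer(f)
            if is_new:
                writer.writerow(["epoch"])
            writer.writerow([epoch + 1])
        completed += 1
    return completed


def plot_results_safely(save_dir: Path) -> bool:
    """Hypothetical guard: skip plotting instead of asserting when no CSV exists."""
    files = list(save_dir.glob("results*.csv"))
    if not files:
        print(f"No results.csv files found in {save_dir}, skipping plots.")
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    save_dir = Path(tmp)
    # Assumed values: zero-based start index 100 (logged as "epoch 101"), epochs=100.
    completed = run_training(save_dir, start_epoch=100, epochs=100)
    plotted = plot_results_safely(save_dir)
```

With these assumed values the loop body never runs, so no CSV file exists when the plotting step looks for one.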

Then I stripped the optimizer myself after downloading epoch-99, and it worked. I am writing this here because I have faced this error with another model I trained as well, so it may be a bug at the moment and could affect other training runs. I stripped the optimizer for that model as well. @kalenmike also fixed the previous model for me via Ultralytics HUB and I was able to download it from there, but the model I downloaded doesn't work and gives errors like:
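
For readers unfamiliar with the step described above: stripping the optimizer means removing training-only state from a checkpoint so that only what inference needs remains. The sketch below shows the idea on a plain dictionary. The key names ("optimizer", "ema", "updates", "best_fitness", "epoch") are assumptions about the checkpoint layout, and `strip_training_state` is a hypothetical helper. For a real .pt file, Ultralytics provides its own `strip_optimizer` utility in `ultralytics.utils.torch_utils`, which is the safer choice.

```python
from copy import deepcopy

# Assumed names of the training-only entries in a checkpoint dictionary.
TRAINING_ONLY_KEYS = ("optimizer", "ema", "updates", "best_fitness")


def strip_training_state(ckpt: dict) -> dict:
    """Return a copy of ckpt with training-only entries cleared (hypothetical helper)."""
    stripped = deepcopy(ckpt)
    for key in TRAINING_ONLY_KEYS:
        if key in stripped:
            stripped[key] = None
    stripped["epoch"] = -1  # assumed convention for "training finished"
    return stripped


# Toy checkpoint standing in for the contents of epoch-99.pt.
checkpoint = {
    "epoch": 99,
    "model": {"layer0.weight": [0.1, 0.2, 0.3]},
    "optimizer": {"momentum_buffers": [0.0, 0.0, 0.0]},
    "ema": {"layer0.weight": [0.1, 0.2, 0.3]},
    "updates": 12345,
    "best_fitness": 0.71,
}

stripped = strip_training_state(checkpoint)
print(sorted(k for k, v in stripped.items() if v is None))
```

The original dictionary is left untouched, so the full checkpoint can still be kept as a backup.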

Confidence ---> 0.85
Traceback (most recent call last):
  File "c:\Users\om\Desktop\ugv proto\codes\yolo\yolov8\trial.py", line 77, in <module>
    print("Class name -->", classNames[cls])
IndexError: list index out of range
Exception ignored in: <generator object BasePredictor.stream_inference at 0x0000023FB8C19310>
Traceback (most recent call last):
  File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 52, in generator_context
  File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 300, in clone
  File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 286, in __init__
AttributeError: 'NoneType' object has no attribute 'is_scripting'
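
The IndexError in the traceback above is raised in the user's own script, where a `classNames` list is indexed with a class id it does not contain. One guess, not confirmed in this thread, is that the list was written for a different model than the one downloaded. A defensive lookup avoids the crash and makes the mismatch visible. `class_label` is a hypothetical helper, and the `names` argument stands in for either a hard-coded list or the id-to-name mapping that Ultralytics models expose as `model.names`.

```python
from collections.abc import Mapping, Sequence
from typing import Union


def class_label(names: Union[Mapping[int, str], Sequence[str]], cls: int) -> str:
    """Return the class name for cls, or a placeholder when the id is unknown."""
    if isinstance(names, Mapping):
        return names.get(cls, f"unknown({cls})")
    if 0 <= cls < len(names):
        return names[cls]
    return f"unknown({cls})"


class_names = ["weed"]  # hypothetical hard-coded list from a script like trial.py

label_ok = class_label(class_names, 0)
label_missing = class_label(class_names, 3)          # would raise IndexError if indexed directly
label_from_mapping = class_label({0: "weed", 1: "crop"}, 1)

print("Class name -->", label_ok)
print("Class name -->", label_missing)
print("Class name -->", label_from_mapping)
```

Reading the names from the loaded model instead of a hard-coded list keeps the script in step with whichever checkpoint is actually in use.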

This error only occurs with the fixed model I downloaded from the HUB, which was 56 MB. The same model from before the fix (epoch 99), which I stripped the optimizer from myself, works well without any errors (136 MB). I am writing here to report a bug; I don't have any issues with my models at the moment.
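
A rough size check can help tell whether two checkpoint files could hold the same network. The arithmetic below assumes a stripped checkpoint stores little besides the weights, at 2 bytes per parameter (FP16) or 4 bytes (FP32). The parameter count is taken from the "Model summary" line in the log above. Whether the 136 MB and 56 MB files belong to that same model is an assumption, since the thread does not say so explicitly.

```python
def approx_weights_mb(num_params: int, bytes_per_param: int) -> float:
    """Estimate the size of the raw weights in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


params = 68_153_571  # from "Model summary: 365 layers, 68153571 parameters"

fp16_mb = approx_weights_mb(params, 2)
fp32_mb = approx_weights_mb(params, 4)

print(f"FP16: {fp16_mb:.1f} MB, FP32: {fp32_mb:.1f} MB")
```

Under these assumptions the FP16 estimate lands near the 136 MB file, while a 56 MB file would be too small to hold all of those parameters at FP16. That hints the HUB download may contain a different or smaller model, though this is an inference and not something confirmed in the thread.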

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

kalenmike commented 1 year ago

@omumbare7 I am interested to understand why you are getting stuck resuming on completion? Did your previous training fail to upload the final weights?

Can you share the code in trial.py?

kalenmike commented 1 year ago

@omumbare7 When I did the last fix I wrote this notebook, it should offer a fix for now, but we haven't tested this across all models:

Colab Notebook

kalenmike commented 1 year ago

@omumbare7 I would like to resolve this permanently, maybe we can work together on this? You can reach me at:

kalen.michael@ultralytics.com

omumbare7 commented 1 year ago

I actually trained it on Colab, so I don't think I have access to trial.py.

omumbare7 commented 1 year ago

I will try this and let you know whether the model works after running the notebook, but it definitely works with the optimizer stripping I did previously.

omumbare7 commented 1 year ago

I would like to contribute to this, but I don't have the knowledge or expertise in this topic. I am just a student and a beginner in this domain, and I have only just started to learn. My apologies.

UltralyticsAssistant commented 1 year ago

@omumbare7 that's perfectly okay! I completely understand. Learning is a continuous process and we all start somewhere. Your curiosity and willingness to explore are a great start in this domain. If you have any further questions or issues, don't hesitate to ask. We're here to help. Happy coding and learning!

github-actions[bot] commented 11 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐