problem with last epoch

omumbare7 commented 1 year ago

Search before asking

[X] I have searched the HUB issues and discussions and found no similar questions.

Question

I have the following issue. Do I need to train again? (Google Colab)

Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/ImtYCwsAjJZEpPbXEqMN 🚀
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/models/ImtYCwsAjJZEpPbXEqMN/epoch-99.pt to 'epoch-99.pt'...
100%|██████████| 521M/521M [00:20<00:00, 26.1MB/s]
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
Ultralytics YOLOv8.0.192 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=epoch-99.pt, data=https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/aASlsotSPCagZsSYYxMO/ugv.v1i.yolov8.zip, epochs=100, patience=100, batch=9, imgsz=640, save=True, save_period=-1, cache=ram, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.0, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=0.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/aASlsotSPCagZsSYYxMO/ugv.v1i.yolov8.zip to 'ugv.v1i.yolov8.zip'...
100%|██████████| 594M/594M [00:25<00:00, 24.3MB/s]
Unzipping ugv.v1i.yolov8.zip to /content/datasets/ugv.v1i.yolov8...: 100%|██████████| 24272/24272 [00:05<00:00, 4407.52file/s]
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100%|██████████| 755k/755k [00:00<00:00, 14.2MB/s]
TensorBoard: Start with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/

            from  n    params  module                                arguments
 0            -1  1      2320  ultralytics.nn.modules.conv.Conv      [3, 80, 3, 2]
 1            -1  1    115520  ultralytics.nn.modules.conv.Conv      [80, 160, 3, 2]
 2            -1  3    436800  ultralytics.nn.modules.block.C2f      [160, 160, 3, True]
 3            -1  1    461440  ultralytics.nn.modules.conv.Conv      [160, 320, 3, 2]
 4            -1  6   3281920  ultralytics.nn.modules.block.C2f      [320, 320, 6, True]
 5            -1  1   1844480  ultralytics.nn.modules.conv.Conv      [320, 640, 3, 2]
 6            -1  6  13117440  ultralytics.nn.modules.block.C2f      [640, 640, 6, True]
 7            -1  1   3687680  ultralytics.nn.modules.conv.Conv      [640, 640, 3, 2]
 8            -1  3   6969600  ultralytics.nn.modules.block.C2f      [640, 640, 3, True]
 9            -1  1   1025920  ultralytics.nn.modules.block.SPPF     [640, 640, 5]
10            -1  1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
11       [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat    [1]
12            -1  3   7379200  ultralytics.nn.modules.block.C2f      [1280, 640, 3]
13            -1  1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
14       [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat    [1]
15            -1  3   1948800  ultralytics.nn.modules.block.C2f      [960, 320, 3]
16            -1  1    922240  ultralytics.nn.modules.conv.Conv      [320, 320, 3, 2]
17      [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat    [1]
18            -1  3   7174400  ultralytics.nn.modules.block.C2f      [960, 640, 3]
19            -1  1   3687680  ultralytics.nn.modules.conv.Conv      [640, 640, 3, 2]
20       [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat    [1]
21            -1  3   7379200  ultralytics.nn.modules.block.C2f      [1280, 640, 3]
22  [15, 18, 21]  1   8718931  ultralytics.nn.modules.head.Detect    [1, [320, 640, 640]]
Model summary: 365 layers, 68153571 parameters, 68153555 gradients, 258.1 GFLOPs

Transferred 595/595 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.23M/6.23M [00:00<00:00, 75.8MB/s]
AMP: checks passed ✅
train: Scanning /content/datasets/ugv.v1i.yolov8/train/labels... 10623 images, 21 backgrounds, 0 corrupt: 100%|██████████| 10623/10623 [00:05<00:00, 1849.00it/s]
train: New cache created: /content/datasets/ugv.v1i.yolov8/train/labels.cache
train: 18.2GB RAM required to cache images with 50% safety margin but only 7.8/12.7GB available, not caching images ⚠️
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/datasets/ugv.v1i.yolov8/valid/labels... 1016 images, 1 backgrounds, 0 corrupt: 100%|██████████| 1016/1016 [00:00<00:00, 1581.32it/s]
val: New cache created: /content/datasets/ugv.v1i.yolov8/valid/labels.cache
val: Caching images (1.2GB ram): 100%|██████████| 1016/1016 [00:05<00:00, 169.86it/s]
Plotting labels to runs/detect/train/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0004921875), 103 bias(decay=0.0)
Resuming training from epoch-99.pt from epoch 101 to 100 total epochs
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Ultralytics HUB: View model at https://hub.ultralytics.com/models/ImtYCwsAjJZEpPbXEqMN 🚀
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train
Starting training for 100 epochs...

1 epochs completed in 0.001 hours.

AssertionError Traceback (most recent call last)
in <cell line: 4>()
      2
      3 model = YOLO('https://hub.ultralytics.com/models/ImtYCwsAjJZEpPbXEqMN')
----> 4 model.train()

5 frames
/usr/local/lib/python3.10/dist-packages/ultralytics/utils/plotting.py in plot_results(file, dir, segment, pose, classify, on_plot)
    534     ax = ax.ravel()
    535     files = list(save_dir.glob('results*.csv'))
--> 536     assert len(files), f'No results.csv files found in {save_dir.resolve()}, nothing to plot.'
    537     for f in files:
    538         try:

AssertionError: No results.csv files found in /content/runs/detect/train, nothing to plot.
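
The assertion at line 536 of the traceback fires whenever the run directory holds no file matching results*.csv. Since the log reports resuming "from epoch 101 to 100 total epochs", it seems likely that no new epoch was logged, so that file may never have been written. A minimal sketch of the same check, usable before calling any plotting code, is shown here; `has_results` is a hypothetical helper name, not part of Ultralytics, and only the glob pattern is taken from the traceback:

```python
from pathlib import Path


def has_results(save_dir: str) -> bool:
    """Return True if save_dir contains at least one results*.csv file.

    Hypothetical helper: mirrors the glob pattern seen in the traceback.
    """
    return any(Path(save_dir).glob("results*.csv"))


if __name__ == "__main__":
    run_dir = "runs/detect/train"  # save_dir reported in the training log
    if has_results(run_dir):
        print(f"results found in {run_dir}")
    else:
        print(f"no results*.csv in {run_dir}; plotting would fail")
```

A missing or empty directory simply yields False, so the check is safe to run before any training has happened.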

Additional

I have epoch-99.pt in the content directory on Colab. Should I use it like a regular .pt file?

kalenmike commented 1 year ago

@omumbare7 When training finishes, YOLOv8 strips the optimizer state from the weights file, which can dramatically reduce its size. If the crash happened before that step, your model file still contains that redundant state, which inflates the file size.

If you want to use your current best.pt file, you can try stripping the optimizer yourself. I have not tested this with YOLOv8:

from utils.torch_utils import strip_optimizer
strip_optimizer('path/to/best.pt')

It looks like you trained for 99 epochs and then resumed for 1 epoch. Is that correct?
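
To illustrate why removing optimizer state shrinks a checkpoint, here is a toy sketch. It is not the Ultralytics implementation: it uses pickle and plain Python lists as stand-ins for the real checkpoint format and tensors, and the key names "model" and "optimizer" are assumptions made for the example.

```python
import pickle


def strip_optimizer_state(checkpoint: dict) -> dict:
    """Return a copy of the checkpoint with the optimizer state dropped."""
    stripped = dict(checkpoint)   # shallow copy; the original stays intact
    stripped["optimizer"] = None  # discard momentum buffers and the like
    return stripped


# Toy checkpoint: weights plus one momentum value per weight.
weights = [float(i) for i in range(10_000)]
checkpoint = {
    "epoch": 99,
    "model": weights,
    "optimizer": {"momentum_buffer": [0.0] * len(weights)},
}

full_size = len(pickle.dumps(checkpoint))
stripped_size = len(pickle.dumps(strip_optimizer_state(checkpoint)))
print(f"full: {full_size} bytes, stripped: {stripped_size} bytes")
```

In this toy case the optimizer holds one extra value per weight, so dropping it removes roughly half of the serialized data. Real savings depend on the optimizer and on what else the checkpoint stores.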

omumbare7 commented 1 year ago

I trained for 100 epochs, and after the 99th it gave this error. I am training on Google Colab from Ultralytics HUB. I will try it and let you know what happens.

This is what happened. I think it's my mistake (I'm a beginner): ModuleNotFoundError: No module named 'utils.torch_utils' (yes, I have installed torch_utils).

UltralyticsAssistant commented 1 year ago

@omumbare7 it appears you are hitting an import error: Python cannot find the 'utils.torch_utils' module. This usually means either that your Python environment is not set up correctly or that the module does not exist in your current directory.

The 'utils.torch_utils' module is part of the YOLOv3 library. Make sure that this library is correctly installed and that you are running your script from the correct directory. If the yolov3 directory is not on your PYTHONPATH, Python will not be able to find the torch_utils module.

Additionally, you mentioned that you've installed 'torch_utils'. Please note that the standalone 'torch_utils' package and 'utils.torch_utils' are not necessarily the same thing.

In your case, 'utils.torch_utils' is likely a module within the YOLOv3 project rather than a standalone package. If you're still experiencing issues, I recommend reviewing your installation of the YOLOv3 library to make sure it is set up correctly.
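
One way to check which of the two names Python can actually resolve is to ask the import system directly. The sketch below uses only the standard library; `is_importable` is a hypothetical helper name chosen for this example.

```python
import importlib.util


def is_importable(name: str) -> bool:
    """Return True if the import system can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name
        # cannot be imported, e.g. 'utils' in 'utils.torch_utils'
        return False


for candidate in ("torch_utils", "utils.torch_utils"):
    print(candidate, "->", is_importable(candidate))
```

If the dotted name reports False while the plain name reports True, the installed package is not the module the import statement is looking for.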

kalenmike commented 1 year ago

@omumbare7 Your model should be fixed now.

omumbare7 commented 1 year ago

Yes, it is fixed. I also found a way to strip the optimizer in YOLOv8, and it worked for me: it took the size from 930 MB to 130 MB. The command I used was:

from ultralytics.yolo.utils.torch_utils import strip_optimizer
strip_optimizer("epoch-99.pt")

thank you very much @kalenmike
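
As a rough sanity check on the 130 MB figure: the model summary in the training log reports 68,153,571 parameters. Assuming the stripped file stores each weight as a 16-bit float (2 bytes), which is an assumption about the saved format rather than something stated in this thread, the weights alone would take about that much space:

```python
PARAMS = 68_153_571   # from "Model summary" in the training log
BYTES_PER_PARAM = 2   # assumption: weights stored as 16-bit floats

size_bytes = PARAMS * BYTES_PER_PARAM
size_mb = size_bytes / 1e6     # decimal megabytes
size_mib = size_bytes / 2**20  # binary mebibytes

print(f"{size_mb:.1f} MB ({size_mib:.1f} MiB)")  # prints: 136.3 MB (130.0 MiB)
```

The estimate ignores metadata and any other entries kept in the file, so it is only an order-of-magnitude check.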

UltralyticsAssistant commented 1 year ago

@omumbare7 I'm glad to hear that your issue is resolved and you were able to successfully strip the optimizer from your model, thus reducing its size. Great job in finding an effective solution for YOLOv8! Your experience will likely be helpful to others encountering the same issue. Don't hesitate to reach out if you have any more questions - we're here to help. Thank you for using Ultralytics YOLO.
