Open lsun21 opened 3 months ago
👋 Hello @lsun21, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.
If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.
If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.
We try to respond to all issues as promptly as possible. Thank you for your patience!
@lsun21 hello,
Thank you for reaching out and providing detailed information about the issue you're encountering. It sounds like you're experiencing a problem with the model upload process getting stuck at 100% during the optimization of weights.
To help us better understand and resolve this issue, could you please try the following steps:
Update to the Latest Version: Ensure that you are using the latest version of the Ultralytics package. You can update it using the following command:
pip install --upgrade ultralytics
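As a side note (not part of the original reply), here is a minimal sketch for confirming which version is actually active after the upgrade; in Colab you may need to restart the runtime before the new version is picked up:
import ultralytics
from ultralytics import checks

print(ultralytics.__version__)  # prints the installed Ultralytics version
checks()  # prints an environment summary (Python, torch, CUDA, memory)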
Check Internet Connection: Sometimes, network issues can cause uploads to hang. Please verify that your internet connection is stable.
Retry the Upload: Occasionally, retrying the upload process can resolve temporary issues. You can do this by running the following code in your Colab notebook:
from ultralytics import YOLO
model = YOLO('path/to/your/model.pt')
model.upload()
Log Files: If the issue persists, please check the log files for any error messages or warnings that might provide more insight into what is going wrong. You can find the logs in the runs directory of your project.
Alternative Upload Method: If the direct upload continues to fail, you can manually upload the model to the Ultralytics HUB by downloading the .pt file from Colab and then uploading it through the HUB interface.
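For completeness, here is a hedged sketch of the HUB-side flow (YOUR_API_KEY and MODEL_ID are placeholders; the model URL comes from your model page on hub.ultralytics.com). Loading the model by its HUB URL lets the trainer manage the session and push the weights itself, which can sidestep a failing local upload:
from ultralytics import YOLO, hub

hub.login('YOUR_API_KEY')  # placeholder: your personal HUB API key

# Loading a model by its HUB URL attaches this runtime to the HUB training session;
# checkpoints and final weights are then uploaded by the trainer as it runs.
model = YOLO('https://hub.ultralytics.com/models/MODEL_ID')  # placeholder model ID
results = model.train()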
If you have tried all the above steps and the issue still persists, please let us know. Providing any additional error messages or logs would be very helpful for further troubleshooting.
Thank you for your patience and cooperation. We look forward to helping you resolve this issue!
Hi! Thanks for your promptness!
I followed your suggestions, upgraded ultralytics first, and tried to upload the model with the code above. There is an error on the line with model.upload(). Do you know how to fix it?
Thank you!!!
@lsun21 It looks like the final weights weren't uploaded... I suggest resuming training from the last checkpoint.
Thanks for your reply!
Which checkpoint do you suggest resuming from here? It shows that all the checkpoints (100/100) have been saved. Since I already ran an extra one (not sure if it was saved), it now shows that -1 epochs remain....
Thanks for all of your help!
Hello @lsun21,
Thank you for your patience and for providing additional details. Given the situation, it seems like you have multiple checkpoints saved. To resume training from the last successful checkpoint, you can use the most recent one before the issue occurred.
Here's how you can do it:
Identify the Last Successful Checkpoint: Check the runs/train/exp/weights directory (or the equivalent directory where your training results are saved) for the latest checkpoint file. These files are typically named last.pt, best.pt, or epoch_xx.pt.
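If it helps, here is a small sketch (the directory path is an assumption; point it at wherever your run saved its weights) for listing the available checkpoint files by modification time, so the most recent one is easy to spot:
from pathlib import Path

weights_dir = Path('runs/train/exp/weights')  # adjust to your actual run directory
for ckpt in sorted(weights_dir.glob('*.pt'), key=lambda p: p.stat().st_mtime):
    print(ckpt.name, f'{ckpt.stat().st_size / 1e6:.1f} MB')  # newest file is printed last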
Resume Training: Use the identified checkpoint to resume training. Here's an example code snippet to help you resume training from a specific checkpoint:
from ultralytics import YOLO
# Load the model from the last successful checkpoint
model = YOLO('path/to/your/checkpoint.pt')
# Resume training
model.train(data='path/to/your/data.yaml', epochs=additional_epochs)
Upload the Model: After resuming and completing the additional epochs, try uploading the model again:
model.upload()
If you encounter any issues during this process, please provide any error messages or logs that appear. This will help us diagnose the problem more effectively.
Thank you for your cooperation, and I hope this helps resolve the issue! If you have any further questions, feel free to ask. 😊
Thanks for your response.
I am now stuck resuming the model. It shows that the model has been trained for 100 epochs, so I assumed this is the checkpoint I should restart from, and I set epochs=1 just to save time.
But somehow it starts training another 100 epochs by default. I changed the argument, but it still did not work. How should I fix it?
Many thanks!
@lsun21 If you just use the command shown in the Ultralytics HUB UI to resume training (no extra arguments), does it work?
No, it still automatically starts training with another 100 epochs...
Hello @lsun21,
Thank you for your patience and for providing additional details. It seems like the training process is not respecting the specified number of epochs when resuming from a checkpoint. Let's try a more explicit approach to ensure the correct number of epochs is set.
Here's how you can explicitly set the number of epochs when resuming training:
Load the Model and Set the Number of Epochs:
from ultralytics import YOLO
# Load the model from the last successful checkpoint
model = YOLO('path/to/your/checkpoint.pt')
# Resume training with the specified number of epochs
model.train(data='path/to/your/data.yaml', epochs=1, resume=True)
Verify the Training Configuration: Ensure that the training configuration is correctly set to resume from the checkpoint and only run for the specified number of epochs (see the note after these steps).
If the issue persists, please make sure you are using the latest version of the Ultralytics package. You can update it using:
pip install --upgrade ultralytics
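One additional note, offered as a hedged workaround rather than an official recommendation: with resume=True, the trainer restores the training arguments saved in the checkpoint (and, for HUB runs, the HUB-side arguments), so a locally passed epochs value may be ignored. If continuing the exact interrupted run is not essential, you could instead start a new short fine-tuning run from the saved weights, where the local epochs argument is honoured (this is a plain local run, and optimizer/scheduler state is not carried over):
from ultralytics import YOLO

# Start a fresh run from the saved weights instead of resuming the old one.
model = YOLO('path/to/your/checkpoint.pt')
model.train(data='path/to/your/data.yaml', epochs=1, resume=False)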
If you continue to experience difficulties, please provide any additional error messages or logs that appear. This will help us diagnose the problem more effectively.
Thank you for your cooperation, and I hope this helps resolve the issue! If you have any further questions, feel free to ask. 😊
Thanks for your continued input.
I tried to specify epochs=1 (or run without arguments, as @sergiuwaxmann suggested), but the model always shows "Starting training for 200 epochs", which I never defined....
Here is the full log: requirements: Ultralytics requirement ['hub-sdk>=0.0.8'] not found, attempting AutoUpdate... Collecting hub-sdk>=0.0.8 Downloading hub_sdk-0.0.8-py3-none-any.whl.metadata (10 kB) Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from hub-sdk>=0.0.8) (2.32.3) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.8) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.8) (3.7) Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.8) (2.0.7) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.8) (2024.7.4) Downloading hub_sdk-0.0.8-py3-none-any.whl (40 kB) ββββββββββββββββββββββββββββββββββββββββ 40.9/40.9 kB 4.9 MB/s eta 0:00:00 Installing collected packages: hub-sdk Successfully installed hub-sdk-0.0.8
requirements: AutoUpdate success ✅ 3.3s, installed 1 package: ['hub-sdk>=0.0.8'] requirements: ⚠️ Restart runtime or rerun command for updates to take effect
Ultralytics HUB: New authentication successful β Ultralytics HUB: View model at https://hub.ultralytics.com/models/cUDUDcKp7iarW2k2VNn8 π Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/uwwrWu3vbnOw9IfulmjBYyFmrUV2/models/cUDUDcKp7iarW2k2VNn8/epoch-100.pt to 'weights/epoch-100.pt'... 2024-08-10 19:08:00,453 - hub_sdk.helpers.logger - ERROR - Unknown error occurred. ERROR:hub_sdk.helpers.logger:Unknown error occurred. 2024-08-10 19:08:00,457 - hub_sdk.helpers.logger - ERROR - Failed to start heartbeats: 'NoneType' object has no attribute 'json' ERROR:hub_sdk.helpers.logger:Failed to start heartbeats: 'NoneType' object has no attribute 'json' Exception in thread Thread-10 (_start_heartbeats): Traceback (most recent call last): File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/usr/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs) File "/usr/local/lib/python3.10/dist-packages/hub_sdk/base/server_clients.py", line 151, in _start_heartbeats raise e File "/usr/local/lib/python3.10/dist-packages/hub_sdk/base/server_clients.py", line 139, in _start_heartbeats res = self.post(endpoint, json=payload).json() AttributeError: 'NoneType' object has no attribute 'json' 100%|ββββββββββ| 261M/261M [00:13<00:00, 20.9MB/s] WARNING β οΈ using HUB training arguments, ignoring local training arguments. Ultralytics YOLOv8.2.75 π Python-3.10.12 torch-2.3.1+cu121 CUDA:0 (Tesla T4, 15102MiB) engine/trainer: task=detect, mode=train, model=weights/epoch-100.pt, data=https://app.roboflow.com/ds/eL8DtSIPgc?key=YMzS4TZHn6, epochs=100, time=None, patience=100, batch=9, imgsz=640, save=True, save_period=-1, cache=None, device=[0], workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=weights/epoch-100.pt, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.0, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=0.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train Downloading https://app.roboflow.com/ds/eL8DtSIPgc to 'eL8DtSIPgc'... 100%|ββββββββββ| 580M/580M [00:12<00:00, 48.9MB/s] Unzipping eL8DtSIPgc to /content/datasets/eL8DtSIPgc...: 100%|ββββββββββ| 75540/75540 [00:12<00:00, 5948.42file/s] Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'... 
100%|██████████| 755k/755k [00:00<00:00, 19.5MB/s] TensorBoard: Start with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/
from n params module arguments
0 -1 1 2320 ultralytics.nn.modules.conv.Conv [3, 80, 3, 2]
1 -1 1 115520 ultralytics.nn.modules.conv.Conv [80, 160, 3, 2]
2 -1 3 436800 ultralytics.nn.modules.block.C2f [160, 160, 3, True]
3 -1 1 461440 ultralytics.nn.modules.conv.Conv [160, 320, 3, 2]
4 -1 6 3281920 ultralytics.nn.modules.block.C2f [320, 320, 6, True]
5 -1 1 1844480 ultralytics.nn.modules.conv.Conv [320, 640, 3, 2]
6 -1 6 13117440 ultralytics.nn.modules.block.C2f [640, 640, 6, True]
7 -1 1 3687680 ultralytics.nn.modules.conv.Conv [640, 640, 3, 2]
8 -1 3 6969600 ultralytics.nn.modules.block.C2f [640, 640, 3, True]
9 -1 1 1025920 ultralytics.nn.modules.block.SPPF [640, 640, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 3 7379200 ultralytics.nn.modules.block.C2f [1280, 640, 3]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 3 1948800 ultralytics.nn.modules.block.C2f [960, 320, 3]
16 -1 1 922240 ultralytics.nn.modules.conv.Conv [320, 320, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 3 7174400 ultralytics.nn.modules.block.C2f [960, 640, 3]
19 -1 1 3687680 ultralytics.nn.modules.conv.Conv [640, 640, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 7379200 ultralytics.nn.modules.block.C2f [1280, 640, 3]
22 [15, 18, 21] 1 8726635 ultralytics.nn.modules.head.Detect [9, [320, 640, 640]]
Model summary: 365 layers, 68,161,275 parameters, 68,161,259 gradients, 258.2 GFLOPs
Transferred 595/595 items from pretrained weights Freezing layer 'model.22.dfl.conv.weight' AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n... Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to 'yolov8n.pt'... 100%|ββββββββββ| 6.25M/6.25M [00:00<00:00, 109MB/s] AMP: checks passed β train: Scanning /content/datasets/eL8DtSIPgc/train/labels... 33048 images, 17997 backgrounds, 0 corrupt: 100%|ββββββββββ| 33048/33048 [00:12<00:00, 2646.43it/s] train: New cache created: /content/datasets/eL8DtSIPgc/train/labels.cache albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8)) /usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock. self.pid = os.fork() val: Scanning /content/datasets/eL8DtSIPgc/valid/labels... 3141 images, 1750 backgrounds, 0 corrupt: 100%|ββββββββββ| 3141/3141 [00:01<00:00, 1967.34it/s] val: New cache created: /content/datasets/eL8DtSIPgc/valid/labels.cache Plotting labels to runs/detect/train/labels.jpg... optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0004921875), 103 bias(decay=0.0) Resuming training weights/epoch-100.pt from epoch 102 to 100 total epochs DetectionModel( (model): Sequential( (0): Conv( (conv): Conv2d(3, 80, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn): BatchNorm2d(80, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (1): Conv( (conv): Conv2d(80, 160, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (2): C2f( (cv1): Conv( (conv): Conv2d(160, 160, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(400, 160, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): ModuleList( (0-2): 3 x Bottleneck( (cv1): Conv( (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(80, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(80, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (3): Conv( (conv): Conv2d(160, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (4): C2f( (cv1): Conv( (conv): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(1280, 320, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): 
SiLU(inplace=True) ) (m): ModuleList( (0-5): 6 x Bottleneck( (cv1): Conv( (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (5): Conv( (conv): Conv2d(320, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (6): C2f( (cv1): Conv( (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(2560, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): ModuleList( (0-5): 6 x Bottleneck( (cv1): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (7): Conv( (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (8): C2f( (cv1): Conv( (conv): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): ModuleList( (0-2): 3 x Bottleneck( (cv1): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (9): SPPF( (cv1): Conv( (conv): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False) ) (10): Upsample(scale_factor=2.0, mode='nearest') (11): Concat() (12): C2f( (cv1): Conv( (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(1600, 640, kernel_size=(1, 1), 
stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): ModuleList( (0-2): 3 x Bottleneck( (cv1): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (13): Upsample(scale_factor=2.0, mode='nearest') (14): Concat() (15): C2f( (cv1): Conv( (conv): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(800, 320, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): ModuleList( (0-2): 3 x Bottleneck( (cv1): Conv( (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(160, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (16): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (17): Concat() (18): C2f( (cv1): Conv( (conv): Conv2d(960, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): ModuleList( (0-2): 3 x Bottleneck( (cv1): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (19): Conv( (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (20): Concat() (21): C2f( (cv1): Conv( (conv): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(1600, 640, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn): BatchNorm2d(640, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (m): ModuleList( (0-2): 3 x Bottleneck( (cv1): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, 
track_running_stats=True) (act): SiLU(inplace=True) ) (cv2): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) ) ) ) (22): Detect( (cv2): ModuleList( (0): Sequential( (0): Conv( (conv): Conv2d(320, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(80, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (1): Conv( (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(80, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (2): Conv2d(80, 64, kernel_size=(1, 1), stride=(1, 1)) ) (1-2): 2 x Sequential( (0): Conv( (conv): Conv2d(640, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(80, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (1): Conv( (conv): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(80, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (2): Conv2d(80, 64, kernel_size=(1, 1), stride=(1, 1)) ) ) (cv3): ModuleList( (0): Sequential( (0): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (1): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (2): Conv2d(320, 9, kernel_size=(1, 1), stride=(1, 1)) ) (1-2): 2 x Sequential( (0): Conv( (conv): Conv2d(640, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (1): Conv( (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn): BatchNorm2d(320, eps=0.001, momentum=0.03, affine=True, track_running_stats=True) (act): SiLU(inplace=True) ) (2): Conv2d(320, 9, kernel_size=(1, 1), stride=(1, 1)) ) ) (dfl): DFL( (conv): Conv2d(16, 1, kernel_size=(1, 1), stride=(1, 1), bias=False) ) ) ) ) has been trained for 100 epochs. Fine-tuning for 100 more epochs. TensorBoard: model graph visualization added β Image sizes 640 train, 640 val Using 2 dataloader workers Logging results to runs/detect/train Starting training for 200 epochs...
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
102/200 8.62G 0.4382 0.3942 0.9903 3 640: 1%| | 21/3672 [00:15<41:57, 1.45it/s]/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork() 102/200 8.62G 0.4382 0.3942 0.9903 3 640: 1%| | 21/3672 [00:16<47:13, 1.29it/s]
I am also having a similar issue with the yolov8l or yolov8x models.
Hello!
It seems like you're encountering a similar issue with the yolov8l or yolov8x models. Here are a few steps you can try to resolve this:
Update Packages: Ensure you are using the latest version of the Ultralytics package. You can update it using:
pip install -U ultralytics
Resume Training: When resuming training, make sure to specify the correct checkpoint and desired number of epochs. For example:
model = YOLO('path/to/your/checkpoint.pt')
model.train(data='your_data.yaml', epochs=1, resume=True)
Check Arguments: If the training defaults to 200 epochs, it might be due to HUB-specific arguments overriding your local settings. Ensure you're not using conflicting parameters (a small inspection sketch follows this list).
Logs and Errors: Review any error messages or logs for additional clues. The error related to 'NoneType' object has no attribute 'json' might indicate a network or server issue. Retrying the operation could help.
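As a diagnostic aid for the "Check Arguments" point above, here is a sketch under the assumption that the checkpoint follows the usual Ultralytics layout, where training arguments are stored under a 'train_args' key; it shows which epochs value is actually baked into the weights file:
import torch

ckpt = torch.load('weights/epoch-100.pt', map_location='cpu')  # path taken from the log above
train_args = ckpt.get('train_args', {})
print('epochs stored in checkpoint:', train_args.get('epochs'))
print('last completed epoch:', ckpt.get('epoch'))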
If the problem persists, please ensure it's reproducible with the latest versions and feel free to provide more details here. The community and the Ultralytics team are always here to help! 😊
If you have any more questions, feel free to ask!
Search before asking
HUB Component: Models, Training
Bug
I finished training the model on Google Colab, but it failed to upload at the end.
Afterward, I tried to resume the model and run an extra epoch, but the same failure happened again. How can I recover the model properly?
Thanks for your input!
Environment: No response
Minimal Reproducible Example: No response
Additional: No response