ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0

Stuck at 100% Optimizing weights #630

Closed sdy623 closed 3 weeks ago

sdy623 commented 1 month ago

Search before asking

HUB Component

Training

Bug

I trained a model with my own agent, but I can't find the trained model on the HUB web page. Someone else reported a similar problem, but in that case there was no results.csv file, whereas I can find my results.csv file.

[image]

Here is my training log:

Ultralytics HUB: Uploading checkpoint https://hub.ultralytics.com/models/GxzAXECMi65QweQlpTjs

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     97/100      17.4G      1.365     0.5464      1.235        101        640: 100%|██████████| 264/264 [01:42<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603      0.948      0.942      0.976      0.667

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     98/100      17.3G      1.363     0.5444      1.235        139        640: 100%|██████████| 264/264 [01:40<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603      0.949      0.941      0.977      0.667

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     99/100      17.3G       1.36     0.5425      1.232        137        640: 100%|██████████| 264/264 [01:40<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603      0.949      0.941      0.977      0.668

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
    100/100      17.3G      1.349     0.5367       1.22        156        640: 100%|██████████| 264/264 [01:41<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603       0.95      0.941      0.977      0.668

100 epochs completed in 3.364 hours.
Optimizer stripped from runs/detect/train12/weights/last.pt, 311.6MB
Optimizer stripped from runs/detect/train12/weights/best.pt, 311.6MB

Validating runs/detect/train12/weights/best.pt...
Ultralytics YOLOv8.1.45 🚀 Python-3.10.12 torch-2.1.0a0+32f93b1 CUDA:0 (B1.gpu.large, 24118MiB)
YOLOv5x6u summary (fused): 463 layers, 155375236 parameters, 0 gradients, 250.3 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:42<
                   all        747      49603       0.95      0.941      0.977      0.668
Speed: 0.1ms preprocess, 8.5ms inference, 0.0ms loss, 1.8ms postprocess per image
Results saved to runs/detect/train12
Ultralytics HUB: Syncing final model...
100%|██████████| 297M/297M [02:44<00:00, 1.89MB/s]
Ultralytics HUB: Done ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/GxzAXECMi65QweQlpTjs 🚀

After the "View model at https://hub.ultralytics.com/models/GxzAXECMi65QweQlpTjs 🚀" line is printed, the training process exits, but HUB still shows "Optimizing weights". After a while the model becomes disconnected, and I can't find the model I trained in the HUB.

[image]

Could someone help me? I would appreciate it.
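For anyone hitting the same symptom, it may help to confirm that the local run directory still holds the results even though the model is missing on HUB. The following is a minimal sketch, assuming the run directory from the log above (runs/detect/train12) and the three files the log and this report mention; the helper name `local_artifacts` is hypothetical and not part of the ultralytics package.

```python
from pathlib import Path

# Files named in the training log and in this report, relative to the run directory.
EXPECTED = ("results.csv", "weights/best.pt", "weights/last.pt")


def local_artifacts(run_dir):
    """Map each expected file to its size in bytes, or None if it is missing."""
    run = Path(run_dir)
    report = {}
    for rel in EXPECTED:
        path = run / rel
        report[rel] = path.stat().st_size if path.is_file() else None
    return report


if __name__ == "__main__":
    # Run directory taken from "Results saved to runs/detect/train12" in the log.
    for name, size in local_artifacts("runs/detect/train12").items():
        status = "missing" if size is None else f"{size / 1e6:.1f} MB"
        print(f"{name}: {status}")
```

If all three files are present, the training output itself is intact locally and only the upload to HUB is affected.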

Environment

Minimal Reproducible Example

  1. Log in to HUB
  2. Choose the 'Bring your own agent' option to train the model
  3. Run the training on my training agent
  4. Wait for the training to finish
  5. The problem appears.

Additional

No response

github-actions[bot] commented 1 month ago

👋 Hello @sdy623, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

sdy623 commented 1 month ago

[image]

sergiuwaxmann commented 1 month ago

@sdy623 Thank you for bringing this to our attention. It appears that the upload of the final weights failed. Our team is currently investigating the issue to identify and resolve the underlying cause. I will keep you updated on our progress. Your patience and understanding are greatly appreciated.

sergiuwaxmann commented 3 weeks ago

Hello @sdy623! Great news! Our team has released a fix for the issue you reported. You should no longer experience this problem in new Cloud Training sessions. Thanks for your patience!

sdy623 commented 3 weeks ago

Thank you very much. How can I deploy the fixed version?

sergiuwaxmann commented 3 weeks ago

@sdy623 When using Ultralytics HUB, the system automatically utilizes the latest version. For local training, please ensure you are using the most recent ultralytics version (8.2.2). Unfortunately, the recent fix does not apply to models trained on earlier versions, so you will need to retrain your model with the latest version. We sincerely apologize for the inconvenience this causes.
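
For local training, the version requirement above can be checked before retraining. This is a minimal sketch, assuming the package was installed under the distribution name `ultralytics` and that 8.2.2 is the minimum version that contains the fix, as stated above. The helper names (`parse_version`, `is_at_least`, `check_ultralytics`) are hypothetical, and the comparison only looks at the leading numeric components of the version string.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = "8.2.2"  # version named in the reply above


def parse_version(text):
    """Turn '8.2.2' into (8, 2, 2); stops at the first non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, minimum=MIN_VERSION):
    """Compare versions numerically, so 8.10.0 counts as newer than 8.2.2."""
    return parse_version(installed) >= parse_version(minimum)


def check_ultralytics():
    """Report whether the installed ultralytics package meets MIN_VERSION."""
    try:
        installed = version("ultralytics")
    except PackageNotFoundError:
        return "ultralytics is not installed"
    if is_at_least(installed):
        return f"ultralytics {installed} is recent enough"
    return (
        f"ultralytics {installed} is older than {MIN_VERSION}; "
        "upgrade with: pip install -U ultralytics"
    )


if __name__ == "__main__":
    print(check_ultralytics())
```

With the 8.1.45 install shown in the training log, this check would report that an upgrade is needed before retraining.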