ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0

Model training stuck in disconnected problem #791

Closed: QiConstantine closed this issue 3 months ago

QiConstantine commented 3 months ago

Question

My model is stuck in the "disconnected" state and will not resume training. I have no idea how to solve it.

[Screenshot: iShot_2024-08-04_14 53 52]

Additional

No response

github-actions[bot] commented 3 months ago

👋 Hello @QiConstantine, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

pderrenger commented 3 months ago

@QiConstantine hello,

Thank you for reaching out and providing the details of your issue. I'm sorry to hear you're experiencing difficulties with your model training.

To help resolve this, please try the following steps:

  1. Update to the Latest Version: Ensure that you are using the latest versions of the Ultralytics HUB and YOLO packages. Sometimes, issues are resolved in newer releases. You can update your packages using:

    pip install --upgrade ultralytics

  2. Check Internet Connection: The "disconnected" state might be related to network issues. Verify that your internet connection is stable.

  3. Restart the Training: Sometimes, simply restarting the training process can resolve the issue. You can do this by stopping the current training session and starting a new one.

  4. Review Logs: Check the logs for any error messages or warnings that might provide more insight into why the model is stuck. This can often point to specific issues that need to be addressed.

  5. Resource Availability: Ensure that your system has sufficient resources (CPU, GPU, memory) available for training. Resource constraints can sometimes cause training processes to hang.
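For anyone training on their own machine, steps 1, 2, and 5 above can be checked with a short script. This is only a minimal sketch using the Python standard library; the helper names (`installed_version`, `can_reach`, `local_resources`) are hypothetical and not part of the `ultralytics` package, and a successful TCP connection on port 443 is just a rough proxy for "the HUB is reachable".

```python
import os
import shutil
import socket
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def local_resources(path: str = ".") -> dict:
    """Report the CPU count and free disk space (GiB) for the volume holding `path`."""
    free_bytes = shutil.disk_usage(path).free
    return {"cpu_count": os.cpu_count(), "free_disk_gib": round(free_bytes / 2**30, 1)}
```

Calling `installed_version("ultralytics")`, `can_reach("hub.ultralytics.com")`, and `local_resources()` gives a quick snapshot to paste into a reply. None of this applies to training that runs on HUB cloud servers, where the local environment is not involved.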

If the issue persists after trying these steps, please provide any additional logs or error messages you see. This will help us diagnose the problem more effectively.

Thank you for your patience and cooperation. We appreciate your contributions to the YOLO community and the Ultralytics team is here to support you!

QiConstantine commented 3 months ago

I rented a server on your website (https://hub.ultralytics.com) to train my model, so I don't think I can access any logs. Would it help if I gave you my account so you can find out what happened on the server?

sergiuwaxmann commented 3 months ago

@QiConstantine Can you share your model ID with me so I can investigate this further? You can find the model ID in the URL of your model's page.
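As a rough illustration, the ID is the last path segment of the page URL, before any `?tab=...` query string. The sketch below assumes a URL shaped like `https://hub.ultralytics.com/models/MODEL_ID?tab=train`; that exact path layout and the helper name `model_id_from_url` are assumptions for illustration, not a documented HUB API.

```python
from urllib.parse import urlparse


def model_id_from_url(url: str) -> str:
    """Return the last path segment of `url`, ignoring any query string or fragment."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"no path segment found in {url!r}")
    return segments[-1]
```

The same function also works on a fragment copied from the address bar, such as `MODEL_ID?tab=train`, because the query string is parsed separately from the path.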

QiConstantine commented 3 months ago

1x4uqbE0WxqFyH720Ikj?tab=train. Is this it?

sergiuwaxmann commented 3 months ago

@QiConstantine It appears there is a bug with the timed training. While we work on fixing this issue, I have refunded the amount you used for this training so that you can start another one.

PS: I suggest using Epochs training instead of Timed training.

QiConstantine commented 3 months ago

Thanks for your help.

sergiuwaxmann commented 3 months ago

@QiConstantine No worries! Apologies for the inconvenience.