ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0
138 stars 14 forks

Training ends up disconnected. The system says that the training has not ended. #782

Open laulenne opened 3 months ago

laulenne commented 3 months ago

Search before asking

Question

I have just bought a license for Ultralytics HUB, hoping to train a model. But the training does not end properly, and there is no indication why. It shows as disconnected. I cannot find any information about what is going on. I tried several times with a low number of epochs, but the result remains the same.

Additional

No response

github-actions[bot] commented 3 months ago

👋 Hello @laulenne, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

pderrenger commented 3 months ago

Hi there,

Thank you for reaching out and for your support of Ultralytics HUB! I'm sorry to hear that you're experiencing issues with your training sessions disconnecting. Let's work through this together.

First, please ensure that you are using the latest version of the Ultralytics HUB and related packages. Sometimes, updates include important bug fixes that might resolve your issue.

Here are a few steps you can follow to troubleshoot the problem:

  1. Check Internet Connection: Ensure that your internet connection is stable throughout the training process. An unstable connection can sometimes cause disconnections.

  2. Review Logs: Check the logs for any error messages or warnings that might give us more insight into why the training is disconnecting. You can find the logs in the HUB interface under the specific training session.

  3. Resource Allocation: Verify that your system has enough resources (CPU, GPU, RAM) allocated for the training process. Insufficient resources can sometimes cause unexpected disconnections.

  4. Try a Different Dataset: If possible, try training with a different dataset to see if the issue persists. This can help determine if the problem is dataset-specific.
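
For the resource check in step 3, a minimal Python sketch along these lines could report what the local machine offers. It only applies when the training runs on your own hardware; the function name `local_resources` is hypothetical, the RAM lookup via `os.sysconf` is assumed to work on Linux and most Unix systems only, and the GPU check assumes PyTorch is installed (it reports `None` otherwise).

```python
import os


def local_resources() -> dict:
    """Collect a rough summary of local CPU, RAM, and GPU availability."""
    info = {"cpu_count": os.cpu_count()}

    # Physical RAM in GB; os.sysconf is not available on every platform.
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
        info["ram_gb"] = round(page_size * page_count / 1e9, 1)
    except (AttributeError, ValueError, OSError):
        info["ram_gb"] = None

    # GPU visibility through PyTorch, if it can be imported at all.
    try:
        import torch

        info["cuda_available"] = torch.cuda.is_available()
    except Exception:
        info["cuda_available"] = None

    return info


if __name__ == "__main__":
    for name, value in local_resources().items():
        print(f"{name}: {value}")
```

A value of `None` means the sketch could not determine that item, not that the resource is missing.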

If the issue continues, please provide more details such as:

This information will help us diagnose the problem more effectively.

Thank you for your patience and cooperation. We look forward to resolving this issue for you! 😊

laulenne commented 3 months ago

Hi, what I did was upload a dataset to the HUB through the web UI and define a project to train a segmentation model, also through the web UI. I started the training for 100 epochs. The training starts fine and runs fine until it reaches 100 epochs. But when I move to the other tabs, it says: Model Not trained. Finish training your model.

I cannot download the model. My point in using the tab was to create the model so that I do not have all the pain with the CUDA/Torch/ultralytics mix...

Best Regards, Laurent



pderrenger commented 3 months ago

@laulenne hi Laurent,

Thank you for providing detailed information about the issue you're experiencing. I'm sorry to hear that the training process isn't completing as expected. Let's try to resolve this together.

Firstly, please ensure that you are using the latest version of the Ultralytics HUB. Sometimes, updates include important bug fixes that might resolve your issue.

Here are a few steps to help troubleshoot the problem:

  1. Check Training Logs: Navigate to the training logs in the HUB interface to see if there are any error messages or warnings that might indicate why the training did not complete successfully. This can provide valuable insights.

  2. Resource Allocation: Ensure that your system has sufficient resources (CPU, GPU, RAM) allocated for the training process. Insufficient resources can sometimes cause the training to stop prematurely.

  3. Re-run Training with Fewer Epochs: Try running the training for a smaller number of epochs (e.g., 10 or 20) to see if the issue persists. This can help determine if the problem is related to the duration of the training.

  4. Dataset Integrity: Verify that your dataset is correctly formatted and uploaded. Sometimes, issues with the dataset can cause the training to fail. You can refer to our Datasets: Preparing and Uploading guide for more information.
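
As a rough illustration of the dataset integrity check in step 4, the sketch below scans one split of a segmentation dataset before upload. It assumes the usual YOLO segmentation label format (one object per line: a class index followed by normalized `x y` polygon pairs) and a layout with parallel image and label folders; the helper names `check_label_line` and `check_split` are hypothetical and not part of any Ultralytics API.

```python
from pathlib import Path
from typing import List, Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_label_line(line: str) -> Optional[str]:
    """Return a problem description for one label line, or None if it looks valid."""
    parts = line.split()
    if len(parts) < 7:
        return "expected a class index followed by at least 3 x,y pairs"
    if (len(parts) - 1) % 2 != 0:
        return "odd number of coordinate values"
    if not parts[0].isdigit():
        return "class index is not a non-negative integer"
    try:
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return "non-numeric coordinate"
    if any(c < 0.0 or c > 1.0 for c in coords):
        return "coordinate outside the normalized range [0, 1]"
    return None


def check_split(images_dir: Path, labels_dir: Path) -> List[str]:
    """List the problems found for every image in one split (e.g. train)."""
    problems = []
    for image in sorted(images_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = labels_dir / (image.stem + ".txt")
        if not label.exists():
            problems.append(f"{image.name}: no label file found")
            continue
        for number, line in enumerate(label.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # blank lines are ignored
            problem = check_label_line(line)
            if problem:
                problems.append(f"{label.name}:{number}: {problem}")
    return problems
```

Calling `check_split(Path("images/train"), Path("labels/train"))` (assumed folder names) returns an empty list when nothing suspicious is found. A missing label file is not necessarily an error, since an image may intentionally contain no objects, so treat that message as a prompt to double-check.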

If the issue continues, please provide the following additional details:

This information will help us diagnose the problem more effectively.

Thank you for your patience and cooperation. We look forward to resolving this issue for you! 😊

sergiuwaxmann commented 3 months ago

@laulenne Paula's suggestions are valid and I suggest focusing your attention on the dataset structure (see related issue).

laulenne commented 3 months ago

I fixed the dataset and the training is working fine, but I cannot download the model. The interface looks like this, but there is no download option. image

sergiuwaxmann commented 3 months ago

@laulenne Can you share your model ID with me so I can investigate this further? The model ID is in the model's page URL.
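
For reference, a small Python sketch of pulling the model ID out of the page URL. It assumes the URL has the shape `https://hub.ultralytics.com/models/<MODEL_ID>`; the function name `model_id_from_url` and the ID `EXAMPLE_ID` are placeholders for illustration only.

```python
from urllib.parse import urlparse


def model_id_from_url(url: str) -> str:
    """Return the last path segment of a model page URL (assumed to be the model ID)."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    # Assumed URL shape: /models/<MODEL_ID>
    if len(segments) < 2 or segments[-2] != "models":
        raise ValueError(f"not a model page URL: {url}")
    return segments[-1]


if __name__ == "__main__":
    print(model_id_from_url("https://hub.ultralytics.com/models/EXAMPLE_ID"))
```

Copying the last part of the address bar by hand works just as well; the sketch only makes explicit which part of the URL is meant.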