ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0

Project is disconnected #713

Closed spartajet closed 3 weeks ago

spartajet commented 6 months ago

Search before asking

HUB Component

Projects

Bug

My project is disconnected and I cannot get the cloud training result. Model - 5 June 2024 18:48

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

github-actions[bot] commented 6 months ago

👋 Hello @spartajet, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

pderrenger commented 6 months ago

Hello,

Thank you for reporting this issue with your project disconnection. To assist you better, could you please provide a bit more detail about the steps leading up to the disconnection? Additionally, any error messages or logs that were generated during the disconnection would be very helpful.

We'll work to resolve this as quickly as possible once we have a bit more information.

spartajet commented 6 months ago

Thanks. However, I just configured the classify task on your HUB website and paid, and then I could not get the training result, as shown in the picture.

[Screenshot: Snipaste_2024-06-06_15-08-01]

sergiuwaxmann commented 6 months ago

@spartajet Can you share the model ID so I can investigate this further?

The issue might be GPU availability as I just tested and everything works fine. Read more about Cloud Training: https://docs.ultralytics.com/hub/cloud-training

spartajet commented 6 months ago

I think "JADqyjS8ocsOQsULppOS" is my model id from share dialog webpage. thank you!

sergiuwaxmann commented 6 months ago

@spartajet Perfect, thank you! I will keep you updated.

emilio-balda commented 6 months ago

Hi @pderrenger 👋 I have the same issue.

When training a model, it trains normally. I reach the point of "Optimizing Weights", but then it gets stuck in that state for around 5 minutes. After that, I get the following screen saying the model is "Disconnected". [image]

It lets me retry training from the last checkpoint, but the same issue happens again. You can see here that I tried a couple of times. [image]

This is how it looks in the Models panel: [image]

I had a similar issue earlier today when uploading datasets. The dataset was not huge (2 GB). I got an "Unknown Error". This happened to me twice with slightly different versions of my dataset. On the third try, I was able to upload it.

PS: Things were working fine for me 2 days ago. Maybe something changed yesterday or today.

spartajet commented 6 months ago

@emilio-balda My dataset is just 54M, and I do not get any error message ...

sergiuwaxmann commented 5 months ago

@emilio-balda Hello! Can you please share your model ID so I can investigate this further?

emilio-balda commented 5 months ago

@sergiuwaxmann Hi! I saw above that the issue might be GPU availability, so I tried resuming training in the middle of the night and that worked 🚀

[image]

In case you want to take a look, here is my model ID: aASmZT80nXYK3VTG9rIw

sergiuwaxmann commented 5 months ago

@emilio-balda Thank you! Yes, I will look into this but I am glad it worked in the end.

jankalthoefer commented 4 months ago

Hey @sergiuwaxmann, I am having the same issue over and over again. Sometimes it works when I retry, but the last couple of tries it unfortunately didn't.

Is there anything to keep in mind?

Here are two projects that failed before: Uqlbyy57FiLg394OlkXU nBWRPqNKwzLUPmDwfESX

sergiuwaxmann commented 4 months ago

@jankalthoefer Hello! I believe the issue might be related to GPU availability. During the 15-minute timeout, we attempt to spin up the dedicated instance for Cloud Training, but it might fail due to GPU availability. This issue has been logged, and we are working on improving the error handling.
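
Until the error handling improves, the workaround reported earlier in this thread (retrying at a quieter time) can be approximated with a retry loop that waits longer between attempts. The snippet below is only a minimal sketch of that pattern: `retry_with_backoff` and the `start_training` callback are hypothetical names, not part of the Ultralytics HUB API, and the delays are arbitrary assumptions rather than values tied to the 15-minute timeout.

```python
import time
from typing import Callable, Optional


def retry_with_backoff(
    start_training: Callable[[], bool],  # hypothetical: returns True once training starts
    max_attempts: int = 5,
    base_delay_s: float = 60.0,          # assumed starting delay, not a HUB value
    max_delay_s: float = 3600.0,         # cap so the wait never exceeds one hour
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call start_training until it succeeds.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed.
    """
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        if start_training():
            return attempt
        if attempt < max_attempts:
            # Wait before the next attempt, doubling the delay each time.
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
    return None
```

Here `start_training` stands in for whatever you use to resume the run (for example, a manual retry from the last checkpoint); the loop itself does not talk to HUB.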

jankalthoefer commented 4 months ago

Got it! Let me know if you have any updates. For now I am moving to Colab then.

sergiuwaxmann commented 4 months ago

@jankalthoefer Sure thing! Apologies for the inconvenience.

github-actions[bot] commented 3 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐