Closed spartajet closed 3 weeks ago
Hello @spartajet, thank you for raising an issue about Ultralytics HUB! Please visit our HUB Docs to learn more.
If this is a Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.
If this is a Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.
We try to respond to all issues as promptly as possible. Thank you for your patience!
Hello,
Thank you for reporting this issue with your project disconnection. To assist you better, could you please provide a bit more detail about the steps leading up to the disconnection? Additionally, any error messages or logs that were generated during the disconnection would be very helpful.
We'll work to resolve this as quickly as possible once we have a bit more information.
Thanks. However, I just configured the classification task on your HUB website and paid, and then I could not get the training result, as shown in the picture.
@spartajet Can you share the model ID so I can investigate this further?
The issue might be GPU availability as I just tested and everything works fine. Read more about Cloud Training: https://docs.ultralytics.com/hub/cloud-training
I think "JADqyjS8ocsOQsULppOS" is my model ID, taken from the share dialog. Thank you!
@spartajet Perfect, thank you! I will keep you updated.
Hi @pderrenger, I have the same issue.
When I train a model, it trains normally. I reach the "Optimizing Weights" stage, but then it gets stuck in that state for around 5 minutes. After that, I get the following screen saying the model is "Disconnected".
It lets me retry training from the last checkpoint, but the same issue happens again. You can see here that I tried a couple of times.
This is how it looks in the Models Panel
I had a similar issue earlier today when uploading datasets. The dataset was not huge (2GB). I got an "Unknown Error". This happened to me twice with slightly different versions of my dataset. On the third try, I was able to upload it.
PS: Things were working fine for me 2 days ago. Maybe something changed yesterday or today.
@emilio-balda My dataset is only 54 MB, and I do not get any error message ...
@emilio-balda Hello! Can you please share your model ID so I can investigate this further?
@sergiuwaxmann Hi! I saw above that the issue might be GPU availability, so I tried resuming training in the middle of the night and that worked.
In case you want to take a look, here is my model ID: aASmZT80nXYK3VTG9rIw
@emilio-balda Thank you! Yes, I will look into this but I am glad it worked in the end.
Hey @sergiuwaxmann, I am having the same issue over and over again. Sometimes it works when I retry, but unfortunately the last couple of tries did not.
Is there anything to keep in mind?
Here are two projects that failed before: Uqlbyy57FiLg394OlkXU and nBWRPqNKwzLUPmDwfESX
@jankalthoefer Hello! I believe the issue might be related to GPU availability. During the 15-minute timeout, we attempt to spin up the dedicated instance for Cloud Training, but it might fail due to GPU availability. This issue has been logged, and we are working on improving the error handling.
Got it! Let me know if you have any updates. For now I am moving to Colab then.
@jankalthoefer Sure thing! Apologies for the inconvenience.
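For readers who hit the same "Disconnected" state: if the cause is temporary GPU unavailability, as suggested above, retrying later with an increasing wait between attempts is one way to avoid hammering the retry button. The sketch below only illustrates that idea. `start_job` is a hypothetical callable that you would supply yourself (it is not an Ultralytics HUB API), and the delay values are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


def retry_with_backoff(
    start_job: Callable[[], bool],
    max_attempts: int = 5,
    base_delay_s: float = 60.0,
    max_delay_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call start_job until it returns True.

    start_job is a hypothetical placeholder for whatever starts or resumes
    training and reports success. Returns the 1-based attempt number that
    succeeded, or None if every attempt failed.
    """
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        if start_job():
            return attempt
        if attempt < max_attempts:
            # Wait before the next attempt, doubling the delay up to a cap.
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
    return None
```

Passing a custom `sleep` makes the helper easy to try out without actually waiting, for example `retry_with_backoff(my_start_fn, sleep=lambda s: None)`, where `my_start_fn` is your own function.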
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI.
Search before asking
HUB Component
Projects
Bug
My project is disconnected and I cannot get the cloud training result. Model - 5 June 2024 18:48
Environment
No response
Minimal Reproducible Example
No response
Additional
No response