Closed JhonFrederick closed 3 weeks ago
Hello @JhonFrederick, thank you for raising an issue about Ultralytics HUB! Please visit our HUB Docs to learn more:
If this is a Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.
If this is a Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.
We try to respond to all issues as promptly as possible. Thank you for your patience!
Hello @JhonFrederick!
First of all, please accept our apologies for the inconvenience caused.
Based on the screenshot you shared, you used Epochs training (not Timed training) but I would like to investigate this further. Can you please share your model ID (you can find it in the URL) here?
Also, looking at the right side of your screenshot, I can see negative epochs, which makes me think that you might be facing an issue we are currently trying to solve (#622).
I remember setting the timed training to a value of 1 day, but now I'm not sure, mainly because I am currently running another test with Epoch Training and the information is not displayed the same way as in the attempt shown in the screenshot. But since you mention the issue, it could be due to that. Model ID: FuVEbxOoAcWJCFA7fa9m
Edit: When I ran my model (FuVEbxOoAcWJCFA7fa9m), I reviewed the billing data a few minutes later and the information corresponded to the time entered (1 day); the total value was already calculated. With epoch training, however, this is calculated over time. I don't know if it's relevant, but I noticed this now that I'm running another model (with Epoch Training).
@JhonFrederick hello again, and thank you for providing the model ID and additional details. It clarifies your situation significantly.
Given the information and your experience with both timed and epoch training, it indeed sounds like the unusual behavior you encountered with the model FuVEbxOoAcWJCFA7fa9m might be related to the issue we're currently addressing.
I appreciate your patience and understanding as we work towards resolving this. In the meantime, it seems you've correctly identified different billing behaviors between timed and epoch training: timed training estimates your total cost upfront based on the duration, whereas epoch training's cost accumulates over time.
Your observations are indeed relevant and help us ensure the platform works as expected for everyone. We'll keep you updated on our progress with the mentioned issue. Please stay tuned!
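To make the billing difference described above concrete, here is a rough sketch. The hourly rate and both function names are hypothetical and do not reflect actual HUB pricing or billing code; the sketch only illustrates "total known upfront" versus "total accumulating over time".

```python
# Hypothetical illustration only: the rate and function names below are
# assumptions, not Ultralytics HUB's real pricing or billing code.

HOURLY_RATE_USD = 2.00  # assumed GPU price per hour


def timed_training_charge(planned_hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Timed training: the full amount is known as soon as the run starts."""
    return round(planned_hours * rate, 2)


def epoch_training_charge(elapsed_hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Epoch training: the amount grows with the time actually used so far."""
    return round(elapsed_hours * rate, 2)


if __name__ == "__main__":
    # A 1-day timed run shows its total immediately.
    print("timed, 24 h planned:", timed_training_charge(24))
    # An epoch run shows a running total that increases as training proceeds.
    for elapsed in (0.5, 2, 6):
        print(f"epoch, {elapsed} h elapsed:", epoch_training_charge(elapsed))
```

Under these assumptions, a timed run would display its full charge minutes after starting, while an epoch run would display a smaller figure that keeps growing, which matches the behavior reported for the two models.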
According to the above screen, my second model (ID: tj2HLEVdErYxgunZzH9Z) finished training with epoch training, but when I go to the preview or deployment tab, I get the following message: "Model not trained".
I have attached a screenshot of the billing summary, which shows the different attempts to complete the training.
Please tell me in this case what I could be doing wrong so that it doesn't allow me to use the trained model?
I appreciate your help again in advance
@JhonFrederick hello again!
Thanks for reaching out with these details. It looks like an issue on our end where the model's training status hasn't correctly updated in the UI, despite the training completion. This misalignment is likely causing the "Model not trained" message you're seeing.
For now, could you try refreshing the page or logging out and back into the platform to see if that helps sync the status? Sometimes, a simple refresh can resolve such discrepancies.
If the issue persists, rest assured, we're here to help! We'll investigate further using the model ID tj2HLEVdErYxgunZzH9Z you provided and ensure your model becomes accessible for preview and deployment.
Again, we truly appreciate your patience and feedback as we work to improve the platform. Stay tuned!
@JhonFrederick
Something went wrong with the first model (FuVEbxOoAcWJCFA7fa9m) and we are not yet sure what. Our team is investigating this issue. Regarding the second model (tj2HLEVdErYxgunZzH9Z), it appears that although the model finished training, the final upload of weights failed, which is why the model is unusable.
We have refunded the account balance you used and kindly ask you to start the training process again from scratch. Once again, our apologies for the inconvenience caused.
Hi,
I tried again with another test using epoch training, but I had problems again. I have attached proof of this.
Model ID: xrGz5bRPDQvMniPK8eIR
Billing information
In this case the training went well up to a certain point. After 75%, I had to retry the training a couple of times until it completed, but I still could not use the model, and it finally ended in the state shown.
Hello @JhonFrederick!
I apologize once again for the inconvenience.
Based on our internal tests, we've observed that, in approximately 10% of cases, the final weights upload fails. This results in the model being stuck at 100%. If the training is resumed, the session fails since the training has already completed. Our team is currently working on updating the logic for uploading weights to the Ultralytics HUB to prevent this issue.
Meanwhile, we have refunded the account balance you used.
CC @hassaanfarooq01
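The failure mode described above (the final weights upload failing after training has completed) is the kind of step that is commonly wrapped in a retry loop. The sketch below shows one generic way to do that with exponential backoff. It is not the HUB team's actual fix: `upload_with_retry` and the `upload` callable are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def upload_with_retry(
    upload: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `upload` until it succeeds or `max_attempts` is reached.

    `upload` is a hypothetical zero-argument callable that raises on failure.
    The wait between attempts doubles each time (1x, 2x, 4x ... base_delay).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"upload failed after {max_attempts} attempts") from last_error
```

If each attempt failed independently at the roughly 10% rate mentioned above, three attempts would all fail about 0.1% of the time. Real failures are often correlated (for example during an outage), so that figure is an optimistic estimate rather than a guarantee.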
Hello @JhonFrederick! Great news! Our team has released a fix for the issue you reported. You should no longer experience this problem in new Cloud Training sessions. Thanks for your patience!
Hi,
Was it released today? I ask because I was running tests with epoch training and they all gave me negative epochs.
Model IDs: Jw8BPBb2kmX0i7lErCiP, j8NsAEFksZH65pEmt0e1, xvvlSW8VL1YTsCGM5jxr, yEAC9FrDTMgJBxvk42wp
Each model finished at 100% but could not be used, and after a retry ended with -1 epochs.
Once the epoch training I'm currently running finishes, I will try Timed training again and report my results.
Hello @JhonFrederick! We released the fix today (when I sent you the message above). Unfortunately, the recent fix does not apply to models trained on earlier versions, so you will need to retrain your models. We sincerely apologize for the inconvenience this causes.
Finally, my test with epoch training was successful. The test was run with 5 epochs (I didn't want to lose credits like with the other models I mentioned above). But when I ran a timed training, I got negative epochs again and I cannot click the Resume Training button.
Model ID: M16CdXdrDQNg0h7hGWJw
I paid for the "Hub Pro" plan expecting to take advantage of the cloud training, but for the entire month I basically never took advantage of it. It's a shame because my plan ends today and I was hoping to test with a better trained model but the issue still persists.
It should be noted that my problem was initially with timed training, as I did not know how many epochs my training could take.
I hope that at least all the failed models, including this one, allow the team to correct the problem.
Edit: Training started yesterday at 4 pm (Colombian time), so it didn't even last 24 hours.
I am sorry you had a negative experience with Cloud Training due to issues on our end and we hope these won't happen again. Our team will look at the failed models and refund the account balance spent on them.
Question
Our team currently uses YOLOv8 models, and we decided to train our own model using cloud training with timed training to test this option.
According to the documentation, we thought that at the end of the given time, the model would be trained up to that point and could then be used, but this does not seem to be the case. I don't have much experience with training models and I was exploring the platform, so I want to know if there is something I am doing wrong or have misunderstood.
This is the final status of the model training. We thought that buying more credits would enable the "Resume training" button, but it didn't.
I appreciate your help in advance