ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0

Ultralytics Cloud Training run won't start #835

Closed KiwiBryn closed 2 months ago

KiwiBryn commented 2 months ago

Search before asking

HUB Component

Training

Bug

Hi

Late last week I started a training run using a Roboflow dataset (https://universe.roboflow.com/holstein/rearudderkeypoint). Something went "wrong" as it started, and now I can't start a new training run or stop the original one.

The training run starts, then disconnects after a while (maybe minutes). If I then try to start another, I get "Ultralytics Cloud Temporary Unavailable" - "At the moment you can only train one model at a time ....."

Sorry about the quality of the Snip screenshots.

Thanks, Bryn

[Six screenshots attached]

Environment

  • Ultralytics HUB Version: v0.1.51
  • Client User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0
  • Operating System: Win32
  • Browser Window Size: 1262 x 823
  • Server Timestamp: 1726273112

Minimal Reproducible Example

  1. Log in to HUB
  2. Navigate to the dataset
  3. Start training
  4. Wait
  5. Disconnection
  6. Try to start a new training run: "Ultralytics Cloud... At the moment you can only train one model at a time..."
  7. Log out
  8. Log in hours later
  9. Repeat the steps above
  10. Same outcome

Additional

Model URL https://hub.ultralytics.com/models/ob8y3fvp44d36onqDKDs

UltralyticsAssistant commented 2 months ago

👋 Hello @KiwiBryn, thanks for bringing this to our attention! This is an automated response, and an Ultralytics engineer will assist you soon. Let's try to address your issue with some steps and guidelines to help troubleshoot your training run on Ultralytics HUB 🚀.

Troubleshooting Steps:

  1. Check Training Logs:

    • Visit your Ultralytics HUB Profile and look at the logs for the failed training session to identify any specific error messages or indications of what might have gone wrong.
  2. Restart the HUB Session:

    • Log out of the HUB, clear your browser cache, and log back in. This can often resolve session-related issues.
  3. Network Stability:

    • Ensure your network connection is stable. Disruptions might cause the training to disconnect.
  4. Verify Dataset Compatibility:

  5. Training Resource Availability:

    • Wait a while before starting a new training session, especially if you're seeing messages about resource availability.

Reporting a Bug or Issue:

Next Steps:

We appreciate your patience and cooperation!

KiwiBryn commented 2 months ago

I'm using the Roboflow integration, but when I try to export the dataset and then upload it, I get this error:

image

Even with this error, training with other datasets won't start.

If I manually edit the YAML file, it will upload:

    train: ../train/images
    val: ../valid/images
    test: ../test/images <<====== remove this as no test images

    kpt_shape: [10, 3]
    flip_idx: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    nc: 1
    names: ['RearHolstein']

    roboflow:
      workspace: holstein
      project: rearudderkeypoint
      version: 12
      license: CC BY 4.0
      url: https://universe.roboflow.com/holstein/rearudderkeypoint/dataset/12
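
For anyone who wants to script the manual edit described above, here is a minimal sketch in Python. It only drops a top-level `test:` line from the YAML text. The function name `drop_split` and the sample text are made up for illustration; this is not part of Ultralytics HUB or Roboflow tooling. It assumes the entry is a single-line value like the one in this dataset.

```python
import re


def drop_split(yaml_text: str, key: str = "test") -> str:
    """Return yaml_text without the top-level `key: ...` line.

    Hypothetical helper: handles only single-line entries that start
    in column 0, which matches the Roboflow export shown in this thread.
    """
    pattern = re.compile(rf"^{re.escape(key)}\s*:")
    kept = [line for line in yaml_text.splitlines() if not pattern.match(line)]
    return "\n".join(kept) + "\n"


# Shortened copy of the data.yaml from this issue (sample input only).
example = """train: ../train/images
val: ../valid/images
test: ../test/images

kpt_shape: [10, 3]
nc: 1
names: ['RearHolstein']
"""

cleaned = drop_split(example)
print(cleaned)
```

Working on plain text rather than a YAML parser keeps the rest of the file byte-for-byte the same, but it would not cope with a `test:` value that spans several lines.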

glenn-jocher commented 2 months ago

Hello @KiwiBryn,

Thanks for reaching out! It seems like you're encountering an issue with the dataset export and upload process. Let's address this:

Troubleshooting Steps:

  1. YAML File Adjustment:

    • As you've discovered, manually editing the YAML file to remove the test entry when there are no test images is a good approach. Ensure that the paths and configurations align with your dataset structure.
  2. Roboflow Integration:

    • Double-check the export settings in Roboflow to ensure compatibility with Ultralytics HUB. Sometimes, re-exporting the dataset with the correct settings can resolve these issues.
  3. Training with Other Datasets:

    • If training with other datasets is also affected, try clearing your browser cache and restarting your session on Ultralytics HUB. This can sometimes resolve session-related issues.
  4. Verify Latest Versions:

    • Ensure you are using the latest version of Ultralytics HUB and any related packages. Updates often include bug fixes and improvements.
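
The first suggestion above (making sure the YAML paths match the dataset structure) can be checked before uploading. The sketch below reports which splits point at a folder that is missing or contains no images. The name `empty_splits`, the list of image suffixes, and the choice to resolve a leading `../` against the dataset root are all assumptions made for this example, not documented HUB behavior.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust for your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def empty_splits(dataset_root, splits):
    """Return the split names whose image folder is missing or has no images.

    Hypothetical helper. `splits` maps a split name to the path written in
    the YAML, e.g. {"test": "../test/images"}.
    """
    root = Path(dataset_root)
    missing = []
    for name, rel_path in splits.items():
        # Assumption: a leading "../" (as written by the Roboflow export)
        # is dropped and the rest is resolved against dataset_root.
        trimmed = rel_path[3:] if rel_path.startswith("../") else rel_path
        folder = root / trimmed
        has_images = folder.is_dir() and any(
            p.suffix.lower() in IMAGE_SUFFIXES for p in folder.iterdir()
        )
        if not has_images:
            missing.append(name)
    return missing


# Demo on a throwaway directory with train and valid images but no test folder.
with tempfile.TemporaryDirectory() as tmp:
    for split, filename in (("train", "a.jpg"), ("valid", "b.png")):
        folder = Path(tmp) / split / "images"
        folder.mkdir(parents=True)
        (folder / filename).touch()
    result = empty_splits(
        tmp,
        {"train": "../train/images", "val": "../valid/images", "test": "../test/images"},
    )

print(result)  # ['test']
```

Any split name it returns is a candidate for removal from the YAML before upload.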

Additional Suggestions:

Feel free to reach out if you have further questions or need additional assistance. We're here to help! 😊

sergiuwaxmann commented 2 months ago

@KiwiBryn

Did you import the Roboflow dataset following the instructions from our documentation on Integrations (https://docs.ultralytics.com/hub/integrations)? If you exported the dataset from Roboflow as a .zip file and uploaded it to Ultralytics HUB, we’ve encountered issues with the format in the past. If this was your approach, please ensure the dataset is in the correct format as outlined in our documentation (https://docs.ultralytics.com/hub/datasets#upload-dataset).

Your issue reminds me of this issue: https://github.com/ultralytics/hub/issues/769. Maybe you have a similar problem (in the dataset)?

KiwiBryn commented 2 months ago

Hi

After some experimentation, I think my issue was an unfortunate combination of 3 things.

  1. The Roboflow Universe dataset YAML file needed editing.
  2. Maybe a "poisonous cookie" issue.
  3. The error message when a training run failed to start: after that I couldn't start another one (even with a different dataset), and the training run would start and then stop minute(s) later.

There was also something odd with the Roboflow integration: I was getting a Roboflow SQL error when I clicked on my "linked workspace" in Ultralytics integrations. I have been unable to reproduce this and will try a few more approaches.

The YOLOv8 tooling works well for my application. Thank you for following up so quickly.

@KiwiBryn

sergiuwaxmann commented 2 months ago

@KiwiBryn Hi! Unfortunately, we've encountered issues with the Roboflow format in the past, and there's nothing we can do since the problem is not on our end. Thank you for the feedback! We're glad YOLOv8 is helping you!