ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0
107 stars, 11 forks

I am a HUB Pro user trying to upload data for pose detection; after upload it shows an "Unable to process the dataset" error. #665

Closed kumarneeraj2005 closed 1 week ago

kumarneeraj2005 commented 2 weeks ago

Search before asking

HUB Component

Datasets

Bug

Hello, Ultralytics HUB Pro team. While uploading data for pose detection, I receive an error stating "Unable to process the dataset". Could you please look into this issue? Dataset size: 10.9 GB. I manually checked that all of the label formats and folder structures are correct. (image: Screenshot 2024-04-28 at 7 56 28 AM)

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

github-actions[bot] commented 2 weeks ago

👋 Hello @kumarneeraj2005, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

pderrenger commented 2 weeks ago

@kumarneeraj2005 hello! 😊

Thanks for reaching out and for being a part of our Hub Pro community. It seems like you've done a preliminary check on the dataset and everything looks in place, which is great! The "Unable to process the dataset" error can sometimes be caused by transient issues with our servers or by specific peculiarities within the dataset that aren't immediately obvious.

Here's what we recommend:

  1. Retry the upload - Sometimes, giving it another go can solve the issue if it was related to a temporary server hiccup.
  2. Check for hidden files or corrupt images - Occasionally, hidden files or corrupt images in the dataset can cause processing to fail. Make sure all files are in the expected format and can be opened without issues.
  3. Validate dataset size and format - Ensure your dataset size is within the limits for Pro users and that all files adhere to the expected formats for pose detection datasets.
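The hidden-file check in step 2 can be scripted before uploading. The following is only a minimal sketch using the Python standard library: the function names are hypothetical, and treating dot-files and `__MACOSX` folders as "hidden" is an assumption about what might interfere with processing, not a documented HUB rule.

```python
import zipfile
from pathlib import PurePosixPath


def find_hidden_entries(zip_path):
    """Return names of zip entries that are dot-files or macOS metadata."""
    hidden = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            # Flag entries such as .DS_Store, ._image.jpg, or anything under __MACOSX/
            if any(part.startswith(".") or part == "__MACOSX" for part in parts):
                hidden.append(name)
    return hidden


def first_corrupt_member(zip_path):
    """Return the first archive member that fails its CRC check, or None."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.testzip()


# Example usage (the path is a placeholder):
# print(find_hidden_entries("path/to/pose_data.zip"))
# print(first_corrupt_member("path/to/pose_data.zip"))
```

Note that `testzip()` only verifies the archive's checksums; an image that is stored intact but cannot be decoded would still need to be opened with an image library to be detected.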

If after trying these steps you're still facing the issue, could you provide us with some more details? Specifically:

We're here to help you get this resolved! Thanks for your patience and for being a valued member of the Ultralytics community. For further assistance, please refer to our docs at https://docs.ultralytics.com/hub which might give you more detailed guidance on dataset requirements and troubleshooting steps.

kumarneeraj2005 commented 2 weeks ago

We did all three steps and everything is fine, but we are still getting the same error. Could you please check from the backend? If needed, I will share details; please provide your support email ID. We have been struggling since yesterday and have tried more than 10 times.

sergiuwaxmann commented 2 weeks ago

@kumarneeraj2005 You can validate your dataset like this (before uploading it to Ultralytics HUB):

from ultralytics.hub import check_dataset
check_dataset('path/to/coco8.zip')

Please let us know if the local validation is successful. If it isn't, the check_dataset function should give you insights to help you figure out what is wrong with your dataset.
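For a pose dataset, it may be worth passing the task explicitly so that the local check exercises the pose label path. This is a hedged sketch: it assumes the installed ultralytics release accepts a `task` argument on `check_dataset` (recent releases do), and the wrapper name `validate_for_hub` is hypothetical.

```python
from pathlib import Path


def validate_for_hub(zip_path, task="pose"):
    """Run the HUB dataset check on a zip file for the given task."""
    zip_file = Path(zip_path)
    if not zip_file.is_file():
        raise FileNotFoundError(f"Dataset zip not found: {zip_file}")
    # Imported lazily so the path check above works even without ultralytics installed.
    from ultralytics.hub import check_dataset

    # Assumption: check_dataset takes the zip path and a task name such as "pose".
    check_dataset(str(zip_file), task=task)


# Example usage (the path is a placeholder):
# validate_for_hub("path/to/pose_data.zip", task="pose")
```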

kumarneeraj2005 commented 2 weeks ago

@sergiuwaxmann Local validation is successful. Please help us resolve the above issue.

Starting HUB dataset checks for /Users/ashishjha/Downloads/data/pose_data.zip....
Scanning /Users/ashishjha/Downloads/data/pose_data/labels/train... 61156 images, 3 backgrounds, 0 corrupt: 100%|██████████| 61156/61156 [00:14<00:00, 4157.47it/s]
New cache created: /Users/ashishjha/Downloads/data/pose_data/labels/train.cache
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Statistics: 100%|██████████| 61156/61156 [00:00<00:00, 710238.39it/s]
Scanning /Users/ashishjha/Downloads/data/pose_data/labels/val... 3219 images, 0 backgrounds, 0 corrupt: 100%|██████████| 3219/3219 [00:00<00:00, 4241.20it/s]
New cache created: /Users/ashishjha/Downloads/data/pose_data/labels/val.cache
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Statistics: 100%|██████████| 3219/3219 [00:00<00:00, 809003.81it/s]

kumarneeraj2005 commented 2 weeks ago

Hello Ultralytics team, could you please let me know whether you have any support option for the Pro subscription, or is it simply Pro? Very frustrated with your solution.

sergiuwaxmann commented 2 weeks ago

Hello @kumarneeraj2005!

We just checked and everything seems to be working correctly on our end. For example, we can successfully upload our example pose dataset to Ultralytics HUB.

Of course, the most important aspect is having a valid dataset format. The check_dataset function should tell you whether you can upload the dataset to Ultralytics HUB: if your dataset is formatted correctly, you should see "Checks completed correctly ✅. Upload this dataset to https://hub.ultralytics.com/datasets/." in your console. Please see below an example of a correctly formatted Pose dataset. (image: dataset_format_pose)
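To illustrate what "formatted correctly" means for pose labels, here is a rough sketch of a per-file check. It assumes the documented Ultralytics pose label layout (class index, normalized box x y w h, then `dims` values per keypoint, with `kpt_shape` taken from the dataset YAML); the function name and defaults are hypothetical, and this is not the check HUB itself runs.

```python
def check_pose_label(text, num_keypoints=17, dims=3):
    """Return a list of problems found in the text of one pose label file."""
    problems = []
    expected = 5 + num_keypoints * dims  # class + box (4) + keypoint values
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines carry no annotation
        if len(fields) != expected:
            problems.append(f"line {line_no}: expected {expected} values, found {len(fields)}")
            continue
        try:
            values = [float(v) for v in fields]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        # Box values and keypoint x, y must be normalized; visibility flags are skipped.
        coords = values[1:5]
        keypoints = values[5:]
        for i in range(0, len(keypoints), dims):
            coords.extend(keypoints[i:i + 2])
        if any(c < 0 or c > 1 for c in coords):
            problems.append(f"line {line_no}: coordinates must be normalized to [0, 1]")
    return problems
```

An empty label file yields no problems here, since it simply marks the image as a background with no keypoints.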

Here are some things that you can check in Ultralytics HUB:

  1. Make sure you select "Pose" when you upload the dataset to Ultralytics HUB. (image: dataset_upload_pose)
  2. Make sure you have enough space on your Ultralytics HUB account. (image: storage)

If the points above do not help you, please share your account's email or a dataset/project/model ID with us so we can investigate your account.

kumarneeraj2005 commented 2 weeks ago

@yogendrasinghx @sergiuwaxmann

(attached images: check_size check data_upload)

As you suggested above, everything seems to be fine on our end. The dataset is fully valid per your platform's requirements.

For your reference, I am sharing my user ID and project ID:

UserID : Healiumdigital@gmail.com

Please do let me know if anything else is needed.

sergiuwaxmann commented 2 weeks ago

@kumarneeraj2005

@Laughing-q discovered the issue occurs when there is an empty array of keypoints in the dataset and created a PR to fix this. Thank you for bringing this to our attention. I will update you as soon as we merge and deploy the fix. Please accept our apologies for the inconvenience caused.

Alternatively, you can remove the 3 background images from your dataset if you do not want to wait for the fix to be merged and deployed.

kumarneeraj2005 commented 2 weeks ago

@sergiuwaxmann could you please let me know which 3 background images you are referring to? If possible, please share the image details.

sergiuwaxmann commented 2 weeks ago

@kumarneeraj2005 Unfortunately, I do not have access to your dataset but I can see in the screenshot you shared that you have 3 backgrounds.
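Since the scan log only reports a count, a local scan can list the candidate files. The sketch below assumes the usual YOLO layout of `images/<split>/` and `labels/<split>/` under the dataset root, and treats an image as a background when its label file is missing or empty; the function name is hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp"}


def find_background_images(dataset_root, split="train"):
    """Return images in a split whose label file is missing or empty."""
    root = Path(dataset_root)
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    if not image_dir.is_dir():
        return []
    backgrounds = []
    for image_path in sorted(image_dir.rglob("*")):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # The label file mirrors the image path with a .txt extension.
        label_path = label_dir / image_path.relative_to(image_dir).with_suffix(".txt")
        if not label_path.is_file() or not label_path.read_text().strip():
            backgrounds.append(image_path)
    return backgrounds


# Example usage (the path is a placeholder):
# for path in find_background_images("path/to/pose_data", "train"):
#     print(path)
```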

kumarneeraj2005 commented 2 weeks ago

@sergiuwaxmann we will wait for your fix

kumarneeraj2005 commented 1 week ago

@sergiuwaxmann @yogendrasinghx (images: WhatsApp Image 2024-05-01 at 19 26 47, WhatsApp Image 2024-05-01 at 19 27 09)

Guys, it seems your platform is not ready for production; this is a really serious issue. Please check the attached images. When the dataset is small, it is accepted with the so-called background images and uploads to your platform, but your platform cannot handle a large dataset. I request you to look into this issue on a priority basis. Otherwise, cancel my Pro membership; I am really serious.

glenn-jocher commented 1 week ago

@sergiuwaxmann @kumarneeraj2005 the fix PR https://github.com/ultralytics/ultralytics/pull/10415 by @Laughing-q was merged earlier today and is now published in ultralytics 8.2.6. @sergiuwaxmann, I'll sync up with you to redeploy HUB with these fixes.

@kumarneeraj2005 we should have this fixed soon, thank you for your patience here and helping us diagnose the problem!

glenn-jocher commented 1 week ago

Hey @kumarneeraj2005, great news! 🎉 The HUB has been fully updated with all the latest fixes from https://github.com/ultralytics/ultralytics/pull/10415, thanks to the examples you provided for debugging. 🛠️

Please give your dataset upload and training another go, and don't hesitate to reach out if you encounter any more issues or have suggestions for improvement. Your input is incredibly valuable in enhancing our product. Looking forward to hearing from you! 😊

kumarneeraj2005 commented 1 week ago

@glenn-jocher Thanks, I am able to upload the dataset. While training on your cloud, it shows 116 hours for 64K images. Could you please tell me whether this is normal behaviour? Is there any way to reduce the training time? And can we have an option to select a better GPU than your T4?

sergiuwaxmann commented 1 week ago

@kumarneeraj2005 I am glad the upload works. Once again, thank you for bringing this to our attention! The estimated remaining training time is adjusted during training (it takes us a few epochs to calculate remaining time more accurately).

kumarneeraj2005 commented 1 week ago

@sergiuwaxmann @glenn-jocher could you please explain why your system is deducting two bills for a single model training?

image

sergiuwaxmann commented 1 week ago

@kumarneeraj2005 It looks like you resumed training. This might be a UI issue (not showing the first training session as completed). I will investigate our system and refund the extra charges if the balance was subtracted two times.

glenn-jocher commented 1 week ago

> @glenn-jocher Thanks, I am able to upload the dataset. While training on your cloud, it shows 116 hours for 64K images. Could you please tell me whether this is normal behaviour? Is there any way to reduce the training time? And can we have an option to select a better GPU than your T4?

@kumarneeraj2005 hi there! Regarding your training time question: yes, this is a long time! We are working on supporting newer GPUs soon, such as the NVIDIA L4 and NVIDIA L40S from the Ada Lovelace generation, which should complete your training much faster. We hope to have these updates in place over the next few months as we continue to update HUB with the best features and fixes.

kumarneeraj2005 commented 1 week ago

> @kumarneeraj2005 It looks like you resumed training. This might be a UI issue (not showing the first training session as completed). I will investigate our system and refund the extra charges if the balance was subtracted two times.

@sergiuwaxmann any update on this?

sergiuwaxmann commented 1 week ago

@kumarneeraj2005 I replied on the other issue you opened.