check_dataset passes but "The dataset labels cannot be located." after upload

tlong123 commented 1 month ago

Search before asking

[X] I have searched the HUB issues and found no similar bug report.

HUB Component

Datasets

Bug

I have created a new dataset based on an existing one I have used before. After zipping it, I ran the check_dataset function against it and received a "checks completed correctly" response. However, when I upload it to the HUB, all I get is "Invalid The dataset labels cannot be located."

the output I get from check_dataset is as follows:

check_dataset("seadrone_tiled_512.zip", task="detect")
Starting HUB dataset checks for /Users/myuser/Projects/datasets/seadrone_tiled_512.zip....
WARNING ⚠️ Skipping /Users/myuser/Projects/datasets/seadrone_tiled_512.zip unzip as destination directory /Users/myuser/Projects/datasets/seadrone_tiled_512 is not empty.
Scanning /Users/myuser/Projects/datasets/seadrone_tiled_512/labels/train... 252620 images, 212617 backgrounds, 0 corrupt: 100%
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/11182_2_6.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/11182_3_6.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/13698_0_1.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/13831_1_2.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/13832_1_2.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/6363_0_6.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/7142_1_4.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/7142_2_3.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/7571_1_3.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9560_2_4.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9561_2_4.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9562_3_4.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9562_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9563_3_4.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9563_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9564_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9565_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9566_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9567_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9568_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9569_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9570_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9571_3_5.jpg: 1 duplicate labels removed
WARNING ⚠️ /Users/myuser/Projects/datasets/seadrone_tiled_512/images/train/9572_3_5.jpg: 1 duplicate labels removed
New cache created: /Users/myuser/Projects/datasets/seadrone_tiled_512/labels/train.cache
Statistics: 100%|██████████| 252620/252620 [00:00<00:00, 836473.43it/s]
Scanning /Users/myuser/Projects/datasets/seadrone_tiled_512/labels/val... 44234 images, 37329 backgrounds, 0 corrupt: 100%|███
New cache created: /Users/myuser/Projects/datasets/seadrone_tiled_512/labels/val.cache
Statistics: 100%|██████████| 44234/44234 [00:00<00:00, 818177.92it/s]
Checks completed correctly ✅. Upload this dataset to https://hub.ultralytics.com/datasets/.

Environment

check_dataset was run on an M1 Mac Pro.

Minimal Reproducible Example

No response

Additional

No response

UltralyticsAssistant commented 1 month ago

@tlong123 hi there! 👋 It sounds like you've done everything correctly by running check_dataset, and it's surprising that the labels aren't recognized after the upload to the Ultralytics HUB. This issue might involve a subtle detail in the dataset structure or labeling format that's not caught by the preliminary checks.

A few things to consider:

Ensure that your dataset's directory structure strictly follows the expected format, with separate folders for images and labels within each data split (e.g., train, val).
Verify that label files match the corresponding image files in name, location, and count.
Double-check for any discrepancies or formatting errors in your label files that could have been overlooked by the dataset checker.
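
The name-matching check in the list above can be sketched in a few lines of Python. This is only an illustration, assuming the images/<split> and labels/<split> layout visible in the log; the function name compare_images_and_labels and the IMAGE_SUFFIXES set are hypothetical and not part of the ultralytics package.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the dataset actually uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def compare_images_and_labels(root, split):
    """Return (images_without_labels, labels_without_images) for one split.

    Files are matched by stem, e.g. images/train/a.jpg <-> labels/train/a.txt.
    """
    root = Path(root)
    image_stems = {
        p.stem
        for p in (root / "images" / split).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Images without a label file are not necessarily a problem: the log above counts them as backgrounds. Label files with no matching image are the more suspicious result.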

Since the check_dataset tool indicates successful validation but the upload process is still failing, it might also be a good idea to re-zip and re-upload your dataset, ensuring no files are missing or corrupted in the process.
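
One detail in the log is worth noting here: check_dataset reported that it skipped unzipping because the destination directory was not empty, so the checks appear to have run against the existing directory rather than the contents of the zip itself. Before re-uploading, it may help to look at what the archive actually contains. The sketch below uses only the standard zipfile module; summarize_archive is a hypothetical helper, and treating __MACOSX and .DS_Store entries as extras is an assumption about archives created on macOS, not something the thread confirms.

```python
import zipfile
from pathlib import PurePosixPath


def _is_macos_extra(path):
    """True for Finder metadata entries that are not part of the dataset."""
    return path.parts[0] == "__MACOSX" or path.name == ".DS_Store"


def summarize_archive(zip_path):
    """Summarize what is actually stored in the zip file."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries, which end with a slash.
        entries = [PurePosixPath(n) for n in zf.namelist() if not n.endswith("/")]
    real = [p for p in entries if not _is_macos_extra(p)]
    return {
        "top_level": sorted({p.parts[0] for p in real}),
        "yaml_files": sorted(str(p) for p in real if p.suffix.lower() in {".yaml", ".yml"}),
        "label_files": sum(1 for p in real if "labels" in p.parts[:-1] and p.suffix == ".txt"),
        "macos_extras": len(entries) - len(real),
    }
```

For example, summarize_archive("seadrone_tiled_512.zip") would show whether the archive has the expected top-level folder, whether a YAML file is present, and how many label files were actually packed.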

If you continue to face issues, please provide more details about your dataset's structure and any steps you've taken since encountering the error. For further guidance, the Ultralytics HUB Docs at https://docs.ultralytics.com/hub might offer additional insights into dataset preparation and troubleshooting.

Keep us posted! Your feedback helps us improve the HUB for everyone. 🚀

tlong123 commented 1 month ago

My suspicion is that it's failing due to some expectation about how the path variable is configured in the YAML file. I also expected that to be picked up by the check_dataset function, and the HUB error doesn't give me enough information to debug the problem.

UltralyticsAssistant commented 1 month ago

@tlong123 You're right that the check_dataset function is designed to catch a wide range of common issues, but it might not cover every possible configuration or edge case, such as specific expectations around the YAML file's path configurations.

Your suspicion regarding the path variable in the YAML file could be a valid point of failure. The YAML file should accurately reflect the structure and location of your dataset's directories and files. If there's a discrepancy between the paths specified in the YAML and the actual dataset layout, it could lead to the issue you're experiencing.

Without altering the structure of your dataset, double-check the YAML file to ensure all paths accurately mirror your dataset's organization. This includes ensuring that the train, val, and if applicable, test paths are correctly specified and point to the right directories.
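
As a rough way to do that double-check locally, the sketch below resolves the train, val and test entries of an already-parsed dataset YAML and reports whether each one exists on disk. It is a simplified illustration, not how check_dataset or the HUB resolves paths: resolve_split_paths is a hypothetical helper, relative paths are resolved against the directory holding the YAML file, and each split is assumed to be a single string rather than a list.

```python
from pathlib import Path


def resolve_split_paths(config, yaml_dir):
    """Map each split named in the config to (resolved_path, exists).

    config is the parsed YAML as a dict; yaml_dir is the folder containing the YAML.
    """
    root = Path(config.get("path", "."))
    if not root.is_absolute():
        # Assumption for this sketch: relative roots are taken from the YAML's folder.
        root = Path(yaml_dir) / root
    result = {}
    for split in ("train", "val", "test"):
        value = config.get(split)
        if value is None:
            continue  # split not declared in the YAML
        target = Path(value)
        if not target.is_absolute():
            target = root / target
        result[split] = (target, target.exists())
    return result
```

The dict could come from PyYAML's yaml.safe_load on the dataset YAML. An absolute value that only exists on the local machine is one thing worth looking for, though the thread does not confirm that this was the cause.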

If the problem persists, updating the YAML to reflect the exact structure of your dataset and re-uploading might help. If errors continue or the messaging remains unclear, we appreciate this feedback as we always aim to enhance the user experience and debugging capabilities on the HUB.

Thanks for sticking with us 🛠️, and please keep us posted on your progress!

tlong123 commented 1 month ago

@sergiuwaxmann any ideas how I could fix this?

sergiuwaxmann commented 1 month ago

Hello @tlong123! In theory, the check_dataset function should be enough, but maybe you can take a look at our docs about uploading a dataset or at the Upload Dataset dialog (in Ultralytics HUB), where you can see an example of a dataset structure (based on the selected task). See the image below.

[Image: example_dataset]

github-actions[bot] commented 2 days ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
