ultralytics / ultralytics

Ultralytics YOLO11 🚀
https://docs.ultralytics.com
GNU Affero General Public License v3.0
36.58k stars, 7.05k forks

PyTorch WebDataset Dataloader #19072

Open AndrewNoviello opened 1 week ago

AndrewNoviello commented 1 week ago

Question

Hi! I have been using YOLO (primarily YOLOv5) on various personal projects since 2021 and have continuously been impressed by your work. I am currently trying to train YOLO11 on AWS SageMaker. My dataset is quite small (only about 12GB) but consists of ~200,000 files (100k images, 100k labels). Because of this imbalance between total size and number of files, it is quite slow to download the images onto the SageMaker EBS volume. Instead, I am trying to use the WebDataset library for PyTorch to download the tar files directly from S3. Does YOLO11 support a WebDataset dataloader? If not, how could I go about adapting the existing dataloader to do so?

Thanks so much! Sorry if this is a nonsensical question. I am somewhat new to SageMaker and to training YOLO with such a large number of images; the most I have used in the past was 25K, and in that case I was able to wait for the download onto the EBS volume to finish.

UltralyticsAssistant commented 1 week ago

👋 Hello @AndrewNoviello, thank you for your interest in Ultralytics 🚀! We recommend checking out our Docs for comprehensive guides and resources to assist you, including Python and CLI usage examples.

Your question is insightful, and we'd love to assist! However, before diving in, if this ❓ is a custom question about implementing a custom dataloader, could you provide more specific steps or code examples of what you've already tried? If instead this is a 🐛 Bug Report related to the current dataloader, please provide a minimum reproducible example so we can better assist you.

Suggestions

  1. For reference on modifying or customizing the dataloader, explore our Model Training Tips and ensure your workflow aligns with the recommended practices.
  2. If adapting the dataloader, a good starting point would be reviewing the current YOLO dataloader source in the repository to analyze its structure.
  3. Upgrading to the latest version might help ensure compatibility if you're encountering issues. You can upgrade using:
pip install -U ultralytics

This is an automated response to guide you with initial steps and resources to expedite your inquiry. An Ultralytics engineer will review your issue and assist you further as soon as possible 🙂

Y-T-G commented 1 week ago

No, it doesn't support a WebDataset dataloader.

The Dataset class is defined at https://github.com/ultralytics/ultralytics/blob/c5ac5548aeac8ed1554ef6224bd3d839974abc5b/ultralytics/data/dataset.py
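
As a possible starting point for adapting it, here is a minimal sketch of the shard-reading idea behind WebDataset, written with the Python standard library only. It is not the `webdataset` API and not part of Ultralytics. The names `parse_yolo_labels`, `iter_shard_samples`, and `_emit` are hypothetical. It assumes each tar shard stores an image and its YOLO `.txt` label file next to each other under the same path and basename (for example `000001.jpg` followed by `000001.txt`), so a dataset in the usual separate `images/` and `labels/` folders would have to be repacked first. An image without a label file is treated as having no objects.

```python
import os
import tarfile
from typing import Dict, Iterator, List, Tuple

# One YOLO label row: (class_id, x_center, y_center, width, height), coordinates normalized to [0, 1].
Box = Tuple[int, float, float, float, float]


def parse_yolo_labels(text: str) -> List[Box]:
    """Parse YOLO-format label text, one 'class x y w h' row per line."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        boxes.append((int(parts[0]), *map(float, parts[1:])))
    return boxes


def _emit(key: str, sample: Dict[str, bytes], image_exts) -> Iterator[Tuple[str, bytes, List[Box]]]:
    """Yield one (key, image_bytes, boxes) tuple if the group contains an image."""
    for ext in image_exts:
        if ext in sample:
            labels = sample.get(".txt", b"").decode("utf-8")
            yield key, sample[ext], parse_yolo_labels(labels)
            return


def iter_shard_samples(fileobj, image_exts=(".jpg", ".jpeg", ".png")) -> Iterator[Tuple[str, bytes, List[Box]]]:
    """Stream samples from a tar shard in which paired files share a basename and are adjacent."""
    current_key = None
    sample: Dict[str, bytes] = {}
    # Mode "r|*" reads the archive as a forward-only stream, so the source does not need to be seekable.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = os.path.splitext(member.name)
            data = tar.extractfile(member).read()  # must be read before advancing in stream mode
            if key != current_key:
                # A new basename starts a new sample, so flush the previous group first.
                if current_key is not None:
                    yield from _emit(current_key, sample, image_exts)
                current_key, sample = key, {}
            sample[ext.lower()] = data
        if current_key is not None:
            yield from _emit(current_key, sample, image_exts)
```

Because the generator only needs a file-like object with `read()`, it should in principle also accept a streamed S3 response body, though that is untested here. To train with it, you would still need to decode the image bytes, wrap the generator in something like `torch.utils.data.IterableDataset`, and connect that to a custom dataset or trainer on the Ultralytics side, since the built-in Dataset class linked above expects files on disk.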