NEW Classification Datasets: `imagenet10`, `imagenet100`, `imagenet1000`

glenn-jocher commented 1 year ago

All, FYI I've created 3 new classification datasets for use with debugging/tests/benchmarks: imagenet10, imagenet100, imagenet1000.

These are very small subsets of ImageNet that train/val in seconds, with only 1 image per class (covering all 1000 classes, only 100 classes, and only 10 classes respectively). Images are reduced in size to imgsz=160 and compressed with PIL, so that e.g. imagenet10 is only 70kB and imagenet1000 is only 7MB.
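For anyone who wants to build a similar subset of their own, here is a minimal sketch of the idea (one image per class, downscaled and recompressed). It is not the script used to create these datasets: the folder-per-class source layout, the function names `pick_one_per_class` and `shrink_image`, and the JPEG `quality=75` setting are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your source data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pick_one_per_class(src_root, num_classes):
    """Return {class_name: image_path} for the first `num_classes` class folders.

    Assumes (hypothetically) that `src_root` holds one sub-folder per class.
    Class folders without any image are skipped.
    """
    picks = {}
    class_dirs = sorted(d for d in Path(src_root).iterdir() if d.is_dir())
    for class_dir in class_dirs[:num_classes]:
        images = sorted(
            p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
        if images:
            picks[class_dir.name] = images[0]  # first image in sorted order
    return picks


def shrink_image(src, dst, imgsz=160, quality=75):
    """Downscale so the longest side is at most `imgsz`, then save as JPEG.

    Requires Pillow. The quality value is an assumption, not the setting
    used for the released datasets.
    """
    from PIL import Image  # imported lazily so the selection step needs no Pillow

    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(src) as im:
        im = im.convert("RGB")
        im.thumbnail((imgsz, imgsz))  # in-place resize, keeps aspect ratio
        im.save(dst, "JPEG", quality=quality, optimize=True)
```

Calling `pick_one_per_class(root, 10)` and passing each result through `shrink_image` would give a 10-class, 10-image subset in the spirit of imagenet10.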

CI

I've migrated our YOLOv8 CI, tests and benchmarks to use them, e.g. benchmarks now run on imagenet100 (top5 accuracy is 0.71 for YOLOv8n-cls on imagenet100). https://github.com/ultralytics/ultralytics/actions/runs/4223103056/jobs/7332424420
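For readers unfamiliar with the metric, top5 accuracy is the fraction of images whose true class is among the model's 5 highest-scoring classes. A minimal pure-Python sketch of that calculation follows; the function name `topk_accuracy` and the toy scores are illustrative only and are not taken from the Ultralytics code.

```python
def topk_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one list per sample
    labels: list of true class indices, one per sample
    """
    if not labels:
        raise ValueError("labels must not be empty")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)


# Toy example: 3 samples, 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.3, 0.1, 0.5, 0.1],
]
labels = [2, 3, 0]
print(topk_accuracy(scores, labels, k=2))
```

With 1 image per class, a top5 accuracy of 0.71 on imagenet100 means the true class was in the top 5 predictions for roughly 71 of the 100 images.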

Download

You can download them here:
https://github.com/ultralytics/yolov5/releases/download/v1.0/imagenet10.zip
https://github.com/ultralytics/yolov5/releases/download/v1.0/imagenet100.zip
https://github.com/ultralytics/yolov5/releases/download/v1.0/imagenet1000.zip
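If you prefer to fetch an archive from a script, something like the following should work. It is only a sketch using the Python standard library: the helper names `dataset_url` and `download_and_extract` and the `datasets` target folder are assumptions, and nothing is assumed about the folder layout inside the zip.

```python
import urllib.request
import zipfile
from pathlib import Path

# Release URL prefix shared by the three download links in this issue.
BASE_URL = "https://github.com/ultralytics/yolov5/releases/download/v1.0"
DATASETS = ("imagenet10", "imagenet100", "imagenet1000")


def dataset_url(name):
    """Build the download URL for one of the three dataset names."""
    if name not in DATASETS:
        raise ValueError(f"unknown dataset {name!r}, expected one of {DATASETS}")
    return f"{BASE_URL}/{name}.zip"


def download_and_extract(name, dest="datasets"):
    """Download the zip for `name` into `dest` (assumed folder) and unpack it."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{name}.zip"
    urllib.request.urlretrieve(dataset_url(name), archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```

For example, `download_and_extract("imagenet10")` would place the archive and its contents under `./datasets`.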

Usage

pip install ultralytics

yolo train model=yolov8n-cls.pt data=imagenet100

or

from ultralytics import YOLO

model = YOLO('yolov8n-cls.pt')
results = model.train(data='imagenet100', imgsz=160)

nouranali commented 1 year ago

@glenn-jocher I want to work on this issue. Where can I start?

glenn-jocher commented 1 year ago

@nouranali great to hear that you are interested in working on this issue! To get started, I recommend checking out the codebase of the YOLOv5 repository. Familiarize yourself with the relevant files and modules related to the issue. You can also review any existing discussions or proposed solutions pertaining to the problem.

If you have any specific questions or need guidance on how to approach the issue, feel free to ask. We're here to help you throughout the process. Happy coding!

nouranali commented 1 year ago

I want to know which modules I should edit in order to work on this issue.

nouranali commented 1 year ago

I figured out that I need to add scripts for the 3 datasets here https://github.com/ultralytics/yolov5/tree/master/data/scripts and add 3 .yaml files for the datasets here https://github.com/ultralytics/yolov5/tree/master/data. Can you verify? @glenn-jocher

glenn-jocher commented 1 year ago

That's a good start, @nouranali! Yes, you're on the right track. To add the three new datasets (imagenet10, imagenet100, imagenet1000), you'll need to create the necessary scripts for each dataset in the data/scripts directory. These scripts should handle the data preparation and formatting, such as downloading the images, creating the train/validation splits, and generating the annotation files.

Additionally, you'll need to add three .yaml files for the datasets in the data directory. These .yaml files should specify the dataset's name, image size (imgsz), number of classes, and the paths to the data and annotation files.

Once you've created the necessary scripts and .yaml files, you can then use them as inputs when training or evaluating the models.

If you have any further questions or need assistance, feel free to ask. Good luck with your contribution!

nouranali commented 1 year ago

Can you check my PR https://github.com/ultralytics/yolov5/pull/12141? @glenn-jocher

glenn-jocher commented 1 year ago

@nouranali thank you for submitting your pull request! I appreciate your contribution to the YOLOv5 repository. I will review your PR as soon as possible and provide you with feedback. Feel free to reach out if you have any questions or need further assistance.

Thanks again for your support!

Glenn Jocher

ultralytics / yolov5

NEW Classification Datasets: `imagenet10`, `imagenet100`, `imagenet1000` #11028