ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Consistent and Efficient Preprocessing for Classification and Detection Model #9192

Closed paradigmn closed 2 years ago

paradigmn commented 2 years ago

Search before asking

Description

The new classification model uses a different preprocessing pipeline than the detection model. For object detection, the image is normalized, resized with a constant aspect ratio, and padded to size. In comparison, the new classification models use normalization, resizing, center cropping, and standardization by the ImageNet mean and std.
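The difference between the two pipelines can be sketched in terms of geometry and per-pixel scaling. This is a minimal illustration, not the YOLOv5 implementation: the function names are hypothetical, "normalized" is assumed to mean scaling pixel values to [0, 1], and the classification resize is assumed to scale the shorter side to the target size before cropping. The ImageNet mean and std values are the commonly used ones.

```python
# Hypothetical helpers for illustration only; not taken from the YOLOv5 code base.
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # commonly used ImageNet RGB mean
IMAGENET_STD = (0.229, 0.224, 0.225)   # commonly used ImageNet RGB std


def letterbox_geometry(h, w, size):
    """Detection-style: scale to fit inside size x size, keep aspect ratio, pad the rest.

    Returns the resized (height, width) and the padding (top, bottom, left, right).
    """
    r = min(size / h, size / w)
    new_h, new_w = round(h * r), round(w * r)
    pad_h, pad_w = size - new_h, size - new_w
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


def center_crop_geometry(h, w, size):
    """Classification-style (assumed): scale the shorter side to size, crop the center.

    Returns the resized (height, width) and the crop box (top, left, height, width).
    """
    r = size / min(h, w)
    new_h, new_w = round(h * r), round(w * r)
    top, left = (new_h - size) // 2, (new_w - size) // 2
    return (new_h, new_w), (top, left, size, size)


def normalize(pixel):
    """Scale an 8-bit RGB pixel to [0, 1] (assumed to be shared by both pipelines)."""
    return tuple(c / 255.0 for c in pixel)


def standardize(pixel):
    """Extra classification step: subtract the ImageNet mean and divide by the std."""
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))


if __name__ == "__main__":
    print(letterbox_geometry(480, 640, 224))    # whole image kept, borders padded
    print(center_crop_geometry(480, 640, 224))  # left and right edges cropped away
    print(normalize((255, 0, 51)), standardize((255, 0, 51)))
```

Under these assumptions, letterboxing keeps the whole image and fills the borders, while center cropping discards part of the longer side, so a model trained with one pipeline sees differently framed and differently scaled inputs under the other.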

Use case

We use YOLOv5 models for embedded applications with our own performance-optimized software framework. Because of the two pipelines, the models are not easily interchangeable. Furthermore, the classification preprocessing pipeline is more computationally expensive and therefore not well suited to low-power environments.

Would it be possible to introduce a compatibility flag, or some other solution, to export classification models that expect the same normalized RGB image input?

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 2 years ago

👋 Hello @paradigmn, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a šŸ› Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@paradigmn that's a good question, we've had another source inquire as well. This first ClassificationModel release is simply intended as a starting point for future releases, so this kind of feedback is valuable.

The current preprocessing was meant to mirror the typical standard used by models like ResNet, but I agree we should work on harmonizing the two.

The good news is that our next release, v6.3 in September, supports instance SegmentationModels using the same preprocessing as the DetectionModels, so at least those two are equal.

WRT ImageNet normalization, I have not observed it to produce any improvements in YOLOv5, which is why I left it out. I doubt it's helping Classification either; the large number of BN modules is already doing this at every layer.
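One way to read the BN argument: ImageNet standardization is a fixed per-channel shift and scale, and batch normalization removes any such shift and positive scale from the values it sees. The sketch below shows this for a single channel fed directly into a simplified batch norm (no learned scale or shift). It is an illustration under those assumptions only; in a real network a convolution sits between the input and the first BN layer, so the equivalence there is something the weights can absorb rather than an exact identity. The sample values and the `batch_norm` helper are hypothetical.

```python
import math


def batch_norm(values, eps=1e-5):
    """Simplified batch norm over one channel: subtract the batch mean and
    divide by the batch std (no learned scale/shift, no running statistics)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# One channel of made-up pixel values, already scaled to [0, 1].
raw = [0.10, 0.35, 0.60, 0.95]

# ImageNet standardization for the red channel: a fixed shift and scale.
mean_r, std_r = 0.485, 0.229
standardized = [(v - mean_r) / std_r for v in raw]

out_raw = batch_norm(raw)
out_std = batch_norm(standardized)

# The two outputs agree up to the small effect of eps.
max_diff = max(abs(a - b) for a, b in zip(out_raw, out_std))
print(max_diff)
```

The remaining difference comes only from `eps`, which matters slightly more when the input variance is small.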

We'll experiment with similar preprocessing in the future. @AyushExel

glenn-jocher commented 2 years ago

@paradigmn opened https://github.com/ultralytics/yolov5/pull/9213 to experiment with detection-like preprocessing for cls.

paradigmn commented 2 years ago

@glenn-jocher cool, thanks for the fast response! I am excited about the new changes :)

glenn-jocher commented 2 years ago

@paradigmn have you guys done any studies on the effects of different pre-processing techniques or is this just something you noticed recently?

paradigmn commented 2 years ago

@glenn-jocher no, we just tried to use the classification model with our existing pipeline and noticed a steep decline in accuracy. With your LetterBox preprocessing from #9213 everything works as expected, good work!

glenn-jocher commented 2 years ago

@paradigmn ok I've merged #9213 into master. This replaces the Torch transforms with faster ones I wrote myself, should be about 3X faster preprocessing. It's still using the same series of operations, but this will give us a good baseline to start testing the effect of using detection preprocessing on the classification models.

paradigmn commented 2 years ago

@glenn-jocher sounds good. Did you update the pretrained classification model weights as well?

glenn-jocher commented 2 years ago

@paradigmn no, no. Retraining the models is a significant undertaking. The current PR replaces the existing transforms with faster ones that do the same thing, and adds a LetterBox transform that is currently unused but that we can experiment with.

paradigmn commented 2 years ago

ah ok, thank you for clarifying.

glenn-jocher commented 11 months ago

@paradigmn you're welcome! Let me know if you have any more questions.