ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

about training and exporting img size #13218

Open wzf19947 opened 1 month ago

wzf19947 commented 1 month ago

Search before asking

Question

I've done some experiments and found that when I train two models with image sizes of 640x640 and 480x320 and export both to ONNX at 480x320, the model trained at 640x640 gets better inference results on the test set at an inference size of 480x320. Is this normal?

Additional

No response

github-actions[bot] commented 1 month ago

👋 Hello @wzf19947, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 month ago

@wzf19947 hello,

Thank you for your question and for sharing your observations! It's great to see you experimenting with different image sizes and analyzing the results.

Yes, it is normal to observe better inference results from a model trained on larger images (e.g., 640x640) even when exporting and inferring at a smaller size (e.g., 480x320). Here are a few reasons why this might happen:

  1. Feature Extraction: Training on larger images allows the model to learn more detailed features. These features can still be beneficial when the model is later used on smaller images, as the model has a richer set of learned features to draw from.

  2. Resolution and Detail: Larger images contain more information and finer details, which can help the model learn more robust and generalizable features. When you downscale the images for inference, the model can still leverage the detailed features it learned during training.

  3. Overfitting: Training on smaller images might lead to overfitting on the specific resolution and details present in those images. Larger images provide more variability and can help the model generalize better.

To further optimize your results, you might consider the following steps:

  • Experiment with Different Resolutions: Continue experimenting with different training and inference resolutions to find the optimal balance for your specific dataset and use case.
  • Data Augmentation: Use data augmentation techniques to introduce variability during training, which can help the model generalize better to different image sizes.
  • Hyperparameter Tuning: Adjust hyperparameters to see if they can improve performance when training with different image sizes.

Here's a quick example of how you might train with different image sizes using YOLOv5:

# Train with 640x640 images
python train.py --data custom.yaml --img 640 --epochs 300 --weights yolov5s.pt

# Train with 480x480 images (a single --img value sets a square training size)
python train.py --data custom.yaml --img 480 --epochs 300 --weights yolov5s.pt
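
If you also want the ONNX export to match your 480x320 inference size, export.py in recent YOLOv5 versions accepts a height/width pair for --imgsz; the weights path below is only a placeholder, and it's worth double-checking the (h, w) argument order in your checkout:

# Export the trained model to ONNX with a 320 (h) x 480 (w) input
python export.py --weights runs/train/exp/weights/best.pt --include onnx --imgsz 320 480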

For more detailed tips on achieving the best training results, you can refer to our Tips for Best Training Results guide.

If you encounter any issues or have further questions, feel free to reach out. Happy training! 🚀

wzf19947 commented 1 month ago

Thank you for your reply. I have another question: when I use imgsz=480, in this project it means 480x480, right? So how can I train with 480x320?

glenn-jocher commented 1 month ago

@wzf19947 you're welcome! Yes, when you specify imgsz=480 in YOLOv5, it typically means the images will be resized to 480x480. If you want to train with a non-square resolution like 480x320, you can achieve this by modifying the dataset configuration and the training script.

Here's how you can do it:

  1. Modify the Dataset Configuration: Ensure your dataset configuration file (e.g., custom.yaml) is set up correctly. This file should include paths to your training and validation datasets.

  2. Custom Image Size: YOLOv5's train.py script doesn't directly support non-square image sizes through the --img argument. However, you can modify the train.py script to accept a tuple for the image size (a minimal sketch of the command-line side follows after this list). Alternatively, you can preprocess your images to the desired size before training.

  3. Preprocess Images: Resize your images to 480x320 before training. You can use various image processing libraries like OpenCV or PIL to resize your images.
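
As mentioned in point 2, here's a minimal, hypothetical sketch of what accepting a (height, width) pair on the command line could look like; it only illustrates the argparse side and is not a drop-in patch, since train.py also uses the image size elsewhere:

import argparse

# Sketch only -- not actual YOLOv5 code
parser = argparse.ArgumentParser()
parser.add_argument('--imgsz', nargs='+', type=int, default=[640],
                    help='image size: one value for a square size, or height width')
opt = parser.parse_args(['--imgsz', '320', '480'])
imgsz = opt.imgsz * 2 if len(opt.imgsz) == 1 else opt.imgsz  # expand a single value to (h, w)
print(imgsz)  # [320, 480]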

Here's an example of how you might preprocess your images using Python and OpenCV:

import cv2
import os

# Placeholder input/output locations -- replace with your own paths
input_folder = 'path/to/your/images'
output_folder = 'path/to/save/resized/images'
desired_size = (480, 320)  # cv2.resize expects (width, height)

if not os.path.exists(output_folder):
    os.makedirs(output_folder)

for filename in os.listdir(input_folder):
    if filename.endswith('.jpg') or filename.endswith('.png'):
        img = cv2.imread(os.path.join(input_folder, filename))
        if img is None:  # skip unreadable files
            continue
        resized_img = cv2.resize(img, desired_size)  # note: stretches if the aspect ratio differs
        cv2.imwrite(os.path.join(output_folder, filename), resized_img)

  4. Training with Preprocessed Images: Once your images are resized, you can proceed with training as usual. Make sure your dataset configuration file points to the directory with the resized images.

python train.py --data custom.yaml --img 480 --epochs 300 --weights yolov5s.pt

By resizing your images to 480x320 before training, you ensure that the model is trained at the resolution you need. Keep in mind that a plain resize stretches any image whose aspect ratio is not already 3:2; if preserving the aspect ratio matters for your application, pad (letterbox) the images to 480x320 instead of stretching them.
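
If you then export this model to ONNX at 480x320, one quick sanity check is to inspect the exported input shape with onnxruntime; the file name below is just a placeholder for wherever your export lands:

import onnxruntime as ort

# 'best.onnx' is a placeholder path -- point it at your actual exported model
session = ort.InferenceSession('best.onnx')
inp = session.get_inputs()[0]
print(inp.name, inp.shape)  # a YOLOv5 export typically reports something like: images [1, 3, 320, 480]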

If you have any further questions or run into any issues, feel free to ask. Happy training! 🚀