ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Quantization and pruning to my pre-trained model #13077

Closed hsaine closed 3 months ago

hsaine commented 5 months ago

Search before asking

Question

Hello,

I want to apply Quantization and pruning to my pre-trained yolov5 model. I specifically want to use post-training quantization (PTQ) and unstructured pruning. Could you provide me with the steps and a tutorial on how to do this?

Thank you.

Additional

No response

glenn-jocher commented 5 months ago

Hello,

Thank you for reaching out! It's great to hear that you're interested in applying quantization and pruning to your pre-trained YOLOv5 model. Let's walk through the steps for both post-training quantization (PTQ) and unstructured pruning.

Pruning

First, let's start with unstructured pruning. Pruning helps in reducing the model size and potentially increasing inference speed by setting a percentage of the model's weights to zero. Here's a concise guide to get you started:

  1. Clone the YOLOv5 repository and install the required dependencies:

    git clone https://github.com/ultralytics/yolov5
    cd yolov5
    pip install -r requirements.txt
  2. Test your model to establish a baseline performance:

    python val.py --weights your_model.pt --data coco.yaml --img 640 --half
  3. Apply pruning to your model: Update val.py to include the pruning step. For example, to prune your model to 30% sparsity:

    import torch
    from utils.torch_utils import prune
    
    # Load your model
    model = torch.load('your_model.pt')['model'].float()
    
    # Apply pruning
    prune(model, amount=0.3)  # 30% sparsity
    
    # Save the pruned model
    torch.save(model, 'your_pruned_model.pt')
  4. Evaluate the pruned model:

    python val.py --weights your_pruned_model.pt --data coco.yaml --img 640 --half

For more detailed information, you can refer to our Model Pruning and Sparsity Tutorial.
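
If you want to sanity-check the result, here is a minimal sketch (assuming the pruned model was saved as the bare module, as in step 3 above) that measures the fraction of weights set exactly to zero:

```python
import torch

# Minimal sparsity check: fraction of weights that are exactly zero.
# Path is a placeholder; adapt it to where you saved the pruned model.
model = torch.load('your_pruned_model.pt', map_location='cpu').float()

zeros, total = 0, 0
for p in model.parameters():
    zeros += (p == 0).sum().item()
    total += p.numel()
print(f"Global sparsity: {zeros / total:.1%}")
```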

Quantization

For post-training quantization (PTQ), you can use PyTorch's built-in quantization tools. Here’s a basic example:

  1. Prepare your model for quantization:

    import torch
    from torch.quantization import quantize_dynamic
    
    # Load your model
    model = torch.load('your_model.pt')['model'].float()
    
    # Apply dynamic quantization
    quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
    
    # Save the quantized model
    torch.save(quantized_model, 'your_quantized_model.pt')
  2. Evaluate the quantized model:

    python val.py --weights your_quantized_model.pt --data coco.yaml --img 640 --half

Additional Resources

For a more comprehensive guide on quantization, you can refer to the PyTorch Quantization Documentation.
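
The snippet above uses dynamic quantization, which only quantizes supported layer types such as nn.Linear. If you specifically want static PTQ with calibration, here is a rough sketch using PyTorch's eager-mode API; the calibration data, image size, and backend choice are illustrative, and a full YOLOv5 model typically also needs QuantStub/DeQuantStub placement (or an export path such as int8 TFLite via export.py) to quantize end to end:

```python
import torch
from torch.quantization import get_default_qconfig, prepare, convert

# Load the FP32 model in eval mode (path and key are placeholders)
model = torch.load('your_model.pt')['model'].float().eval()

# 'fbgemm' targets x86 CPUs; use 'qnnpack' for ARM targets
model.qconfig = get_default_qconfig('fbgemm')

# Insert observers that record activation ranges during calibration
prepared = prepare(model)

# Calibrate on a few representative batches; random tensors stand in for a
# real calibration DataLoader here
with torch.no_grad():
    for _ in range(10):
        prepared(torch.rand(1, 3, 640, 640))

# Swap observed modules for quantized int8 equivalents
quantized = convert(prepared)
torch.save(quantized, 'your_static_quantized_model.pt')
```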

If you encounter any issues or have further questions, please ensure you provide a minimum reproducible example as outlined here. This will help us assist you more effectively.

Happy coding! 😊

hsaine commented 5 months ago

Thank you very much. I have tested unstructured pruning and now I want to apply structured pruning to compare both methods. How can I do it? Thanks.

glenn-jocher commented 5 months ago

Hello,

Thank you for your interest in exploring structured pruning! It's great to hear that you've successfully tested unstructured pruning. Structured pruning can further help in reducing the model size and potentially improving inference speed by removing entire channels or filters from the model.

Here's a step-by-step guide to apply structured pruning to your YOLOv5 model:

Structured Pruning

  1. Clone the YOLOv5 repository and install the required dependencies (if you haven't already):

    git clone https://github.com/ultralytics/yolov5
    cd yolov5
    pip install -r requirements.txt
  2. Test your model to establish a baseline performance (if not done already):

    python val.py --weights your_model.pt --data coco.yaml --img 640 --half
  3. Apply structured pruning to your model: Structured pruning typically involves removing entire filters or channels. Here's an example using PyTorch's pruning methods:

    import torch
    import torch.nn.utils.prune as prune
    
    # Load your model
    model = torch.load('your_model.pt')['model'].float()
    
    # Define a function to apply structured pruning
    def apply_structured_pruning(model, amount=0.3):
        for name, module in model.named_modules():
            if isinstance(module, torch.nn.Conv2d):
                prune.ln_structured(module, name='weight', amount=amount, n=2, dim=0)
                prune.remove(module, 'weight')
        return model
    
    # Apply structured pruning
    pruned_model = apply_structured_pruning(model, amount=0.3)  # 30% structured pruning
    
    # Save the pruned model
    torch.save(pruned_model, 'your_structured_pruned_model.pt')
  4. Evaluate the structured pruned model:

    python val.py --weights your_structured_pruned_model.pt --data coco.yaml --img 640 --half
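
Note that torch.nn.utils.prune.ln_structured zeroes whole output filters in place; the weight tensors keep their original shapes unless you follow up with a tool that physically removes the pruned channels. A minimal sketch (assuming the pruned model was saved as the bare module, as in step 3 above) to count how many filters were fully zeroed:

```python
import torch

# Count Conv2d output filters that were zeroed entirely by ln_structured.
# Path is a placeholder; adapt it to where you saved the pruned model.
model = torch.load('your_structured_pruned_model.pt', map_location='cpu').float()

for name, m in model.named_modules():
    if isinstance(m, torch.nn.Conv2d):
        # A filter counts as pruned if every weight in that output channel is zero
        zero_filters = int((m.weight.abs().sum(dim=(1, 2, 3)) == 0).sum())
        if zero_filters:
            print(f"{name}: {zero_filters}/{m.out_channels} filters zeroed")
```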

Additional Considerations

If you encounter any issues or have further questions, please ensure you provide a minimum reproducible example as outlined here. This will help us assist you more effectively.

Happy experimenting! 😊

hsaine commented 5 months ago

Hello, when I try to validate the structured-pruned model, I get the error below. For confidentiality I have replaced the actual path with "path". How can I solve this problem so I can evaluate the model's performance? Thanks.

data=path/Bureau/yolov5/data/data2aug.yaml, weights=['path/Bureau/yolov5/runs/pruning/pruned_model_nano_structured.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v7.0-317-gc1803846 Python-3.10.12 torch-2.0.1+cu117 CUDA:0 (NVIDIA GeForce RTX 2070, 7972MiB)

Traceback (most recent call last):
  File "path/Bureau/yolov5/val.py", line 438, in <module>
    main(opt)
  File "path/Bureau/yolov5/val.py", line 409, in main
    run(**vars(opt))
  File "/path/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "path/Bureau/yolov5/val.py", line 165, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
  File "path/yolov5/models/common.py", line 467, in __init__
    model = attempt_load(weights if isinstance(weights, list) else w, device=device, inplace=True, fuse=fuse)
  File "path/Bureau/yolov5/models/experimental.py", line 99, in attempt_load
    ckpt = (ckpt.get("ema") or ckpt["model"]).to(device).float()  # FP32 model
AttributeError: 'collections.OrderedDict' object has no attribute 'to'

glenn-jocher commented 5 months ago

Hello,

Thank you for reaching out and providing detailed information about the issue you're encountering. It looks like you're facing an AttributeError related to the model loading process during validation after applying structured pruning.

To assist you effectively, could you please provide a minimum reproducible example of your code? This will help us better understand the context and reproduce the issue on our end. You can refer to our Minimum Reproducible Example Guide for more details on how to create one. This step is crucial for us to investigate and resolve the problem efficiently.

In the meantime, please ensure that you are using the latest versions of torch and the YOLOv5 repository. You can update your packages with the following commands:

pip install --upgrade torch
git pull

From the error message, it seems that the model checkpoint might not be loaded correctly. The attempt_load function expects the checkpoint to have either an "ema" or "model" key, which should be a model object that can be moved to the device. Here’s a potential fix you can try:

  1. Check the structure of your checkpoint file: Ensure that your checkpoint file contains the correct keys. You can inspect the checkpoint file as follows:

    import torch
    
    checkpoint = torch.load('path/Bureau/yolov5/runs/pruning/pruned_model_nano_structured.pt')
    print(checkpoint.keys())

    The output should include either "ema" or "model". If not, you might need to adjust how the model is saved (see the save sketch after this list).

  2. Modify the checkpoint loading process: If the checkpoint structure is different, you can modify the loading process to handle it appropriately. For example:

    import torch
    from models.common import DetectMultiBackend  # run from the yolov5 repo root
    
    # Load the model
    weights = 'path/Bureau/yolov5/runs/pruning/pruned_model_nano_structured.pt'
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    checkpoint = torch.load(weights, map_location=device)
    
    # Check and load the model
    model = checkpoint.get('ema') or checkpoint.get('model')
    if model is None:
        raise ValueError("Checkpoint does not contain 'ema' or 'model' keys.")
    model = model.to(device).float()
    
    # Continue with validation
    detect_backend = DetectMultiBackend(weights, device=device, dnn=False, data='path/Bureau/yolov5/data/data2aug.yaml', fp16=False)
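
As mentioned in step 1, the usual cause of this AttributeError is that the pruned model was saved as a bare module or state_dict rather than in the checkpoint layout attempt_load expects. Here is a minimal sketch of re-saving it in that layout (paths are placeholders, and the half() cast simply mirrors how YOLOv5 normally stores weights):

```python
import torch

# Load the pruned module saved earlier, e.g. with torch.save(pruned_model, ...)
pruned_model = torch.load('your_structured_pruned_model.pt', map_location='cpu')

# attempt_load() expects a dict whose 'model' (or 'ema') key holds the
# nn.Module itself, not just its state_dict
torch.save({'model': pruned_model.half()}, 'your_structured_pruned_model_ckpt.pt')
```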

Please try these steps and let us know if the issue persists. Providing the minimum reproducible example will greatly help us in diagnosing the problem further.

Thank you for your cooperation, and we look forward to helping you resolve this issue!

github-actions[bot] commented 4 months ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the Ultralytics documentation at https://docs.ultralytics.com.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐

tobymuller233 commented 2 days ago

> _(quoting @glenn-jocher's pruning and quantization walkthrough from above)_

Hi! Thanks for your reply on pruning in this framework. When I pruned my own model using val.py I obtained the pruned model successfully. However, when I try to fine-tune it by training for several extra epochs, it turns out that the yaml component of the model, which train.py uses to rebuild the model in this line, has not changed after pruning.

model = Model(cfg or ckpt["model"].yaml, ch=3, nc=nc, anchors=hyp.get("anchors")).to(device)  # create

This leads to a wrong model structure being built and printed by the program. The following screenshot shows part of the pruned model; the model printed during training should show the same channel counts.

[Screenshot: layers of the pruned model with reduced channel counts]

However, when I run train.py, the printed model structure is unchanged:

[Screenshot: model structure printed by train.py, still showing the original channel counts]

It seems that the prune method provided in the repo doesn't modify the yaml component of the model, and the input channel count of layer 39 is still 24 rather than 16.

pderrenger commented 1 day ago

Thank you for the detailed explanation of the issue. You've identified an important limitation with the current pruning implementation. The prune() function modifies the weights but doesn't automatically update the model architecture configuration in the YAML file.

To properly fine-tune a pruned model, you'll need to:

  1. Save both the pruned model state and the modified architecture:

    import torch
    from utils.torch_utils import prune
    
    # Load your model
    model = torch.load('your_model.pt')['model'].float()
    
    # Apply pruning
    prune(model, amount=0.3)  # 30% sparsity
    
    # Save the complete state
    torch.save({'model': model, 'model_yaml': model.yaml}, 'pruned_model.pt')
  2. When loading for training, use the saved model directly instead of recreating from YAML:

    ckpt = torch.load('pruned_model.pt')
    model = ckpt['model']  # Load the pruned model directly

This is a known limitation in the current implementation. For a more robust solution that properly handles architecture changes, you may want to consider using structured pruning instead, which is better suited for maintaining architectural consistency.

If you continue to experience issues, please open a GitHub issue with a minimal reproducible example and we can investigate further.

tobymuller233 commented 1 day ago

> _(quoting @pderrenger's suggested fix from above)_

Thanks for your reply! After following your guidance, I found that model.yaml has not changed even after running prune():

[Screenshot: model.yaml shown in the debug console, unchanged after pruning]

This image shows the last 3 layers of the model in the debug console. I set a breakpoint to check out what's going on here.

[Screenshot: the last three layers in the debug console, with reduced channel counts after pruning]

It seems that after pruning the channel counts in these three layers have clearly changed, while the yaml component has not.

pderrenger commented 1 day ago

Thank you for providing those detailed screenshots. You're correct - this is because the current unstructured pruning implementation in YOLOv5 only zeroes out weights without modifying the underlying architecture or YAML configuration. This is an inherent limitation of unstructured pruning.

For architecture-aware model compression, I recommend using structured pruning instead. Structured pruning removes entire channels/filters, naturally leading to architecture changes that can be reflected in the model configuration. You can find an example of structured pruning implementation in my previous response.

If you need to maintain architectural consistency for your use case, please open a feature request issue on the YOLOv5 repository with your specific requirements, and we can explore implementing better support for pruning-aware architecture updates.

tobymuller233 commented 1 day ago

@pderrenger Thanks! I finally modified the yaml file myself. Additionally, I used torch-pruning to prune my own model, which contains a module, Bottleneck3, that I implemented myself. It is no different from Bottleneck except that it has three convolution layers:

[Screenshot: the custom Bottleneck3 module]

class Bottleneck3(nn.Module):
    """Implements a bottleneck layer with optional shortcut for efficient feature extraction in neural networks."""

    def __init__(self, c1, c2, mid_layer=None, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, mid width, shortcut, groups, expansion
        """Initializes the bottleneck layer with an optional shortcut; args: input channels (c1), output channels
        (c2), optional hidden width (mid_layer), shortcut (bool), groups (g), expansion factor (e).
        """
        super().__init__()
        if mid_layer is not None:
            c_ = int(mid_layer)
        else:
            c_ = int(c2 * 6)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = DWConv(c_, c_, 3, 1)
        self.cv3 = Conv(c_, c2, 1, 1)

        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Executes forward pass, performing convolutional ops and optional shortcut addition; expects input tensor
        x.
        """
        return x + self.cv3(self.cv2(self.cv1(x))) if self.add else self.cv3(self.cv2(self.cv1(x)))

After pruning, submodules like Conv and DWConv are pruned separately. In my original yaml file, if a Bottleneck3 was repeated twice or more, I would just set the repeat count n in the yaml entry, e.g. `[-1, 2, Bottleneck3, [40]]`, which means the Bottleneck3 module repeats twice. However, since the modules are pruned separately, two successive Bottleneck3s end up with different structures, where one has 26 channels in the middle layer and the other 34:

[Screenshot: two successive Bottleneck3 modules with different middle-layer widths after pruning]

This forced me to rewrite the yaml file, splitting one line into two:

# before
[-1, 2, Bottleneck3, [8]], # 9
# after
[-1, 1, Bottleneck3, [7, 26]], # 9
[-1, 1, Bottleneck3, [7, 34]], # 10

This really annoyed me at first 🀣, but it turned out not to be too difficult since my model is quite small, only about 500k. Anyway, thank you very much for your quick and kind reply! Best wishes! πŸ₯³

pderrenger commented 16 hours ago

Thank you for sharing your detailed experience with model pruning and the custom Bottleneck3 implementation. It's great to see you found a solution by modifying the YAML configuration to accommodate the different channel dimensions after pruning. Your approach of splitting the repeated layers into individual configurations is a valid solution when dealing with pruned architectures that have varying channel sizes.

For others who might encounter similar situations when pruning custom YOLOv5 architectures, I recommend documenting the post-pruning channel dimensions and updating the model configuration accordingly, as you've demonstrated. This is particularly important when working with structured pruning methods that modify the network architecture.

If you need to apply similar modifications to larger models in the future, you might want to consider automating the YAML generation process based on the pruned model structure.
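
As a rough starting point for that automation, here is a sketch (assuming the pruned checkpoint stores the nn.Module under a 'model' key and that Bottleneck3 keeps the cv1/cv2/cv3 layout shown above) that walks the pruned model and prints the channel widths needed to rewrite the yaml entries:

```python
import torch

# Report the actual channel widths of each Bottleneck3 in a pruned model so
# the yaml entries can be rewritten to match (path and key are placeholders).
model = torch.load('pruned_model.pt', map_location='cpu')['model'].float()

for name, m in model.named_modules():
    if m.__class__.__name__ == 'Bottleneck3':
        # YOLOv5's Conv wrapper holds the underlying nn.Conv2d in .conv
        c_mid = m.cv1.conv.out_channels   # width of the hidden (middle) layers
        c_out = m.cv3.conv.out_channels   # output width of the block
        print(f"{name}: args=[{c_out}, {c_mid}]")
```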