openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

nncf + ultralytics yolov8 training-time compression #2486

Closed SofyaLL closed 6 months ago

SofyaLL commented 6 months ago

Hello! I want to use NNCF to quantize a YOLOv8 model from ultralytics. Is it possible to use nncf and ultralytics together for training-time compression?

I've got an error when using create_compressed_model:

TypeError: cat() received an invalid combination of arguments - got (TracedTensor, int), but expected one of:
 * (tuple of Tensors tensors, int dim, *, Tensor out)
 * (tuple of Tensors tensors, name dim, *, Tensor out)

Versions: nncf 2.8.1, torch 2.1.2, torchvision 0.16.2, ultralytics 8.1.11

Thank you!

My code

from typing import Tuple, Dict, Any

from ultralytics.nn.tasks import DetectionModel
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import DEFAULT_CFG, RANK

from nncf.torch.initialization import PTInitializingDataLoader
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

class MyInitializingDataLoader(PTInitializingDataLoader):
    def get_inputs(self, dataloader_output: Any) -> Tuple[Tuple, Dict]:
        # your implementation - `dataloader_output` is what is returned by your dataloader,
        # and you have to turn it into a (args, kwargs) tuple that is required by your model
        # in this function, for instance, if your dataloader returns dictionaries where
        # the input image is under key `"img"`, and your YOLOv8 model accepts the input
        # images as 0-th `forward` positional arg, you would do:
        return dataloader_output["img"], {}

    def get_target(self, dataloader_output: Any) -> Any:
        # and in this function you should extract the "ground truth" value from your
        # dataloader, so, for instance, if your dataloader output is a dictionary where
        # ground truth images are under a "gt" key, then here you would write:
        return dataloader_output["gt"]

class MyCustomModel(DetectionModel):
    def __init__(self, nncf_config_dict, dataloader, cfg="yolov8n.yaml", ch=3, nc=None, verbose=True):
        super().__init__(cfg, ch, nc, verbose)

        nncf_config = NNCFConfig.from_dict(nncf_config_dict)
        nncf_dataloader = MyInitializingDataLoader(dataloader)
        nncf_config = register_default_init_args(nncf_config, nncf_dataloader)
        self.compression_ctrl, self.model = create_compressed_model(self.model, nncf_config)

class MyTrainer(DetectionTrainer):
    def __init__(self, dataloader, nncf_config_dict, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        super().__init__(cfg, overrides, _callbacks)
        nncf_config = NNCFConfig.from_dict(nncf_config_dict)
        self.nncf_dataloader = MyInitializingDataLoader(dataloader)
        self.nncf_config = register_default_init_args(nncf_config, self.nncf_dataloader)

    def get_model(self, cfg=None, weights=None, verbose=True):
        """Return a YOLO detection model."""
        model = DetectionModel(cfg, nc=self.data["nc"], verbose=verbose and RANK == -1)
        if weights:
            model.load(weights)
        self.compression_ctrl, model.model = create_compressed_model(model.model, self.nncf_config)
        return model

def main():
    args = dict(model='yolov8n.pt', data='coco8.yaml', epochs=3, mode='train', verbose=False)
    trainer = DetectionTrainer(overrides=args)
    trainer._setup_train(world_size=0)
    train_loader = trainer.train_loader

    nncf_config_dict = {
        "input_info": {
            "sample_size": [1, 3, 640, 640]
        },
        "log_dir": 'yolov8_output',  # The log directory for NNCF-specific logging outputs.
        "compression": {
            "algorithm": "quantization"  # Specify the algorithm here.
        },
    }

    nncf_trainer = MyTrainer(train_loader, nncf_config_dict, overrides=args)
    nncf_trainer.train()

if __name__ == '__main__':
    main()
MaximProshin commented 6 months ago

@alexsu52 @daniil-lyakhov @AlexanderDokuchaev please take a look

alexsu52 commented 6 months ago

Hello @SofyaLL,

Thank you for opening the issue.

NNCF QAT is based on tracing the control flow graph of the PyTorch model (see https://github.com/openvinotoolkit/nncf/blob/develop/docs/NNCFArchitecture.md#model-control-flow-graph-tracing). It is important to feed create_compressed_model the model whose forward will be called during training. Your original error appears because you were trying to quantize model.model, which is not the module called during model inference. You should feed model to create_compressed_model instead of model.model.
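
For illustration, a minimal sketch of the corrected call (assuming model is the DetectionModel instance whose forward actually runs, and nncf_config is built as in your code):

# model.model is an inner nn.Sequential whose forward is not what runs during
# inference, so tracing it fails with the cat() TypeError shown above.
# Wrap the module whose forward is actually invoked instead:
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)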

The provided code has several integration problems caused by limitations and specifics of ultralytics and nncf. Below is code, with comments, that works around these restrictions:

Use ultralytics from https://github.com/ultralytics/ultralytics/pull/8318
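
If it helps, one way to install that branch directly with pip (assuming the PR's head ref is still available on GitHub):

pip install "git+https://github.com/ultralytics/ultralytics.git@refs/pull/8318/head"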

from copy import deepcopy
from datetime import datetime
from typing import Any, Dict, Tuple

import torch
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.utils import DEFAULT_CFG
from ultralytics.utils import LOGGER
from ultralytics.utils import RANK
from ultralytics.utils import __version__
from ultralytics.utils.torch_utils import de_parallel
from ultralytics.utils.torch_utils import strip_optimizer

import nncf
from nncf import NNCFConfig
from nncf.torch import create_compressed_model
from nncf.torch import register_default_init_args
from nncf.torch.dynamic_graph.io_handling import nncf_model_input
from nncf.torch.initialization import PTInitializingDataLoader
from nncf.torch.model_creation import is_wrapped_model

# Integration issue 1:
# MyInitializingDataLoader must support deep copy because DetectionTrainer deep-copies
# the model and MyInitializingDataLoader during training setup. The input data_loader,
# of type ultralytics.data.build.InfiniteDataLoader, does not support deep copy and
# cannot be used directly in MyInitializingDataLoader. The workaround for this limitation
# is to build a deep-copyable list of batches from the data_loader.
class MyInitializingDataLoader(PTInitializingDataLoader):
    def __init__(self, data_loader, preprocess_batch_fn, num_samples=300):
        super().__init__(data_loader)
        self._batch_size = self._data_loader.batch_size
        # Use a plain list of batches instead of 'ultralytics.data.build.InfiniteDataLoader'
        # so that the loader survives deepcopy.
        self._data_loader = []
        num_samples = num_samples // self._batch_size  # number of batches to keep
        for count, data_item in enumerate(data_loader):
            if count > num_samples:
                break
            batch = preprocess_batch_fn(data_item)
            self._data_loader.append((batch["img"], None))

    @property
    def batch_size(self):
        return self._batch_size

    def get_inputs(self, dataloader_output: Any) -> Tuple[Tuple, Dict]:
        # `dataloader_output` is one element of the list built in __init__, i.e. an
        # (images, target) tuple. Turn it into the (args, kwargs) pair required by the
        # model: the image batch is the 0-th positional argument of `forward`.
        return (dataloader_output[0],), {}

    def get_target(self, dataloader_output: Any) -> Any:
        # Extract the "ground truth" part of the dataloader output. Targets are not
        # needed for range initialization, so None was stored in __init__.
        return dataloader_output[1]

class MyTrainer(DetectionTrainer):
    def __init__(self, nncf_config_dict, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        super().__init__(cfg, overrides, _callbacks)
        self.nncf_config = NNCFConfig.from_dict(nncf_config_dict)
        self.nncf_dataloader = None

    def setup_model(self):
        ckpt = super().setup_model()

        if not is_wrapped_model(self.model):
            # Keep a copy of the original model to support `DetectionTrainer` save/load logic
            self.original_model = deepcopy(self.model)
            if ckpt.get("model_compression_state"):
                self.resume_model_for_qat(ckpt)
            else:
                self.prepare_model_for_qat()
        return ckpt

    def _setup_train(self, world_size):
        super()._setup_train(world_size)
        # Disable EMA for QAT. Using EMA may reduce the accuracy of the model during training.
        if self.ema:
            self.ema.enabled = False

    def get_nncf_dataloader(self):
        if self.nncf_dataloader is None:
            num_samples = self.nncf_config["compression"]["initializer"]["range"]["num_init_samples"]
            train_loader = self.get_dataloader(self.trainset, batch_size=1, rank=RANK, mode="train")
            self.nncf_dataloader = MyInitializingDataLoader(train_loader, self.preprocess_batch, num_samples)
        return self.nncf_dataloader

    def create_wrap_inputs_fn(self):
        # Integration issue 2:
        # NNCF requires the same input structure in the forward function throughout
        # training for correct model tracing, but the DetectionModel forward function
        # accepts either an image tensor or a dict as input:
        # def forward(self, x, *args, **kwargs):
        #     if isinstance(x, dict):  # for cases of training and validating while training.
        #         return self.loss(x, *args, **kwargs)
        #     return self.predict(x, *args, **kwargs)
        # In this case, wrap_inputs_fn must be implemented to mark the "original" model input.
        def wrap_inputs_fn(args, kwargs):
            if isinstance(args[0], dict):
                return args, kwargs
            args = (nncf_model_input(args[0]),) + args[1:]
            return args, kwargs

        return wrap_inputs_fn

    def prepare_model_for_qat(self):
        nncf_dataloader = self.get_nncf_dataloader()
        self.nncf_config = register_default_init_args(self.nncf_config, nncf_dataloader)

        self.model = self.model.to(self.device)
        _, self.model = create_compressed_model(
            self.model, self.nncf_config, wrap_inputs_fn=self.create_wrap_inputs_fn()
        )

    def resume_model_for_qat(self, ckpt):
        # Integration issue 3:
        # Resume the QAT model from the saved model_compression_state.
        _, self.model = create_compressed_model(
            self.model,
            self.nncf_config,
            compression_state=ckpt["model_compression_state"],
            wrap_inputs_fn=self.create_wrap_inputs_fn(),
        )
        self.model.load_state_dict(ckpt["model_state_dict"])

    def save_qat_model(self):
        # Integration issue 4:
        # The NNCF QAT model is not picklable. Save the state dict instead of pickling the model.
        import pandas as pd  # scope for faster startup

        metrics = {**self.metrics, **{"fitness": self.fitness}}
        results = {k.strip(): v for k, v in pd.read_csv(self.csv).to_dict(orient="list").items()}

        compression_controller = self.model.nncf.compression_controller
        model_compression_state = {}
        if compression_controller is not None:
            model_compression_state = compression_controller.get_compression_state()

        ckpt = {
            "epoch": self.epoch,
            "best_fitness": self.best_fitness,
            "model": deepcopy(de_parallel(self.original_model)).half(),
            "model_state_dict": de_parallel(self.model).state_dict(),
            "model_compression_state": model_compression_state,
            "optimizer": self.optimizer.state_dict(),
            "train_args": vars(self.args),  # save as dict
            "train_metrics": metrics,
            "train_results": results,
            "date": datetime.now().isoformat(),
            "version": __version__,
        }

        # Save last and best
        torch.save(ckpt, self.last)
        if self.best_fitness == self.fitness:
            torch.save(ckpt, self.best)
        if (self.save_period > 0) and (self.epoch > 0) and (self.epoch % self.save_period == 0):
            torch.save(ckpt, self.wdir / f"epoch{self.epoch}.pt")
        del ckpt

    def final_eval(self):
        """Performs final evaluation and validation for object detection YOLO model."""
        for f in self.last, self.best:
            if f.exists():
                strip_optimizer(f)  # strip optimizers
                if f is self.best:
                    LOGGER.info(f"\nValidating {f}...")
                    self.model = f
                    self.setup_model()
                    self.validator.args.plots = self.args.plots
                    self.metrics = self.validator(model=self.model)
                    self.metrics.pop("fitness", None)
                    self.run_callbacks("on_fit_epoch_end")

    def save_model(self):
        if is_wrapped_model(self.model):
            self.save_qat_model()
        else:
            super().save_model()

def main():
    args = dict(model="yolov8n.pt", data="coco8.yaml", epochs=3, mode="train", verbose=False)
    nncf_config_dict = {
        "input_info": {"sample_size": [1, 3, 640, 640]},
        "log_dir": "yolov8_output",  # The log directory for NNCF-specific logging outputs.
        "compression": {
            "algorithm": "quantization",
            "ignored_scopes": ["{re}/Detect"],  # ignored the post-processing
            "initializer": {"range": {"num_init_samples": 300}},
        },
    }
    nncf_trainer = MyTrainer(nncf_config_dict, overrides=args)
    nncf_trainer.train()

if __name__ == "__main__":
    main()

Please do not hesitate to ask questions if anything is unclear.

SofyaLL commented 6 months ago

Thank you! It works

SofyaLL commented 6 months ago

Hello @alexsu52! Could you please help me export my model to OpenVINO format after training with NNCF? Am I right that the model should come out INT8-quantized, given the nncf_config_dict used earlier in my code?

I've tried code like this:

from ultralytics import YOLO
model = YOLO('path/to/best.pt')  # path to best.pt after training with nncf
model.export(format='openvino')

Also, I've tried some parts of yolo export from https://github.com/ultralytics/ultralytics/blob/main/ultralytics/engine/exporter.py

import openvino as ov
from openvino.tools import mo

import torch
from ultralytics.utils.torch_utils import get_latest_opset

path_to_onnx = 'test.onnx'
opset_version = get_latest_opset()
output_names = ["output0"]
torch.onnx.export(
    nncf_trainer.model,
    torch.rand(1, 3, 640, 640),
    path_to_onnx,
    verbose=False,
    opset_version=opset_version,
    do_constant_folding=True,
    input_names=["images"],
    output_names=output_names,
)
ov_model_onnx = mo.convert_model(path_to_onnx, model_name='test_conf18', framework="onnx")
ov_model_onnx.set_rt_info("YOLOv8", ["model_info", "model_type"])
ov_model_onnx.set_rt_info(True, ["model_info", "reverse_input_channels"])
ov_model_onnx.set_rt_info(114, ["model_info", "pad_value"])
ov_model_onnx.set_rt_info([255.0], ["model_info", "scale_values"])
ov_model_onnx.set_rt_info(0.7, ["model_info", "iou_threshold"])
ov_model_onnx.set_rt_info([v.replace(" ", "_") for v in nncf_trainer.model.names.values()], ["model_info", "labels"])
ov_model_onnx.set_rt_info("fit_to_window_letterbox", ["model_info", "resize_type"])

ov.serialize(ov_model_onnx, 'test.xml') 

It seems there is no INT8 quantization in either of the obtained XML files; I don't see any precision="I8" attributes.

alexsu52 commented 6 months ago

Hi @SofyaLL,

Could you share the test.xml file you got after using the code from https://github.com/ultralytics/ultralytics/blob/main/ultralytics/engine/exporter.py?

In any case, I used the following code to add an export step to OpenVINO IR before final_eval:

    def export_model_to_openvino(self, model_path):
        pt_model = self.model
        if is_wrapped_model(pt_model):
            # Remove the NNCF training wrappers before export.
            pt_model = nncf.strip(pt_model, do_copy=True)
        nncf_dataloader = self.get_nncf_dataloader()
        example_input = next(iter(nncf_dataloader))[0]
        try:
            ov_model = ov.convert_model(pt_model, example_input=example_input)
        except Exception:
            # Fall back to ONNX export if direct PyTorch conversion fails.
            onnx_model = "model.onnx"
            torch.onnx.export(pt_model, example_input, onnx_model)
            ov_model = ov.convert_model(onnx_model)

        ov.save_model(ov_model, model_path, compress_to_fp16=False)

    def final_eval(self):
        """Performs final evaluation and validation for object detection YOLO model."""
        for f in self.last, self.best:
            if f.exists():
                strip_optimizer(f)  # strip optimizers
                if f is self.best:
                    LOGGER.info(f"\nValidating {f}...")
                    self.model = f
                    self.setup_model()
                    self.export_model_to_openvino("yolov8_qat.xml")
                    self.validator.args.plots = self.args.plots
                    self.metrics = self.validator(model=self.model)
                    self.metrics.pop("fitness", None)
                    self.run_callbacks("on_fit_epoch_end")

You can also use benchmark_app from the openvino package to check performance on your device. I got a 1.87x speedup for the INT8 OpenVINO model compared with the FP32 OpenVINO model.
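
For reference, a typical benchmark_app invocation (assuming the exported IR is named yolov8_qat.xml as above):

benchmark_app -m yolov8_qat.xml -d CPU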

SofyaLL commented 6 months ago

https://drive.google.com/file/d/1rr4rexyIi4ylppN0WFc5sUfGMRXY9YDp/view?usp=sharing

alexsu52 commented 6 months ago

I could not reproduce the shared model using your code. I got the following model: test.zip

quinnZE commented 1 month ago

Hello @alexsu52

I was able to run the above training successfully, though the validation performance did not change across the epochs, only at the final evaluation. However, I am running into an error when I attempt the export, specifically during model stripping:

    nncf_trainer.export_model_to_openvino(model_path='/home/ze-flyer/openvino_test.xml')
  File "/home/ze-flyer/ZE/PycharmProjects/ultralytics/nncf_train.py", line 169, in export_model_to_openvino
    pt_model = nncf.strip(pt_model, do_copy=True)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/common/strip.py", line 38, in strip
    return strip_pt(model, do_copy)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/strip.py", line 25, in strip
    return model.nncf.strip(do_copy)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/nncf_network.py", line 968, in strip
    return self.compression_controller.strip(do_copy)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/api/compression.py", line 266, in strip
    return self.strip_model(self.model, do_copy)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/quantization/algo.py", line 1474, in strip_model
    model = strip_quantized_model(model)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/quantization/strip.py", line 174, in strip_quantized_model
    model = replace_quantizer_to_torch_native_module(model)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/quantization/strip.py", line 45, in replace_quantizer_to_torch_native_module
    nncf_module = model.nncf.get_containing_module(node.node_name)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/nncf_network.py", line 710, in get_containing_module
    return self.get_module_by_scope(scope)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/nncf_network.py", line 695, in get_module_by_scope
    return get_module_by_scope(curr_module, scope)
  File "/home/ze-flyer/anaconda3/envs/ultralytics/lib/python3.10/site-packages/nncf/torch/dynamic_graph/scope_access.py", line 30, in get_module_by_scope
    raise nncf.InternalError(
nncf.errors.InternalError: Could not find a bn module member in NNCFBatchNorm2d module of scope DetectionModel/Sequential[model]/Conv[0]/NNCFBatchNorm2d[bn] during node search

I was wondering if you could provide insight into why this may be failing.

alexsu52 commented 1 month ago

Hi @quinnZE,

Given your trace log, this looks like a bug. Could you create a new issue with a bug description, a short reproducer, and the versions of NNCF, OpenVINO and Ultralytics?

quinnZE commented 1 month ago

@alexsu52 Created here