quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

How to apply AutoQuant and QuantAnalyzer with two inputs #2688

Closed ZengZhiK closed 9 months ago

ZengZhiK commented 9 months ago

I am trying to quantize a model with two inputs.

The model definition:

class my_model(nn.Module):
    def forward(self, left_input, right_input):
        ........

The dataset of dataloader definition:

class my_dataset(Dataset):
    def __getitem__(self, index):
        ........
        return input1, input2

The quantization code looks like:

model = my_model()
model = prepare_model(model)

........

quant_analyzer = QuantAnalyzer(model=model,
                               dummy_input=dummy_inputs,
                               forward_pass_callback=CallbackFunc(pass_calibration_data, use_cuda),
                               eval_callback=CallbackFunc(eval_callback))
quant_analyzer.enable_per_layer_mse_loss(unlabeled_dataset_iterable=dataloader_calib_unlabeled, num_batches=16)

........

auto_quant = AutoQuant(model,
                       dummy_input=dummy_input,
                       data_loader=dataloader_calib_unlabeled,
                       eval_callback=eval_callback)
sim, initial_accuracy = auto_quant.run_inference()
print(f"- Quantized Accuracy (before optimization): {initial_accuracy}")
model, optimized_accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)
print(f"- Quantized Accuracy (after optimization):  {optimized_accuracy}")

But these functions do not seem to handle a model with two inputs:

[screenshot: error traceback]

Is it possible to solve this issue? Thank you.

quic-hitameht commented 9 months ago

Hi @ZengZhiK

Both the AutoQuant and QuantAnalyzer features should support multi-input models. Could you please tell us what the dummy_input variable looks like in your example? From the traceback, it looks like dummy_input is of type torch.Tensor instead of a tuple (left_input: torch.Tensor, right_input: torch.Tensor).
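
For illustration, a minimal sketch of a tuple-style dummy input for a two-input model (the shape is a placeholder; use your model's real input shapes):

import torch

# Sketch only: one tensor per model input, packed into a tuple so the
# model can be called as model(*dummy_input).
left = torch.rand(1, 1, 800, 1280)
right = torch.rand(1, 1, 800, 1280)
dummy_input = (left, right)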

ZengZhiK commented 9 months ago

Hi, dummy_input is defined as follows:

input_shape = (1, 1, 800, 1280)
dummy_input = torch.rand(input_shape)
dummy_inputs = [dummy_input, dummy_input]

The inputs to QuantAnalyzer and AutoQuant are different.

The version of AIMET: [screenshot: installed AIMET version]

ZengZhiK commented 9 months ago

I discovered the cause of the problem:

class my_dataset_unlabeled(Dataset):
    def __getitem__(self, index):
        ........
        return input1, input2

class my_dataset_adaround(Dataset):
    def __getitem__(self, index):
        ........
        return [input1, input2], ...
dataloader_calib_unlabeled = ...
auto_quant = AutoQuant(model,
                       dummy_input=(dummy_input.cuda(), dummy_input.cuda()),
                       data_loader=dataloader_calib_unlabeled,
                       eval_callback=eval_callback,
                       param_bw=args.param_bw,
                       output_bw=args.output_bw,
                       config_file='aimet_config.json',
                       results_dir=args.save_dir)
dataloader_adaround = ...
adaround_params = AdaroundParameters(dataloader_adaround, num_batches=len(dataloader_adaround))
auto_quant.set_adaround_params(adaround_params)
auto_quant.set_export_params(onnx_export_args=onnxparams)
sim, initial_accuracy = auto_quant.run_inference()
print(f"=> Quantized Accuracy (before optimization): {initial_accuracy}")
model, optimized_accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)
print(f"=> Quantized Accuracy (after optimization):  {optimized_accuracy}")