quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

Does AIMET support NLP models? #2495

Open wminiboy opened 11 months ago

wminiboy commented 11 months ago

I would like to confirm whether the latest version of AIMET supports performing QAT on NLP models, such as the BERT model. If so, can you provide an example?

I encountered many problems when conducting QAT on the NLP model.

quic-mangal commented 11 months ago

We do support them for the PyTorch framework. An example would be: https://docs.qualcomm.com/bundle/publicresource/topics/80-64748-1/model_updates.html

wminiboy commented 11 months ago

@quic-mangal Thanks for your reply. The linked example is an image-processing QAT workflow, not an NLP (Natural Language Processing) model; the parameter handling is different.

quic-mangal commented 11 months ago

@wminiboy, Stable Diffusion models do take in text and have a U-Net based architecture. That being said, we do support quantization of NLP models. This was the closest example I could find, but you could try running a model through the regular workflow; nothing special is needed on the user's side for NLP models.
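For concreteness, a minimal sketch of that regular workflow on a HuggingFace BERT encoder might look as follows. The bert-base-uncased checkpoint and the calibration callback are illustrative assumptions, not from this thread, and the exact QuantizationSimModel arguments can vary between AIMET releases:

    import torch
    from transformers import BertModel
    from aimet_torch.quantsim import QuantizationSimModel

    model = BertModel.from_pretrained("bert-base-uncased")  # assumed checkpoint
    model.eval()
    model.config.return_dict = False  # tuple outputs; see the fix discussed at the end of this thread

    # Token-like integer inputs, matching the dummy_input shown later in this thread
    input_shape = (1, 256)
    dummy_input = (torch.randint(0, 1000, input_shape),          # input_ids
                   torch.zeros(input_shape, dtype=torch.long),   # attention_mask
                   torch.ones(input_shape, dtype=torch.long))    # token_type_ids

    sim = QuantizationSimModel(model=model, dummy_input=dummy_input,
                               quant_scheme='tf_enhanced')

    def forward_pass_callback(model, _):
        # Calibration pass so the quantizers can compute activation encodings;
        # real code would iterate over a representative data loader here
        with torch.no_grad():
            model(*dummy_input)

    sim.compute_encodings(forward_pass_callback, forward_pass_callback_args=None)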

wminiboy commented 11 months ago

@quic-mangal Thanks for your reply, but I encountered many problems when conducting QAT on the NLP model.

This is the dummy_input:

    input_shape = (1, 256)
    input_ids = torch.randint(low=0, high=1000, size=input_shape)
    attention_mask = torch.zeros(input_shape, dtype=torch.long)
    token_type_ids = torch.ones(input_shape, dtype=torch.long)

    dummy_input = (input_ids, attention_mask, token_type_ids)

1. When I called QuantizationSimModel(...), the following error occurred:

    Traceback (most recent call last):
      File "examples/training_sup_text_matching_model_qat.py", line 161, in <module>
        main()
      File "examples/training_sup_text_matching_model_qat.py", line 142, in main
        calculate_quantsim_accuracy(model.bert, "ptq", evaluate, False, "output")
      File "examples/training_sup_text_matching_model_qat.py", line 49, in calculate_quantsim_accuracy
        quantsim = QuantizationSimModel(model=model, quant_scheme='tf_enhanced',
      File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/quantsim.py", line 180, in __init__
        self.connected_graph = ConnectedGraph(self.model, dummy_input)
      File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/meta/connectedgraph.py", line 150, in __init__
        self._construct_graph(model, model_input)
      File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/meta/connectedgraph.py", line 292, in _construct_graph
        trace = torch.jit.trace(model, model_input, **jit_trace_args)
      File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/jit/_trace.py", line 759, in trace
        return trace_module(
      File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/jit/_trace.py", line 976, in trace_module
        module._c._create_method_from_trace(
    RuntimeError: Encountering a dict at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for list, use a tuple instead. for dict, use a NamedTuple instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.

To fix it, I passed strict=False to trace().
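As a toy illustration of this workaround (not the original model): tracing a module whose forward returns a dict fails under the default strict=True, and passes once strict=False is given:

    import torch

    class DictOutput(torch.nn.Module):
        # Stand-in for a HuggingFace-style model that returns a dict
        def forward(self, x):
            return {"logits": x * 2}

    m = DictOutput()
    x = torch.randn(1, 4)
    # torch.jit.trace(m, x)  # raises the RuntimeError quoted above
    traced = torch.jit.trace(m, x, strict=False)  # accepts the dict output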

2. When I called apply_cross_layer_equalization(...), the following error occurred:

File "examples/training_sup_text_matching_model_qat.py", line 259, in main() File "examples/training_sup_text_matching_model_qat.py", line 187, in main apply_cross_layer_equalization(model=model.bert, input_shape=input_shape) File "examples/training_sup_text_matching_model_qat.py", line 44, in apply_cross_layer_equalization equalize_model(model, input_shape) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/cross_layer_equalization.py", line 857, in equalize_model folded_pairs = fold_all_batch_norms(model, input_shapes, dummy_input) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/batch_norm_fold.py", line 494, in fold_all_batch_norms_to_weight connected_graph = ConnectedGraph(model, inp_tensor_list) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/meta/connectedgraph.py", line 150, in init self._construct_graph(model, model_input) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/meta/connectedgraph.py", line 291, in _construct_graph module_tensor_shapes_map = ConnectedGraph._generate_module_tensor_shapes_lookup_table(model, model_input) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/meta/connectedgraph.py", line 280, in _generate_module_tensor_shapes_lookup_table run_hook_for_layers_with_given_input(model, model_input, forward_hook, leaf_node_only=False) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/utils.py", line 326, in run_hook_for_layers_with_giveninput = model(input_tensor) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl result = forward_call(input, kwargs) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1013, in forward embedding_output = self.embeddings( File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl result = forward_call(*input, *kwargs) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 230, in forward inputs_embeds = self.word_embeddings(input_ids) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl result = forward_call(input, kwargs) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 160, in forward return F.embedding( File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)

To fix it, my modifications are as follows:

    def create_rand_tensors_given_shapes(input_shape: Union[Tuple, List[Tuple]], device: torch.device) \
            -> List[torch.Tensor]:
        """
        Given shapes of some tensors, create one or more random tensors and return them as a list of tensors
        :param input_shape: Shapes of tensors to create
        :param device: Device to create tensors on
        :return: Created list of tensors
        """
        if isinstance(input_shape, List):
            input_shapes = input_shape
        else:
            input_shapes = [input_shape]

        rand_tensors = []
        for shape in input_shapes:
            # Modified by wminiboy
            # rand_tensors.append(torch.rand(shape).to(device))
            rand_tensors.append(torch.randint(low=0, high=1000, size=shape).to(device))
            rand_tensors.append(torch.zeros(shape, dtype=torch.long).to(device))
            rand_tensors.append(torch.ones(shape, dtype=torch.long).to(device))
            # done

        return rand_tensors

3. When I called apply_bias_correction(...), the following error occurred:

File "examples/training_sup_text_matching_model_qat.py", line 259, in main() File "examples/training_sup_text_matching_model_qat.py", line 188, in main apply_bias_correction(model=model.bert, data_loader=dataloader) File "examples/training_sup_text_matching_model_qat.py", line 59, in apply_bias_correction bias_correction.correct_bias(model, params, num_quant_samples=num_quant_samples, File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/bias_correction.py", line 303, in correct_bias q.compute_encodings(pass_data_through_model, None) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/quantsim.py", line 326, in computeencodings = forward_pass_callback(self.model, forward_pass_callback_args) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/bias_correction.py", line 265, in pass_data_through_model forward_pass(model, images_in_one_batch) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/bias_correction.py", line 85, in forwardpass = model(batch) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, **kwargs) File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 968, in forward input_shape = input_ids.size() AttributeError: 'list' object has no attribute 'size'

To fix it, my modifications are as follows:

    def forward_pass(model: torch.nn.Module, batch: torch.Tensor):
        """
        forward pass depending model allocation on CPU / GPU till StopForwardException
        :param model: model
        :param batch: batch
        :return: Nothing
        """
        # first check if the model is on GPU or not
        if utils.is_model_on_gpu(model):
            batch = batch.cuda()

        try:
            with utils.in_eval_mode(model), torch.no_grad():
                # Modified by wminiboy
                # _ = model(batch)
                if isinstance(batch, (list, tuple)):
                    _ = model(*batch)
                else:
                    _ = model(batch)
                # done
        except StopForwardException:
            pass

Then the following error occurred:

    Traceback (most recent call last):
      File "examples/training_sup_text_matching_model_qat.py", line 259, in <module>
        main()
      File "examples/training_sup_text_matching_model_qat.py", line 188, in main
        apply_bias_correction(model=model.bert, data_loader=dataloader)
      File "examples/training_sup_text_matching_model_qat.py", line 59, in apply_bias_correction
        bias_correction.correct_bias(model, params, num_quant_samples=num_quant_samples,
      File "/home/wangyj/anaconda3/envs/aimet/lib/python3.8/site-packages/aimet_torch/bias_correction.py", line 346, in correct_bias
        bias_correction.storePreActivationOutput(reference_output_batch)
    ValueError: array has incorrect number of dimensions: 5; expected 4

But I don't know how to solve this issue.

So, were these errors caused by the input parameters? How should the parameters be set? Your advice would be greatly appreciated. Thanks!

quic-kyunggeu commented 11 months ago

Hi @wminiboy

As for the first error, it seems like you are using a dict as dummy_input. Unfortunately, the PyTorch JIT tracer, which QuantizationSimModel relies on, does not handle dict inputs well. I suggest using a tuple as dummy_input instead of a dict.

As for the second and third errors, first I have to admit that we have not yet tested CLE or bias correction rigorously against LLMs. Your custom solution for the second error seems good enough, but for the third error we don't have good insight yet. We're actively working on supporting LLMs and expect to resolve these issues in future releases.

wminiboy commented 11 months ago

@quic-kyunggeu Thanks for your reply. For the first error, I found the cause: it was not the dummy_input, but the fact that the model output is not a tuple. So before calling QuantizationSimModel(...), I set model.config.return_dict = False, and the issue was solved!
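In code, the fix amounts to one line before constructing the sim. A minimal sketch, with bert-base-uncased as an assumed stand-in for the script's model.bert:

    import torch
    from transformers import BertModel
    from aimet_torch.quantsim import QuantizationSimModel

    model = BertModel.from_pretrained("bert-base-uncased")  # assumed stand-in for model.bert
    model.config.return_dict = False  # forward() now returns a tuple the JIT tracer can handle

    input_shape = (1, 256)
    dummy_input = (torch.randint(0, 1000, input_shape),
                   torch.zeros(input_shape, dtype=torch.long),
                   torch.ones(input_shape, dtype=torch.long))
    quantsim = QuantizationSimModel(model=model, dummy_input=dummy_input,
                                    quant_scheme='tf_enhanced')  # no dict-output tracer error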

As for the third error, we look forward to your update. Thanks!