Closed zhangxu223 closed 1 month ago
Hello @zhangxu223, thanks for your interest in Intel(R) Neural Compressor.
For 1, the model has been successfully quantized. You should use q_model.save("saved_results")
to save the quantized model and its config. The best_model.pt
file in the saved_results folder
is approximately one fourth the size of the original model.
You can refer to the document PTQ.
For 2, you can set op_name_dict
or op_type_dict
in the config class to achieve this.
# the config class comes from neural_compressor
from neural_compressor import PostTrainingQuantConfig

# set op_type_dict
op_type_dict = {"Conv": {"weight": {"dtype": ["fp32"]}, "activation": {"dtype": ["fp32"]}}}
conf = PostTrainingQuantConfig(op_type_dict=op_type_dict)

# or set op_name_dict
op_name_dict = {
    "layer1.0.conv1": {
        "activation": {"dtype": ["fp32"]},
        "weight": {"dtype": ["fp32"]},
    }
}
conf = PostTrainingQuantConfig(op_name_dict=op_name_dict)
You can refer to the document specify-quantization-rules.
I hope the above information is useful to you; I look forward to your reply.
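For intuition, the same two kinds of rules (per-op-type and per-op-name) have a rough analogue in plain PyTorch eager-mode dynamic quantization, where you pass a set of module types to quantize and everything else stays FP32. A minimal sketch (the toy model and layer sizes below are made up for illustration, not your model):

```python
import torch
import torch.nn as nn

# Toy model: one Conv2d followed by one Linear (shapes are arbitrary).
model = nn.Sequential(
    nn.Conv2d(2, 4, kernel_size=3),
    nn.Flatten(),
    nn.Linear(4 * 6 * 6, 10),
)
model.eval()

# A type-level rule, analogous in spirit to op_type_dict: only nn.Linear
# modules are quantized; the Conv2d stays FP32.
q_model = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(type(q_model[0]).__name__)  # Conv2d (still FP32)
print(q_model[2].weight().dtype)  # torch.qint8
```

In Neural Compressor itself the rules go through op_type_dict / op_name_dict in PostTrainingQuantConfig as shown above; this sketch only illustrates the type-vs-name distinction.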
Hello, thank you very much for your previous response; it has been very helpful.
After switching to the q_model.save("saved_results") saving method as you suggested, I did notice that the model size decreased compared to the original, but it didn't reach the expected one fourth of the original size: the model only shrank from 141 MB to 130 MB.
What could be the reason for this? Is it possible that some layers were not successfully quantized? How can I check the precision of each layer after quantization to make sure all layers were quantized correctly?
I would appreciate any further suggestions you could provide. Thank you.
Thanks for your reply.
You can find which layers changed relative to the FP32 model by printing the model: use print(q_model)
or print(q_model._model)
to display the quantized model, and q_model.fp32_model
to show the FP32 model. Additionally, you can use named_modules() or state_dict() to check layer details, including precision (e.g., dtype=torch.qint8).
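As a self-contained illustration of that inspection workflow, here is a sketch using plain PyTorch dynamic quantization on a hypothetical two-layer model (not your actual INC setup); the LayerNorm staying FP32 mirrors the LayerNorm rows in the Mixed Precision Statistics table below:

```python
import torch
import torch.nn as nn

float_model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16))
float_model.eval()

# Dynamically quantize only the Linear layer; LayerNorm stays FP32.
q_model = torch.ao.quantization.quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

# Printing the model shows which modules were swapped for quantized versions.
print(q_model)

# named_modules() lists each layer with its (possibly quantized) class.
for name, module in q_model.named_modules():
    print(name, type(module).__name__)
```

With an INC q_model, the same inspection applies to q_model._model (the wrapped PyTorch module).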
Thank you very much for your reply!!
I added the following code to check the precision of the quantized model:
for name, param in quantized_model.named_parameters():
    logger.info(f"Parameter Name: {name}, Data Type: {param.dtype}, Shape: {param.shape}")
The results are as follows:
2024-08-14 15:13:33 [INFO] Save tuning history to F:\Beam-Guided-TFDPRNN-PTQ\nc_workspace\2024-08-14_15-09-18\./history.snapshot.
2024-08-14 15:13:33 [INFO] FP32 baseline is: [Accuracy: -12.6132, Duration (seconds): 251.4703]
2024-08-14 15:13:33 [INFO] Fx trace of the entire model failed, We will conduct auto quantization
2024-08-14 15:27:42 [INFO] |*********Mixed Precision Statistics********|
2024-08-14 15:27:42 [INFO] +---------------------+-------+------+------+
2024-08-14 15:27:42 [INFO] | Op Type | Total | INT8 | FP32 |
2024-08-14 15:27:42 [INFO] +---------------------+-------+------+------+
2024-08-14 15:27:42 [INFO] | quantize_per_tensor | 10 | 10 | 0 |
2024-08-14 15:27:42 [INFO] | Conv2d | 4 | 4 | 0 |
2024-08-14 15:27:42 [INFO] | dequantize | 10 | 10 | 0 |
2024-08-14 15:27:42 [INFO] | GroupNorm | 1 | 0 | 1 |
2024-08-14 15:27:42 [INFO] | Linear | 6 | 6 | 0 |
2024-08-14 15:27:42 [INFO] | LayerNorm | 6 | 0 | 6 |
2024-08-14 15:27:42 [INFO] +---------------------+-------+------+------+
2024-08-14 15:27:42 [INFO] Pass quantize model elapsed time: 848758.94 ms
2024-08-14 15:31:45 [INFO] Average SDR: -12.738859626272486
2024-08-14 15:31:45 [INFO] Tune 1 result is: [Accuracy (int8|fp32): -12.7389|-12.6132, Duration (seconds) (int8|fp32): 243.6468|251.4703], Best tune result is: [Accuracy: -12.7389, Duration (seconds): 243.6468]
2024-08-14 15:31:45 [INFO] |***********************Tune Result Statistics**********************|
2024-08-14 15:31:45 [INFO] +--------------------+-----------+---------------+------------------+
2024-08-14 15:31:45 [INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
2024-08-14 15:31:45 [INFO] +--------------------+-----------+---------------+------------------+
2024-08-14 15:31:45 [INFO] | Accuracy | -12.6132 | -12.7389 | -12.7389 |
2024-08-14 15:31:45 [INFO] | Duration (seconds) | 251.4703 | 243.6468 | 243.6468 |
2024-08-14 15:31:45 [INFO] +--------------------+-----------+---------------+------------------+
2024-08-14 15:31:45 [INFO] [Strategy] Found a model that meets the accuracy requirements.
2024-08-14 15:31:45 [INFO] Save tuning history to F:\Beam-Guided-TFDPRNN-PTQ\nc_workspace\2024-08-14_15-09-18\./history.snapshot.
2024-08-14 15:31:45 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-08-14 15:31:45 [INFO] Save deploy yaml to F:\Beam-Guided-TFDPRNN-PTQ\nc_workspace\2024-08-14_15-09-18\deploy.yaml
2024-08-14 15:31:45 [INFO] Post-training quantization completed.
2024-08-14 15:31:45 [INFO] Save config file and weights of quantized model to F:\Beam-Guided-TFDPRNN-PTQ\saved_model\quantization_model.
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.encoder.bias, Data Type: torch.float32, Shape: torch.Size([256])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.LayerNormalization.weight, Data Type: torch.float32, Shape: torch.Size([256])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.LayerNormalization.bias, Data Type: torch.float32, Shape: torch.Size([256])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_freq.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_freq.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_freq.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_freq.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.norm_freq.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.norm_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_time.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_time.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_time.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.rnn_time.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.norm_time.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.0.norm_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_freq.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_freq.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_freq.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_freq.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.norm_freq.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.norm_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_time.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_time.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_time.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.rnn_time.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.norm_time.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.1.norm_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_freq.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_freq.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_freq.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_freq.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.norm_freq.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.norm_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_time.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_time.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_time.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.rnn_time.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.norm_time.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.separator.Stack.2.norm_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: _model.freq_net1.decoder.bias, Data Type: torch.float32, Shape: torch.Size([2])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.encoder.weight, Data Type: torch.float32, Shape: torch.Size([256, 2, 7, 7])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.encoder.bias, Data Type: torch.float32, Shape: torch.Size([256])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.LayerNormalization.weight, Data Type: torch.float32, Shape: torch.Size([256])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.LayerNormalization.bias, Data Type: torch.float32, Shape: torch.Size([256])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.bottleneck.weight, Data Type: torch.float32, Shape: torch.Size([64, 256, 1, 1])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.bottleneck.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_freq.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_freq.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_freq.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_freq.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.fc_freq.weight, Data Type: torch.float32, Shape: torch.Size([64, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.fc_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.norm_freq.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.norm_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_time.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_time.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_time.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.rnn_time.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.fc_time.weight, Data Type: torch.float32, Shape: torch.Size([64, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.fc_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.norm_time.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.0.norm_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_freq.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_freq.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_freq.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_freq.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.fc_freq.weight, Data Type: torch.float32, Shape: torch.Size([64, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.fc_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.norm_freq.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.norm_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_time.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_time.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_time.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.rnn_time.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.fc_time.weight, Data Type: torch.float32, Shape: torch.Size([64, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.fc_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.norm_time.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.1.norm_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_freq.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_freq.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_freq.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_freq.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.fc_freq.weight, Data Type: torch.float32, Shape: torch.Size([64, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.fc_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.norm_freq.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.norm_freq.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_time.weight_ih_l0, Data Type: torch.float32, Shape: torch.Size([512, 64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_time.weight_hh_l0, Data Type: torch.float32, Shape: torch.Size([512, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_time.bias_ih_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.rnn_time.bias_hh_l0, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.fc_time.weight, Data Type: torch.float32, Shape: torch.Size([64, 128])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.fc_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.norm_time.weight, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.Stack.2.norm_time.bias, Data Type: torch.float32, Shape: torch.Size([64])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.output_con2d.weight, Data Type: torch.float32, Shape: torch.Size([512, 64, 1, 1])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.separator.output_con2d.bias, Data Type: torch.float32, Shape: torch.Size([512])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.decoder.weight, Data Type: torch.float32, Shape: torch.Size([2, 256, 1, 1])
2024-08-14 15:31:45 [INFO] Parameter Name: fp32_model.freq_net1.decoder.bias, Data Type: torch.float32, Shape: torch.Size([2])
2024-08-14 15:31:45 [INFO] Quantized model has been successfully saved.
I don't understand why the precision of the model I'm getting is still in FP32. Why is that?
It seems that the quantized model does not support the parameters or named_parameters attributes. If you compare the parameters of the FP32 model and the INT8 model, you will find that the parameters listed for the INT8 model are not meaningful (for example, the tensor sizes). I suggest you use print(model.state_dict()) instead; then you can see that the dtype of the quantized tensors is torch.qint8.
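To see the difference concretely, here is a minimal, hypothetical reproduction with plain PyTorch dynamic quantization (not your model): named_parameters() comes back empty because the packed qint8 weight is no longer an nn.Parameter, while state_dict() still exposes everything, including non-tensor _packed_params entries like the ones in your log.

```python
import torch
import torch.nn as nn

float_model = nn.Sequential(nn.Linear(8, 8))
float_model.eval()
q_model = torch.ao.quantization.quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

# The quantized weight lives in packed form, not as an nn.Parameter,
# so named_parameters() no longer reports it.
print(list(q_model.named_parameters()))  # []

# state_dict() still carries everything, including non-tensor entries
# such as _packed_params.
for key, value in q_model.state_dict().items():
    if isinstance(value, torch.Tensor):
        print(key, value.dtype)
    else:
        print(key, type(value).__name__)
```

This is why iterating state_dict(), as in the snippet below, surfaces the torch.qint8 weights that named_parameters() missed.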
Thank you very much for your prompt reply. I followed your suggestion and conducted an evaluation using the following code:
for key, value in q_model.state_dict().items():
    if isinstance(value, torch.Tensor):
        logger.info(f"Tensor: {key}, Data type: {value.dtype}")
    else:
        logger.info(f"Non-Tensor Parameter: {key}, Type: {type(value)}")
This is the output:
2024-08-16 14:46:05 [INFO] Tensor: _model.mvdr.stft_model.enc.filterbank._filters, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.mvdr.stft_model.enc.filterbank.torch_window, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.mvdr.stft_model.dec.filterbank._filters, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.mvdr.stft_model.dec.filterbank.torch_window, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.encoder.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.encoder.module_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.encoder.module_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.encoder.module.weight, Data type: torch.qint8
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.encoder.module.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.encoder.module.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.encoder.module.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.LayerNormalization_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.LayerNormalization_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_0_fc_freq_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_0_fc_freq_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_0_fc_time_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_0_fc_time_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_1_fc_freq_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_1_fc_freq_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_1_fc_time_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_1_fc_time_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_2_fc_freq_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_2_fc_freq_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_2_fc_time_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack_2_fc_time_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.output_con2d_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.output_con2d_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.LayerNormalization.weight, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.LayerNormalization.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.bottleneck.weight, Data type: torch.qint8
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.bottleneck.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.bottleneck.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.bottleneck.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_freq.weight_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_freq.weight_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_freq.bias_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_freq.bias_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.fc_freq.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.fc_freq.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.0.fc_freq._packed_params.dtype, Type: <class 'torch.dtype'>
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.0.fc_freq._packed_params._packed_params, Type: <class 'tuple'>
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.norm_freq.weight, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.norm_freq.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_time.weight_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_time.weight_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_time.bias_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.rnn_time.bias_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.fc_time.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.fc_time.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.0.fc_time._packed_params.dtype, Type: <class 'torch.dtype'>
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.0.fc_time._packed_params._packed_params, Type: <class 'tuple'>
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.norm_time.weight, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.0.norm_time.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_freq.weight_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_freq.weight_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_freq.bias_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_freq.bias_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.fc_freq.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.fc_freq.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.1.fc_freq._packed_params.dtype, Type: <class 'torch.dtype'>
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.1.fc_freq._packed_params._packed_params, Type: <class 'tuple'>
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.norm_freq.weight, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.norm_freq.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_time.weight_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_time.weight_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_time.bias_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.rnn_time.bias_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.fc_time.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.fc_time.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.1.fc_time._packed_params.dtype, Type: <class 'torch.dtype'>
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.1.fc_time._packed_params._packed_params, Type: <class 'tuple'>
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.norm_time.weight, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.1.norm_time.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_freq.weight_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_freq.weight_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_freq.bias_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_freq.bias_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.fc_freq.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.fc_freq.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.2.fc_freq._packed_params.dtype, Type: <class 'torch.dtype'>
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.2.fc_freq._packed_params._packed_params, Type: <class 'tuple'>
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.norm_freq.weight, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.norm_freq.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_time.weight_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_time.weight_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_time.bias_ih_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.rnn_time.bias_hh_l0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.fc_time.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.fc_time.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.2.fc_time._packed_params.dtype, Type: <class 'torch.dtype'>
2024-08-16 14:46:05 [INFO] Non-Tensor Parameter: _model.freq_net1.separator.Stack.2.fc_time._packed_params._packed_params, Type: <class 'tuple'>
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.norm_time.weight, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.Stack.2.norm_time.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.output_con2d.weight, Data type: torch.qint8
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.output_con2d.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.output_con2d.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.separator.output_con2d.zero_point, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.decoder.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.decoder.module_input_scale_0, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.decoder.module_input_zero_point_0, Data type: torch.int64
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.decoder.module.weight, Data type: torch.qint8
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.decoder.module.bias, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.decoder.module.scale, Data type: torch.float32
2024-08-16 14:46:05 [INFO] Tensor: _model.freq_net1.decoder.module.zero_point, Data type: torch.int64
The output shows that only a small portion of the parameters were quantized to qint8, while most remain in fp32. Could this be because I am using PyTorch-based quantization, which might not fully quantize the model? If so, do you have any suggestions for achieving better quantization coverage? I'm working on Windows, so any tips on improving the quantization process would be greatly appreciated.
To my knowledge, some modules will not be quantized. Setting tolerable_loss in AccuracyCriterion will make the tuning process produce a model with better accuracy.
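For reference, a minimal sketch of wiring tolerable_loss into the config (class and parameter names as in Neural Compressor 2.x; adjust to your installed version):

```python
# Sketch (assumes Neural Compressor 2.x API): allow up to 1% relative
# accuracy drop during tuning; ops whose quantization pushes accuracy
# beyond this tolerance will be fallen back to fp32.
from neural_compressor.config import PostTrainingQuantConfig, AccuracyCriterion

accuracy_criterion = AccuracyCriterion(
    higher_is_better=True,   # the eval metric improves as it increases
    criterion="relative",    # compare against the fp32 baseline relatively
    tolerable_loss=0.01,     # accept at most 1% accuracy loss
)
conf = PostTrainingQuantConfig(accuracy_criterion=accuracy_criterion)
```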
Thank you very much!!
I've encountered an issue where the quantized model size is twice as large as the original model, which contradicts the expected result of reducing the model size after quantization.
Original Model Size: 130.96 MB
Quantized Model Size: approximately 261.92 MB (double the original size)
This is my code:
These are my logs:
I have two questions:
When using Neural Compressor for quantization, I noticed that the size of the quantized model is larger than the original one. I expected the model size to decrease after quantization, but it actually increased. The logs indicate that the model was successfully quantized to int8. Is this behavior normal? Did I successfully quantize the model? Additionally, my FP32 model was initially trained on a GPU, but this post-training quantization (PTQ) was performed on a CPU. Could this be related to the increase in model size?
Is there a way to inspect the precision of each layer in the quantized model? I would like to verify the precision (e.g., FP32, INT8) at which each layer is operating to better understand the impact of the quantization process.
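For the second question, one simple approach (a generic PyTorch sketch, not a Neural Compressor API) is to walk the model's state_dict and tally each entry's dtype. Note that for dynamically quantized Linear layers the qint8 weight is hidden inside _packed_params, which is why the log above shows them as "Non-Tensor Parameter" tuples rather than qint8 tensors:

```python
# Sketch: summarize the dtype of every parameter/buffer in a model's
# state_dict to see which layers were actually quantized.
import collections
import torch
import torch.nn as nn

def dtype_summary(model):
    counts = collections.Counter()
    for name, value in model.state_dict().items():
        if isinstance(value, torch.Tensor):
            counts[str(value.dtype)] += 1
        else:
            # e.g. the _packed_params entries of quantized Linear layers,
            # which hold the qint8 weight inside a (weight, bias) tuple
            counts[type(value).__name__] += 1
    return counts

# Example: dynamically quantize a toy model and compare the summaries.
fp32_model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
q_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)
print(dtype_summary(fp32_model))  # all entries are torch.float32
print(dtype_summary(q_model))
```

Printing the per-entry names and dtypes instead of a Counter reproduces exactly the kind of listing shown in the log above.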