zhuoran-guo opened 1 year ago
If I install torch 1.9.1+cu111 in the aimet 1.28 environment, I get this error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aimettorch torch-gpu-1.28.0 requires torch==1.13.1+cu116, but you have torch 1.9.1+cu111 which is incompatible.
aimettorch torch-gpu-1.28.0 requires torchvision==0.14.1+cu116, but you have torchvision 0.10.1+cu111 which is incompatible.
Can you try with opset 11 or 12?
@quic-mangal Hello, thanks for your response. Do you mean setting opset_version to 11 or 12 through onnx_export_args, like below? I have already tried something like this, but it does not work yet:
onnx_export_args = {
    'opset_version': 11,
    'verbose': True,
    'input_names': [f"input{i}" for i in range(5)],
    'output_names': [f"output{i}" for i in range(5)],
    'dynamic_axes': None,
    'keep_initializers_as_inputs': True,
}
quant_sim.export(new_dir_path, "aimet", dummy_input, onnx_export_args=onnx_export_args)
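(For reference, a minimal sketch of the same call using the OnnxExportApiArgs wrapper that aimet_torch.quantsim exposes, assuming it takes opset_version, input_names and output_names; the names below are just placeholders:)

from aimet_torch.quantsim import OnnxExportApiArgs

# Hedged sketch: wrap the export options in OnnxExportApiArgs instead of a raw dict.
onnx_export_args = OnnxExportApiArgs(
    opset_version=11,
    input_names=[f"input{i}" for i in range(5)],
    output_names=[f"output{i}" for i in range(5)],
)
quant_sim.export(new_dir_path, "aimet", dummy_input, onnx_export_args=onnx_export_args)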
Are you able to export this model from torch to ONNX without AIMET in the middle? Because the error says:
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of operator unsafe_chunk, unknown dimension size. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues [Caused by the value '628 defined in (%628 : Float(*, *, strides=[400, 1], requires_grad=1, device=cpu) = onnx::Add(%623, %627), scope: torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl::/aimet_torch.onnx_utils.CustomMarker::lstm1/torch.nn.modules.rnn.LSTMCell::marked_module # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/rnn.py:1194:0
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Add'.]
@quic-mangal
Thanks for your response. Yes, I can export this model from torch to ONNX directly without AIMET,
for example using this function in the script, which works fine:
pytorch2onnx(model, input_size=128, output_file='/work/incam-qat/model/test.onnx')
but the ONNX model cannot be saved successfully with:
quant_sim.export(new_dir_path, "aimet_e2e_ptq", dummy_input=get_dummy_input_a(model, 128, False), onnx_export_args=onnx_export_args)
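(For context, a minimal sketch of what such a direct export might look like, assuming pytorch2onnx is a thin wrapper around torch.onnx.export; the input shapes follow the reproduction script shared later in this thread:)

import torch

# Hedged sketch: direct export without AIMET, assuming a (1, 3, 128, 128) image
# plus the initial LSTM states as inputs, and opset 11 as discussed above.
x = torch.randn(1, 3, 128, 128)   # (B, C, H, W)
h = [torch.zeros(1, 100)]         # initial hidden state for the single LSTM cell
c = [torch.zeros(1, 100)]         # initial cell state for the single LSTM cell
model.eval()
torch.onnx.export(
    model,
    (x, h, c),
    '/work/incam-qat/model/test.onnx',
    opset_version=11,
    input_names=[f"input{i}" for i in range(3)],
    output_names=[f"output{i}" for i in range(3)],
)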
@quic-mangal But if I comment out this line, the ONNX model can be saved successfully, so the problem seems to happen here: https://github.com/quic/aimet/blob/ae983476d09e863f9973a586f32fa5acd2c5217e/TrainingExtensions/torch/src/python/aimet_torch/onnx_utils.py#L1061 I want to convert the ONNX model to a DLC model and deploy it on SNPE later; is it feasible to ignore this line of code if I then run:
snpe-onnx-to-dlc -i aimet.onnx --quantization_overrides aimet.encodings -o aimet.dlc
snpe-dlc-quantize --input_dlc aimet.dlc --input_list ../input_list.txt --output_dlc aimet_quantization.dlc --override_params
@quic-akinlawo, could you take up this last question? Thanks
@zhuoran-guo Commenting out that line may be a non-issue depending on your model (i.e. it may work fine). In certain cases, such as a one-to-many pytorch to onnx op mapping, you could see mismatched encoding values between SNPE and AIMET if markers are not added correctly.
@quic-akinlawo @quic-mangal Hi, thanks for your response. I tried this: if I simply comment out that line,
the ONNX model can be exported successfully, but there is a mismatch between the ONNX layer names and the PyTorch layer names. As a result, the following warning is raised in the subsequent steps and, because of it, the generated encoding file is also empty:
The following layers were not found in the exported onnx model. Encodings for these layers will not appear in the exported encodings file:
['encoder.model.conv_stem', 'encoder.model.act1', .......]
This seems to be because layers_to_onnx_op_names ends up empty when I comment out that line.
Then, based on the warning message, I customized the value of layers_to_onnx_op_names, for example
{'encoder.model.conv_stem': ['encoder.model.conv_stem'], 'encoder.model.act1': ['encoder.model.act1'], ......}
and with that the ONNX model exports successfully and the encoding file is generated with parameters (see the sketch below for how such a mapping can be built).
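A minimal sketch of building that identity mapping from the prepared model's leaf modules (this is my own workaround, not an official AIMET API; it simply assumes the exported ONNX node names match the PyTorch module names, which is what the dict above expresses):

# Hedged sketch: build {pytorch_layer_name: [onnx_node_name, ...]} assuming the
# ONNX node names are identical to the PyTorch module names (leaf modules only).
layers_to_onnx_op_names = {
    name: [name]
    for name, module in prepared_model.named_modules()
    if name and not list(module.children())
}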
However, I do not have an in-depth understanding of the source code, so I am not sure whether this approach is correct or whether it may introduce errors. Additionally, regarding the export error above, can I share a minimal script with your team to reproduce the issue and then analyze how to resolve this bug? Thank you. (It seems to happen because of the LSTM cell.)
@zhuoran-guo Please share minimal script to reproduce the issue.
@quic-hitameht Thank you. You can run this script to reproduce the issue; it seems to happen because of the LSTMCell part. The environment is aimet_torch_gpu 1.28 with torch 1.13.1+cu116. If you have any problems, please tell me.
import os
import torch
import torch.cuda
import torch.nn as nn
from aimet_common.defs import QuantScheme
from aimet_torch.model_preparer import prepare_model
from aimet_torch.quantsim import QuantizationSimModel, OnnxExportApiArgs
from aimet_torch.batch_norm_fold import fold_all_batch_norms
import timm


class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = ConvEncoder()
        self.decoder = LSTMDecoder(
            self.encoder.model.num_features,
        )

    def forward(self, x, h_prev, c_prev):
        x = self.encoder(x)
        return self.decoder(x, h_prev, c_prev)


class LSTMDecoder(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.hidden_dim = 100
        self.lstm = nn.LSTMCell(input_dim, 100)

    def init_states(self, batch_size: int, device: torch.device):
        h_next = [torch.zeros(batch_size, self.hidden_dim).to(device) for _ in range(1)]
        c_next = [torch.zeros(batch_size, self.hidden_dim).to(device) for _ in range(1)]
        return h_next, c_next

    def forward(self, x, h, c):
        h, c = self.lstm(x, (h[0], c[0]))
        return h, c


class ConvEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        name = "efficientnet_lite0"
        assert name in timm.list_models(pretrained=False), name
        self.model = timm.create_model(name, pretrained=False, num_classes=0)

    def forward(self, x):
        return self.model(x)


def get_dummy_input_a(model: nn.Module, input_size: int, use_cuda: bool):
    # (B, C, H, W)
    if use_cuda:
        device = torch.device('cuda')
        input_shape = (1, 3, input_size, input_size)
        data = torch.randn(input_shape, requires_grad=False).to(device)
        h, c = model.decoder.init_states(1, "cuda")
    else:
        device = torch.device('cpu')
        input_shape = (1, 3, input_size, input_size)
        data = torch.randn(input_shape, requires_grad=False).to(device)
        h, c = model.decoder.init_states(1, "cpu")
    return data, h, c


def main():
    model = ModelA()
    prepared_model = prepare_model(model)
    use_cuda = False
    if torch.cuda.is_available():
        use_cuda = True
        prepared_model.to(torch.device('cuda'))
    dummy_input = get_dummy_input_a(model, 128, use_cuda)
    _ = fold_all_batch_norms(prepared_model, input_shapes=None, dummy_input=get_dummy_input_a(model, 128, use_cuda))
    quant_sim = QuantizationSimModel(
        prepared_model,
        dummy_input=dummy_input,
        quant_scheme=QuantScheme.training_range_learning_with_tf_init,
    )
    new_dir_path = '/work/incam-qat/model_test'
    os.makedirs(new_dir_path, exist_ok=True)
    onnx_export_args = {
        'opset_version': 11,
        'verbose': True,
        'input_names': [f"input{i}" for i in range(3)],
        'output_names': [f"output{i}" for i in range(3)],
        'dynamic_axes': None,
        'keep_initializers_as_inputs': True,
    }
    # The export below raises the unsafe_chunk SymbolicValueError quoted earlier in this thread.
    quant_sim.export(new_dir_path, "aimet_e2e_ptq", dummy_input=get_dummy_input_a(model, 128, False), onnx_export_args=onnx_export_args)


if __name__ == "__main__":
    main()
@zhuoran-guo Thanks for sharing the script. We'll take a detailed look.
Meanwhile, have you tried using torch.nn.LSTM instead of torch.nn.LSTMCell, as suggested by this issue from the ONNX repository: https://github.com/onnx/onnx/issues/3597
@quic-hitameht Yes, if I use torch.nn.LSTM instead of torch.nn.LSTMCell, I can export the ONNX model without errors. However, my model was trained with torch.nn.LSTMCell, and I need to perform Quantization-Aware Training (QAT) with AIMET. Therefore, if possible, I would like AIMET to support exporting models that use LSTMCell.
As I discussed with @quic-mangal previously, the model can be exported successfully without AIMET, so there appears to be a bug in the AIMET export process at this point: if I simply comment out this line, AIMET is able to export the ONNX model successfully. https://github.com/quic/aimet/blob/ae983476d09e863f9973a586f32fa5acd2c5217e/TrainingExtensions/torch/src/python/aimet_torch/onnx_utils.py#L1061
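For reference, a minimal sketch of how LSTMCell-trained weights could be copied into an nn.LSTM just for export (my own workaround idea, not an AIMET-provided path; the sizes below are placeholders matching the script above, and both modules share the same (i, f, g, o) gate ordering and parameter shapes):

import torch
import torch.nn as nn

# Hedged sketch: copy weights from a trained nn.LSTMCell into a single-layer
# nn.LSTM so the ONNX exporter sees the supported LSTM op.
input_dim, hidden_dim = 1280, 100          # placeholder sizes matching the script above
cell = nn.LSTMCell(input_dim, hidden_dim)  # stands in for the trained module
lstm = nn.LSTM(input_dim, hidden_dim, num_layers=1)

with torch.no_grad():
    lstm.weight_ih_l0.copy_(cell.weight_ih)
    lstm.weight_hh_l0.copy_(cell.weight_hh)
    lstm.bias_ih_l0.copy_(cell.bias_ih)
    lstm.bias_hh_l0.copy_(cell.bias_hh)

# One decoding step: LSTMCell(x, (h, c)) is equivalent to a length-1 sequence.
x = torch.randn(1, input_dim)              # (batch, input)
h0 = torch.zeros(1, 1, hidden_dim)         # (num_layers, batch, hidden)
c0 = torch.zeros(1, 1, hidden_dim)
_, (hn, cn) = lstm(x.unsqueeze(0), (h0, c0))
h_ref, c_ref = cell(x, (h0.squeeze(0), c0.squeeze(0)))
assert torch.allclose(hn.squeeze(0), h_ref, atol=1e-6)
assert torch.allclose(cn.squeeze(0), c_ref, atol=1e-6)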
Hi @zhuoran-guo While the source code remains compatible with PyTorch 1.9, the released wheel packages are not. We will need to rebuild the package for PyTorch 1.9 and will make these packages available for the 1.28 release shortly.
@quic-bharathr Thank you for the update. We appreciate your efforts in ensuring compatibility with PyTorch 1.9. We look forward to the release of the updated packages for the 1.28 OS release. Please keep us informed of any further developments or if you need any assistance from our end. It seems that I can export the model after the update in 1.28
I can export the onnx model after quant_sim() in aimet_torch 1.27, but I cannot export it in aimet_torch 1.28. When I export the model in 1.28,
the error log suggests that the default PyTorch version for AIMET is 1.13 and that export is only supported with ONNX opset 14. However, on https://github.com/quic/aimet/releases it is mentioned that AIMET 1.28 is still compatible with PyTorch 1.9. So I'd like to know whether it is possible to export a model with ONNX opset 11 in an AIMET 1.28 environment, because I need AIMET 1.28 for further Quantization-Aware Training (QAT) on my quantized models.
Thanks for your help!
I can even export the onnx model in aimet_torch 1.27 because the torch version in aimet_torch 1.27 is 1.9.1+cu111, but in aimet_torch 1.28 it is 1.13.1+cu116.