
[code-to-text] Unable to convert model to ONNX #65

Closed: himanshuitis closed this issue 2 years ago

himanshuitis commented 3 years ago

Trying to convert the model to ONNX using:

```python
sample_input = (source_id, source_mask)
torch.onnx.export(model, sample_input, 'model.onnx', export_params=True,
                  verbose=True, input_names=['source_ids', 'source_mask'],
                  output_names=['output'],
                  dynamic_axes={'input': {0: 'batch_size'},
                                'output': {0: 'batch_size'}},
                  opset_version=11)
```

The model does get converted to model.onnx, but loading it in ONNX Runtime throws an error: Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from model.onnx failed:Type Error: Type parameter (T) of Optype (Concat) bound to different types (tensor(int64) and tensor(float) in node (Concat335).

Code used to load the model in ONNX Runtime:

```python
import onnx
onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model, full_check=True)    # no error here

import onnxruntime
ort_session = onnxruntime.InferenceSession("model.onnx")  # error
```

A similar issue raised at https://github.com/microsoft/onnxruntime/issues/1764 suggests some problem with the model or with the conversion process. Kindly help, thanks!
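
As a starting point for debugging, here is a minimal sketch (assuming the file model.onnx and the node name Concat335 from the error above; node names can be empty in some exports) that prints the element types flowing into the failing Concat node:

```python
import onnx

# Sketch: run shape inference, then look up the inferred element types of
# the inputs to the Concat node named in the error message above.
model = onnx.load("model.onnx")
inferred = onnx.shape_inference.infer_shapes(model)

elem_types = {}
for vi in (list(inferred.graph.value_info)
           + list(inferred.graph.input)
           + list(inferred.graph.output)):
    elem_types[vi.name] = vi.type.tensor_type.elem_type

for node in inferred.graph.node:
    if node.op_type == "Concat" and node.name == "Concat335":
        for name in node.input:
            # elem_type values map to onnx.TensorProto.DataType (1=FLOAT, 7=INT64, ...)
            print(name, elem_types.get(name, "unknown"))
```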

guoday commented 3 years ago

Sorry @himanshuitis. We are unfamiliar with ONNX and are not able to solve this problem.

himanshuitis commented 3 years ago

Is it possible to convert the model using TorchScript? I run into some errors when using torch.jit.script. When trying torch.jit.trace(mymodel, (sample_source_id, sample_source_mask)), I get TracerWarnings:

```
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.nextYs[-1][i] == self._eos:
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.nextYs[-1][0] == self._eos:
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.nextYs[-1][i] == self._eos:
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  self.finished.sort(key=lambda a: -a[0])
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if tok == self._eos:
```

Though we do get a converted model after tracing, it always returns the same prediction for different examples, namely the prediction for the example that was used as the sample input during torch.jit.trace.
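
That behavior is consistent with how tracing works: data-dependent Python branches are frozen to the path taken for the sample input, which torch.jit.script avoids by compiling the control flow. A toy illustration (not the CodeXGLUE model):

```python
import torch

# Toy example: tracing records the branch taken for the sample input as a
# constant, so the traced module ignores the condition for other inputs.
class Toy(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:          # data-dependent Python bool -> TracerWarning
            return x + 1
        return x - 1

m = Toy()
traced = torch.jit.trace(m, torch.ones(3))   # takes the "+1" branch
print(traced(-torch.ones(3)))                # still "+1": the branch was frozen

scripted = torch.jit.script(m)               # compiles the control flow
print(scripted(-torch.ones(3)))              # correctly takes the "-1" branch
```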

guoday commented 3 years ago

Hi @himanshudhawale, I have asked my colleagues, but none of them knows torch.jit.script. Therefore, we can't help you with this issue.

himanshuitis commented 3 years ago

But you have used the flag for TorchScript support at https://github.com/microsoft/CodeXGLUE/blob/3e7bfe6dc4a88534c7803ce1bd8d1733c1d16888/Code-Text/code-to-text/code/model.py, line 42.

guoday commented 3 years ago

The code is directly copied from here. I guess the problem comes from beam search. I will try to use greedy search and do a test.

guoday commented 3 years ago

I tried to reproduce the error you mentioned, but it seems I can load the model model.onnx.zip in ONNX Runtime successfully and don't encounter any errors on my server.

himanshuitis commented 3 years ago

Can you share the code you used to save the PyTorch model, convert it into ONNX, and load it into ONNX Runtime? Also, please share the versions of the relevant libraries; I want to debug which part is causing the error in my case.

Also, can you verify that you get different predictions for different inputs, to confirm the model was converted to ONNX properly? Thanks.

WarningRan commented 2 years ago

I am currently working on ONNX too, and I get errors when I try to convert the fine-tuned [code-to-text] model into ONNX. Here is my export code:

```python
symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}

torch.onnx.export(model,                        # model being run
                  args=tuple(inputs.values()),  # model input (or a tuple for multiple inputs)
                  f=export_model_path,          # where to save the model (can be a file or file-like object)
                  export_params=True,           # store the trained parameter weights inside the model file
                  opset_version=11,             # the ONNX version to export the model to
                  do_constant_folding=True,     # whether to execute constant folding for optimization
                  input_names=['source_ids',    # the model's input names
                               'source_mask'],
                  output_names=['output'],      # the model's output names
                  dynamic_axes={'source_ids': symbolic_names,   # variable-length axes
                                'source_mask': symbolic_names,
                                'output': symbolic_names})
```

The exported model does not look right when I visualize it in Netron, so I thought I might have set the wrong parameters here.
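
One way to sanity-check an export (a sketch, reusing export_model_path from the snippet above; the shapes and fill values are arbitrary) is to feed two clearly different inputs through ONNX Runtime and confirm the outputs differ:

```python
import numpy as np
import onnxruntime

# Sketch: identical outputs for different inputs would suggest the graph was
# frozen during export (e.g. by traced constants), echoing the earlier report.
sess = onnxruntime.InferenceSession(export_model_path)
ids_a = np.ones((1, 256), dtype=np.int64)
ids_b = np.full((1, 256), 2, dtype=np.int64)
mask = np.ones((1, 256), dtype=np.int64)

out_a = sess.run(None, {'source_ids': ids_a, 'source_mask': mask})
out_b = sess.run(None, {'source_ids': ids_b, 'source_mask': mask})
print(np.array_equal(out_a[0], out_b[0]))  # True would indicate a frozen graph
```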

WarningRan commented 2 years ago

> I tried to reproduce the error you mentioned, but it seems I can load the model model.onnx.zip in ONNX Runtime successfully and don't encounter any errors on my server.

Could you share the code you used to save the PyTorch model, convert it into ONNX, and load it into ONNX Runtime? Also, please share the versions of the relevant libraries; I want to debug which part is causing the error in my case. Thank you!

guoday commented 2 years ago

We are not familiar with ONNX. However, I am not sure whether you used a GPU. In beam search, the model uses the GPU by default; if you run on CPU, it will encounter an error.

guoday commented 2 years ago

https://github.com/microsoft/CodeXGLUE/blob/28c836ae3c3f8e614805ac735809c3498f167883/Code-Text/code-to-text/code/model.py#L77 and https://github.com/microsoft/CodeXGLUE/blob/28c836ae3c3f8e614805ac735809c3498f167883/Code-Text/code-to-text/code/model.py#L110 will require a GPU.
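
If running on CPU is needed, a possible workaround (a sketch, not a patch from this thread; the function and variable names are illustrative) is to derive the device from an input tensor instead of hard-coding .cuda() as the linked lines do:

```python
import torch

# Sketch: allocate beam-search buffers on whatever device the inputs live on,
# so the same code runs on both CPU and GPU.
def init_beam_buffers(source_ids, beam_size, hidden_size):
    device = source_ids.device                    # follows CPU or GPU input
    scores = torch.zeros(beam_size, device=device)
    tokens = torch.ones(beam_size, 1, dtype=torch.long, device=device)
    hidden = torch.zeros(beam_size, hidden_size, device=device)
    return scores, tokens, hidden
```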

WarningRan commented 2 years ago

> I tried to reproduce the error you mentioned, but it seems I can load the model model.onnx.zip in ONNX Runtime successfully and don't encounter any errors on my server.

Thank you so much for your quick reply. Would you mind telling me how you generated the model in the ONNX format you attached? I am also confused about which kind of dummy input I should use. Could you give me some hints?

guoday commented 2 years ago

Follow here on a GPU machine. I don't know much about ONNX; I just followed the steps here to reproduce the error, but nothing happened.

WarningRan commented 2 years ago

Thank you for your quick reply. By the way, what shapes and types of inputs does the code-to-text model accept? Is (source_ids, source_mask) (from run.py) a suitable input?

guoday commented 2 years ago

source_ids and source_mask. Their shapes are [batch_size, max_length] and their dtype is torch.long.
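
So a minimal dummy input for export could look like this (a sketch; batch_size=1 and max_length=256 are arbitrary choices, not values from this thread):

```python
import torch

# Sketch: dummy inputs matching the shapes and dtype described above.
batch_size, max_length = 1, 256
source_ids = torch.ones(batch_size, max_length, dtype=torch.long)
source_mask = torch.ones(batch_size, max_length, dtype=torch.long)

# torch.onnx.export(model, (source_ids, source_mask), "model.onnx", opset_version=11, ...)
```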

WarningRan commented 2 years ago

Thank you so much for the clarification!