huggingface / transformers


convert_graph_to_onnx.convert broken for translation model facebook/wmt19-en-de #9722

Closed. oborchers closed this issue 3 years ago.

oborchers commented 3 years ago

Environment info

Who can help

@mfuntowicz (based on the initial commit of convert_graph_to_onnx), @stas00 (based on the model used here), @thomwolf (based on the history)

Information

Model I am using (Bert, XLNet ...): facebook/wmt19-en-de

The problem arises when using:

The task I am working on is:

To reproduce

Steps to reproduce the behavior:

import torch
import transformers
from transformers import convert_graph_to_onnx
from pathlib import Path

nlp = transformers.pipeline("translation_en_to_de", model="facebook/wmt19-en-de", tokenizer="facebook/wmt19-en-de")
convert_graph_to_onnx.convert(
    framework="pt",
    model="facebook/wmt19-en-de",
    output=Path("encoder/en_de_trans.onnx"),
    opset=12,
    tokenizer="facebook/wmt19-en-de",
    use_external_format=False,
    pipeline_name="translation_en_to_de",
)

Raises:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-d46bec961b86> in <module>
      5 
      6 nlp = transformers.pipeline("translation_en_to_de", model="facebook/wmt19-en-de", tokenizer="facebook/wmt19-en-de")
----> 7 convert_graph_to_onnx.convert(
      8     framework="pt",
      9     model="facebook/wmt19-en-de",

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/convert_graph_to_onnx.py in convert(framework, model, output, opset, tokenizer, use_external_format, pipeline_name)
    365     # Export the graph
    366     if framework == "pt":
--> 367         convert_pytorch(nlp, opset, output, use_external_format)
    368     else:
    369         convert_tensorflow(nlp, opset, output)

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/convert_graph_to_onnx.py in convert_pytorch(nlp, opset, output, use_external_format)
    274 
    275     with torch.no_grad():
--> 276         input_names, output_names, dynamic_axes, tokens = infer_shapes(nlp, "pt")
    277         ordered_input_names, model_args = ensure_valid_input(nlp.model, tokens, input_names)
    278 

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/convert_graph_to_onnx.py in infer_shapes(nlp, framework)
    196     tokens = nlp.tokenizer("This is a sample output", return_tensors=framework)
    197     seq_len = tokens.input_ids.shape[-1]
--> 198     outputs = nlp.model(**tokens) if framework == "pt" else nlp.model(tokens)
    199     if isinstance(outputs, ModelOutput):
    200         outputs = outputs.to_tuple()

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

TypeError: forward() got an unexpected keyword argument 'token_type_ids'

Subsequently, the failure can be boiled down to the shape-inference step that prepares the inputs for torch.onnx.export.

I think this may be due to an incompatibility between tokenizer() and tokenizer.encode() for this particular model:

import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/wmt19-en-de")
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("facebook/wmt19-en-de")
string = "Hello. How are you?"

# model.generate(tokenizer(string, return_tensors="pt"))  # Fails: generate() receives the whole BatchEncoding instead of an input_ids tensor

model.generate(tokenizer.encode(string, return_tensors="pt"))  # Succeeds: encode() returns just the input_ids tensor
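
For completeness, a rough sketch of a workaround for the TypeError above, assuming the root cause is simply that the tokenizer's __call__ returns keys (here token_type_ids) that FSMT's forward() does not accept. Filtering the encoding against the forward signature, roughly in the spirit of convert_graph_to_onnx.ensure_valid_input, sidesteps the error:

import inspect

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/wmt19-en-de")
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("facebook/wmt19-en-de")

encoding = tokenizer("Hello. How are you?", return_tensors="pt")
# Keep only the keys that forward() actually declares; token_type_ids gets dropped here.
accepted = set(inspect.signature(model.forward).parameters)
filtered = {name: tensor for name, tensor in encoding.items() if name in accepted}
outputs = model(**filtered)  # no unexpected-keyword TypeError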

Expected behavior

Model export should work properly.

stas00 commented 3 years ago

Thank you for this excellent report, @oborchers - I'll investigate and report back.

stas00 commented 3 years ago

Fixed in https://github.com/huggingface/transformers/pull/9736

But found another problem: https://github.com/huggingface/transformers/issues/9737. Fixed in https://github.com/huggingface/transformers/pull/9738

So you will need both PRs for your task to work, in case you want to try it before they are merged.

oborchers commented 3 years ago

Awesome! Thank you, @stas00! Looking forward to trying it out after the PRs have been merged. Much appreciated.

stas00 commented 3 years ago

The problem you reported has been fixed in https://github.com/huggingface/transformers/pull/9736 (merged already)

But then another one popped up in https://github.com/huggingface/transformers/issues/9737

You can just use the https://github.com/huggingface/transformers/pull/9738 branch - since it contains both fixes.

Not sure how quickly it will get merged, since we might want to solve this for other models too. I made only a local, fsmt-specific fix in that PR.

oborchers commented 3 years ago

Great, thank you for the fast response and issue handling. I will provide a follow-up on #9738. While the export works as intended, there is an issue I encounter when running the following code (built on the first example):

import numpy as np
import onnxruntime as rt
from pathlib import Path

# nlp is the translation pipeline from the first example above;
# opt is not shown in the original snippet; assumed here to be a plain onnxruntime.SessionOptions()
opt = rt.SessionOptions()

sess = rt.InferenceSession(str(Path("encoder/en_de_trans.onnx")), opt)
spans = [
    "My name is Bert",      # Succeeds
    "My name is Bert and",  # Fails
]
for span in spans:
    model_input = nlp.tokenizer.encode_plus(span)
    model_input = {name: np.atleast_2d(value) for name, value in model_input.items()}
    out = nlp.model(**nlp.tokenizer(span, return_tensors="pt"))
    trans_1 = out[0].detach().cpu().numpy()
    trans_2 = out[1].detach().cpu().numpy()
    onnx_1, onnx_2 = sess.run(None, model_input)
    assert np.allclose(trans_1, onnx_1, atol=1e-5)
    assert np.allclose(trans_2, onnx_2, atol=1e-5)

"My name is Bert and" will raise:

---------------------------------------------------------------------------
RuntimeException                          Traceback (most recent call last)
<ipython-input-3-3ef2da9bdd5e> in <module>
     10     trans_1 = out[0].detach().cpu().numpy()
     11     trans_2 = out[1].detach().cpu().numpy()
---> 12     onnx_1, onnx_2 = sess.run(None, model_input)
     13     assert np.allclose(trans_1, onnx_1, atol=1e-5)
     14     assert np.allclose(trans_2, onnx_2, atol=1e-5)

~/anaconda3/envs/dev/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    122             output_names = [output.name for output in self._outputs_meta]
    123         try:
--> 124             return self._sess.run(output_names, input_feed, run_options)
    125         except C.EPFail as err:
    126             if self._enable_fallback:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_74' Status Message: /data/shared/packages/onnxruntime/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:43 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<long int>&) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,6}, requested shape:{5}

Solely based on intuition, I'd assume that some dynamic shape was not inferred properly or not passed to the dynamic_axes argument of torch.onnx.export. But that's just a quick guess. Or did I miss something?
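
For reference, a quick way to check that guess (just an inspection sketch, not a fix; it assumes the export path used above and that the onnx package is installed): if a fixed integer shows up where a named sequence axis is expected, the axis was baked in statically.

import onnx

model_proto = onnx.load("encoder/en_de_trans.onnx")
for tensor in list(model_proto.graph.input) + list(model_proto.graph.output):
    # dim_param is the symbolic name of a dynamic axis; dim_value is a fixed size
    dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)  # e.g. ['batch', 'sequence'] is dynamic, [1, 5] is static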

I see that I would have to look into or re-implement the generate function, as only the raw tensors are passed back. I'm going to create a feature suggestion to support the ORT custom ops. Perhaps it will be possible, in the far future, to retrieve the actual translated string instead of the tensors (or to specify the output).

As promised, the follow-up feature request and suggestion is under #9784.

stas00 commented 3 years ago

Honestly, I don't know much about the ONNX side of things. I asked @mfuntowicz to hopefully have a look and address this.

Also tagging @LysandreJik and @patrickvonplaten who perhaps may have some answers as well.

I wonder if this is a project-wide issue, e.g. do you have the same problem if you do this with a Bart model? I'm asking since fsmt is Bart with some tweaks.

Also, I think it's best to open a new issue, since we are now dealing with a different problem; that would make it easier to track and monitor.

oborchers commented 3 years ago

Thank you for your help, @stas00! I followed your advice and created a new issue.

dmelli commented 3 years ago

@oborchers It seems that it is a problem with the PyTorch export of the dynamic_axes. Using the nightly version (torch-1.9.0.dev20210212+cpu) it works.

On the other hand, I am interested in using the ONNX models for generation (translation and summarization). Could you give me some indication of how to write a custom forward using the ONNX model, so it can be used with the generation_utils.generate function?

PS: from what you comment here in #9784, you plan to work on a user-specific re-implementation. Thanks.