huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

convert_graph_to_onnx.convert broken for model bart-large / wmt19-en-de #9803

Closed oborchers closed 1 year ago

oborchers commented 3 years ago

@stas00's edit on top:

I currently don't have the know-how in this domain, so if there are members of the community with ONNX experience and this issue resonates with you, please don't hesitate to comment if you'd like to work on resolving this. Thank you very much!


Who can help

@stas00 (based on his suggestion to open a new issue in #9722 and run this with bart) @patrickvonplaten (based on link of @stas00 in #9722) @mfuntowicz (based on link of @stas00 in #9722) @LysandreJik (based on link of @stas00 in #9722)

Information

Model I am using (Bert, XLNet ...): facebook/bart-large & facebook/wmt19-en-de

Description

Initially, I was about to export facebook/wmt19-en-de via ONNX for our deployment. Yet, it turns out that the exported models do not work properly. It seems that several things are broken in the export for this model type.

To reproduce

1. Testing facebook/wmt19-en-de

import torch
import transformers
import numpy as np
import onnxruntime as rt
from pathlib import Path

from transformers import convert_graph_to_onnx

print(rt.__version__)

opt = rt.SessionOptions()

model_name = "facebook/wmt19-en-de"
pipeline_name = "translation_en_to_de"
model_pth = Path("encoder/en_de_trans.onnx")

if model_pth.exists():
    model_pth.unlink()

nlp = transformers.pipeline(pipeline_name, model=model_name, tokenizer=model_name)

convert_graph_to_onnx.convert(
    framework="pt",
    model=model_name,
    output=model_pth,
    opset=12,
    tokenizer=model_name,
    use_external_format=False,
    pipeline_name=pipeline_name,
)

sess = rt.InferenceSession(str(model_pth), opt)
spans = [
    "My name is Bert", # passes facebook/wmt19-en-de
    "My name is Bert and" # fails facebook/wmt19-en-de
]
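# Compare the PyTorch model's outputs against the ONNX session's outputs for each span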
for span in spans:
    model_input = nlp.tokenizer.encode_plus(span)
    model_input = {name : np.atleast_2d(value) for name, value in model_input.items()}
    out = nlp.model(**nlp.tokenizer(span, return_tensors="pt"))
    trans_1 = out[0].detach().cpu().numpy()
    trans_2 = out[1].detach().cpu().numpy()
    onnx_1, onnx_2 = sess.run(None, model_input)
    assert np.allclose(trans_1, onnx_1, atol=1e-5)
    assert np.allclose(trans_2, onnx_2, atol=1e-5)

This raises the following exception:

Some weights of FSMTModel were not initialized from the model checkpoint at facebook/wmt19-en-de and are newly initialized: ['model.encoder.embed_positions.weight', 'model.decoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
ONNX opset version set to: 12
Loading pipeline (model: facebook/wmt19-en-de, tokenizer: facebook/wmt19-en-de)
Using framework PyTorch: 1.7.1
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Found output output_1 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
decoder_input_ids is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']
**[skipped warnings for brevity...]**
---------------------------------------------------------------------------
RuntimeException                          Traceback (most recent call last)
<ipython-input-2-f4eec5b0ac5f> in <module>
     51     trans_1 = out[0].detach().cpu().numpy()
     52     trans_2 = out[1].detach().cpu().numpy()
---> 53     onnx_1, onnx_2 = sess.run(None, model_input)
     54     assert np.allclose(trans_1, onnx_1, atol=1e-5)
     55     assert np.allclose(trans_2, onnx_2, atol=1e-5)

~/anaconda3/envs/dev/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    122             output_names = [output.name for output in self._outputs_meta]
    123         try:
--> 124             return self._sess.run(output_names, input_feed, run_options)
    125         except C.EPFail as err:
    126             if self._enable_fallback:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_74' Status Message: /data/shared/packages/onnxruntime/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:43 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<long int>&) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,6}, requested shape:{5}

As stated in #9722, I'd assume that some dynamic shape was not inferred properly, or not passed to the dynamic_axes argument of torch.onnx.export. But that's just a quick guess, based on what I've seen when building my own ONNX models. Important: the first string passes the assertions, the second one doesn't.
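To make that guess concrete, here is a minimal, runnable sketch of what a complete dynamic_axes declaration for torch.onnx.export looks like (the toy module is purely illustrative, not the actual BART export): if an axis that varies at runtime is omitted from dynamic_axes, the exported graph bakes in the trace-time length and can fail with exactly this kind of Reshape mismatch.

import torch

class Tiny(torch.nn.Module):
    def forward(self, input_ids, attention_mask):
        # Stand-in for a real model: any op over the dynamic axes
        return (input_ids * attention_mask).float()

model = Tiny().eval()
input_ids = torch.ones(1, 6, dtype=torch.long)
attention_mask = torch.ones(1, 6, dtype=torch.long)

torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "tiny.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["output_0"],
    # Every axis that may change at runtime must be declared here;
    # axes not listed are frozen to their trace-time size (6 above).
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "output_0": {0: "batch", 1: "sequence"},
    },
    opset_version=12,
)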

2. Testing facebook/bart-large (feature extraction)

@stas00 suggested re-testing the behavior with the underlying BART model. Now, say we run the same script with the following parameters:

model_name = "facebook/bart-large"
pipeline_name = "feature-extraction"
model_pth = Path("generator/bart.onnx")

Raises

ONNX opset version set to: 12
Loading pipeline (model: facebook/bart-large, tokenizer: facebook/bart-large)
Using framework PyTorch: 1.7.1
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
**[skipped output axes for brevity...]**
Found output output_13 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
decoder_input_ids is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']
/home/oborchers/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/utils.py:1111: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input output_1
  warnings.warn('No names were found for specified dynamic axes of provided input.'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-3362f5ef6ea8> in <module>
     30 nlp = transformers.pipeline(pipeline_name, model=model_name, tokenizer=model_name)
     31 
---> 32 convert_graph_to_onnx.convert(
     33     framework="pt",
     34     model=model_name,

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/convert_graph_to_onnx.py in convert(framework, model, output, opset, tokenizer, use_external_format, pipeline_name)
    365     # Export the graph
    366     if framework == "pt":
--> 367         convert_pytorch(nlp, opset, output, use_external_format)
    368     else:
    369         convert_tensorflow(nlp, opset, output)

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/convert_graph_to_onnx.py in convert_pytorch(nlp, opset, output, use_external_format)
    277         ordered_input_names, model_args = ensure_valid_input(nlp.model, tokens, input_names)
    278 
--> 279         export(
    280             nlp.model,
    281             model_args,

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/__init__.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
    223 
    224     from torch.onnx import utils
--> 225     return utils.export(model, args, f, export_params, verbose, training,
    226                         input_names, output_names, aten, export_raw_ir,
    227                         operator_export_type, opset_version, _retain_param_name,

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
     83         else:
     84             operator_export_type = OperatorExportTypes.ONNX
---> 85     _export(model, args, f, export_params, verbose, training, input_names, output_names,
     86             operator_export_type=operator_export_type, opset_version=opset_version,
     87             _retain_param_name=_retain_param_name, do_constant_folding=do_constant_folding,

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, enable_onnx_checker, use_external_data_format, onnx_shape_inference, use_new_jit_passes)
    627             if dynamic_axes is None:
    628                 dynamic_axes = {}
--> 629             _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
    630 
    631             graph, params_dict, torch_out = \

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/utils.py in _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1115             for i, x in enumerate(value):
   1116                 if not isinstance(x, int):
-> 1117                     raise ValueError("The type of axis index is expected to be an integer")
   1118                 if x in value_dict:
   1119                     warnings.warn('Duplicate dynamic axis index {} was provided for input {}.'

ValueError: The type of axis index is expected to be an integer

3. Testing facebook/bart-large (text-generation)

model_name = "facebook/bart-large"
pipeline_name = "text-generation"
model_pth = Path("generator/bart.onnx")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-d6fa1456dc0e> in <module>
     28     model_pth.unlink()
     29 
---> 30 nlp = transformers.pipeline(pipeline_name, model=model_name, tokenizer=model_name)
     31 
     32 convert_graph_to_onnx.convert(

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, framework, revision, use_fast, **kwargs)
    403             )
    404 
--> 405         model = model_class.from_pretrained(model, config=config, revision=revision, **model_kwargs)
    406         if task == "translation" and model.config.task_specific_params:
    407             for key in model.config.task_specific_params:

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/models/auto/modeling_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1040                 pretrained_model_name_or_path, *model_args, config=config, **kwargs
   1041             )
-> 1042         raise ValueError(
   1043             "Unrecognized configuration class {} for this kind of AutoModel: {}.\n"
   1044             "Model type should be one of {}.".format(

ValueError: Unrecognized configuration class <class 'transformers.models.bart.configuration_bart.BartConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of CamembertConfig, XLMRobertaConfig, RobertaConfig, BertConfig, OpenAIGPTConfig, GPT2Config, TransfoXLConfig, XLNetConfig, XLMConfig, CTRLConfig, ReformerConfig, BertGenerationConfig, XLMProphetNetConfig, ProphetNetConfig.

4. Testing facebook/bart-large (fill-mask)

model_name = "facebook/bart-large"
pipeline_name = "fill-mask"
model_pth = Path("generator/bart.onnx")
ONNX opset version set to: 12
Loading pipeline (model: facebook/bart-large, tokenizer: facebook/bart-large)
Using framework PyTorch: 1.7.1
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
**[skipped for brevity]**
Found output output_13 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
decoder_input_ids is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-d55ec01c8b87> in <module>
     34 nlp = transformers.pipeline(pipeline_name, model=model_name, tokenizer=model_name)
     35 
---> 36 convert_graph_to_onnx.convert(
     37     framework="pt",
     38     model=model_name,

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/convert_graph_to_onnx.py in convert(framework, model, output, opset, tokenizer, use_external_format, pipeline_name)
    365     # Export the graph
    366     if framework == "pt":
--> 367         convert_pytorch(nlp, opset, output, use_external_format)
    368     else:
    369         convert_tensorflow(nlp, opset, output)

~/anaconda3/envs/dev/lib/python3.8/site-packages/transformers/convert_graph_to_onnx.py in convert_pytorch(nlp, opset, output, use_external_format)
    277         ordered_input_names, model_args = ensure_valid_input(nlp.model, tokens, input_names)
    278 
--> 279         export(
    280             nlp.model,
    281             model_args,

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/__init__.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
    223 
    224     from torch.onnx import utils
--> 225     return utils.export(model, args, f, export_params, verbose, training,
    226                         input_names, output_names, aten, export_raw_ir,
    227                         operator_export_type, opset_version, _retain_param_name,

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
     83         else:
     84             operator_export_type = OperatorExportTypes.ONNX
---> 85     _export(model, args, f, export_params, verbose, training, input_names, output_names,
     86             operator_export_type=operator_export_type, opset_version=opset_version,
     87             _retain_param_name=_retain_param_name, do_constant_folding=do_constant_folding,

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, enable_onnx_checker, use_external_data_format, onnx_shape_inference, use_new_jit_passes)
    627             if dynamic_axes is None:
    628                 dynamic_axes = {}
--> 629             _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
    630 
    631             graph, params_dict, torch_out = \

~/anaconda3/envs/dev/lib/python3.8/site-packages/torch/onnx/utils.py in _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1115             for i, x in enumerate(value):
   1116                 if not isinstance(x, int):
-> 1117                     raise ValueError("The type of axis index is expected to be an integer")
   1118                 if x in value_dict:
   1119                     warnings.warn('Duplicate dynamic axis index {} was provided for input {}.'

ValueError: The type of axis index is expected to be an integer

Expected behavior

1 & 2 & 4 point into the direction, that something is wrong with inferring the dynamic shapes, if I am right. 3 just popped up while I was testing the other pipelines.

In all cases, the export & usage should work properly.

stas00 commented 3 years ago

Thank you very much, @oborchers for opening a new ticket and re-testing with other models and verifying that this problem is project-wide.

I hope @mfuntowicz gets a chance to have a look at it, or tag someone else who understands this sub-domain.

mriganktiwari commented 3 years ago

Hi @mfuntowicz @stas00, is this a known issue with GPT2 as well? Please let me know if there is a workaround.

I was considering converting gpt2 or gpt2-medium to ONNX using the notebook provided here.

On executing the line of code below:

convert(framework="pt", model="gpt2-medium", output=Path("onnx/gpt2-medium.onnx"), opset=11)

I get this error:


   1115             for i, x in enumerate(value):
   1116                 if not isinstance(x, int):
-> 1117                     raise ValueError("The type of axis index is expected to be an integer")
   1118                 if x in value_dict:
   1119                     warnings.warn('Duplicate dynamic axis index {} was provided for input {}.'

ValueError: The type of axis index is expected to be an integer
d-lowl commented 3 years ago

I recently stumbled upon this issue myself, specifically case 2. The same error appears for facebook/bart-large, facebook/bart-large-cnn, and IlyaGusev/mbart_ru_sum_gazeta. The main issue here is that for some outputs the model returns not a tensor but a tuple of tensors, which is then converted into a list of shape dicts.

torch.onnx._validate_dynamic_axes (line 1193 in the latest release) expects each dynamic_axes value to be either a dict (in which case it does nothing) or a list of ints (in which case it mocks up axis names); however, for the reason above, it gets a list of dicts (mapping int -> string):

    for key, value in dynamic_axes.items():
        if key not in valid_names:
            warnings.warn("Provided key {} for dynamic axes is not a valid input/output name".format(key))
        if isinstance(value, list):
            warnings.warn('No names were found for specified dynamic axes of provided input.'
                          'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))

            value_dict = {}
            for i, x in enumerate(value):
                if not isinstance(x, int):
                    raise ValueError("The type of axis index is expected to be an integer")
                if x in value_dict:
                    warnings.warn('Duplicate dynamic axis index {} was provided for input {}.'
                                  .format(x, key))
                else:
                    value_dict[x] = str(key) + '_dynamic_axes_' + str(i + 1)
            dynamic_axes[key] = value_dict

I will keep digging into this, but the core question here is: why do BART and related models return a tuple of tensors (for outputs 1 to 12; outputs 0 and 13 are fine)? Although I'm not an expert in transformers, PyTorch, or ONNX, so I might be missing something.
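My working assumption (unverified): those tuple outputs are the decoder's cached past_key_values, which BART returns alongside the hidden states. A minimal sketch, assuming that is the cause, of a workaround that disables the cache so every output is a plain tensor before export:

import torch
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartModel.from_pretrained("facebook/bart-large")
model.config.use_cache = False  # stop returning past_key_values tuples
model.eval()

with torch.no_grad():
    outputs = model(**tokenizer("hello world", return_tensors="pt"), return_dict=False)

# With the cache disabled, every element should be a plain tensor, so the
# shape-dict inference above no longer produces lists of dicts.
print([type(o) for o in outputs])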

On a slight tangent: is there a specific reason why the summarization pipeline is not among the supported pipeline types for this script?

bestpredicts commented 3 years ago

any update?

katerinafrid commented 3 years ago

any update?

VibhuJawa commented 3 years ago

any update?

LysandreJik commented 3 years ago

We're currently working on a rework of the ONNX implementation within Transformers, which is available here: https://github.com/huggingface/transformers/pull/11786

Instead of offering a script to enable conversions for all models (which was not kept up to date with recent model releases), we're opting for a case-by-case approach, while offering the tools to convert models manually in a straightforward and simple manner: creating OnnxConfig configuration objects that specify the input and output types of each model.
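For readers following along, a minimal sketch of what such a configuration could look like (the class name here is invented for illustration, and the exact classes and module paths may differ from what is finally merged in the PR):

from collections import OrderedDict
from typing import Mapping

from transformers.onnx import OnnxConfig


class MyModelOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # Each input and its dynamic axes are declared explicitly,
        # rather than inferred by a conversion script.
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
            ]
        )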

Please take a look at the PR and give us your feedback.

oborchers commented 3 years ago

@LysandreJik: Thank you very much! I think this is an excellent way to go. Having converted a dozen models myself, we internally went for something similar, albeit not nearly as streamlined / sophisticated.

from typing import Dict, List, Set

import attr
import torch
from transformers import PretrainedConfig, PreTrainedTokenizerFast


@attr.s(auto_attribs=True)
class TransformersONNXConfig(BaseConfig):  # BaseConfig is our internal base class
    """Provides the basic configuration for all models."""

    base_model: str
    trans_cfg: PretrainedConfig

    input_names: List[str]
    output_names: List[str]
    dynamic_axes: Dict
    model_args: Set[torch.Tensor]
    tokenizer: PreTrainedTokenizerFast
    extra_args: Dict

and

def create_and_export_onnx_model(self):
    """Creates a new model if the current model does not exist and exports it."""
    torch.onnx.export(
        self.create_torch_model(),
        self.cfg.model_args,
        f=self.onnx_posix_pth,
        input_names=self.cfg.input_names,
        output_names=self.cfg.output_names,
        dynamic_axes=self.cfg.dynamic_axes,
        do_constant_folding=True,
        use_external_data_format=False,
        enable_onnx_checker=True,
        opset_version=12,
    )

Where the most important part is self.create_torch_model, as we regularly modify the basic torch model with custom layers down the line. Is support for such a feature planned? If not, would you consider it? It would substantially ease the conversion of custom models, such as the sbert ones.
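For illustration, a hypothetical create_torch_model result along these lines (the wrapper class and pooling head are invented for this example, not our actual code):

import torch
from transformers import AutoModel


class PooledModel(torch.nn.Module):
    """Backbone transformer plus a custom mean-pooling head."""

    def __init__(self, base_model: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_model)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids, attention_mask=attention_mask)[0]
        # Custom layer appended after the transformer: masked mean pooling
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)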

Furthermore, would it make sense to make OnnxConfig a part of the PreTrainedModel config to enable support from the get-go?

And finally, I assume this leaves us with the export only, so that for seq2seq models we still need to re-write the .generate function? Or is it possible to add support for an ONNX model from your side (probably difficult, as it's already part of the pre-trained model, which would require loading the model twice)?
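To illustrate what I mean by re-writing .generate, a minimal greedy-decoding sketch against an exported graph (the function name is made up, and it assumes, hypothetically, that the graph takes input_ids, attention_mask, and decoder_input_ids and returns the LM logits first; no beam search or caching):

import numpy as np


def greedy_generate(sess, tokenizer, text, max_length=50):
    """Greedy decoding against an onnxruntime InferenceSession."""
    encoded = tokenizer(text, return_tensors="np")
    decoder_ids = np.array([[tokenizer.eos_token_id]], dtype=np.int64)
    for _ in range(max_length):
        logits = sess.run(
            None,
            {
                "input_ids": encoded["input_ids"],
                "attention_mask": encoded["attention_mask"],
                "decoder_input_ids": decoder_ids,
            },
        )[0]
        next_id = int(logits[0, -1].argmax())  # pick the most likely token
        decoder_ids = np.concatenate([decoder_ids, [[next_id]]], axis=1)
        if next_id == tokenizer.eos_token_id:
            break
    return tokenizer.decode(decoder_ids[0], skip_special_tokens=True)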

mfuntowicz commented 3 years ago

Thanks @oborchers for your comments and use-cases.

I will let @LysandreJik speak about a potential integration of the OnnxConfig within the PreTrainedModel config; my initial plan was to have 100% backward compatibility, which explains why I put it somewhere else (currently).

Regarding generate, this is something that might require some investigation, but I'm seeing good opportunities to have something within the ONNX graph with the recent knobs released by the Microsoft folks on the ONNXRuntime project (cc @tianleiwu for visibility on this).

Still, for this initial rework of the ONNX exporting capabilities we focused on "model only", with the ability to extend to full pipelines in the future. Generation is definitely one of the hardest tasks to get within the graph, but also one where I can see the biggest benefits.

oborchers commented 3 years ago

@mfuntowicz: Thank you for your feedback! Yes, I fully understand the point about compatibility. After all, it's not that difficult to get to the config if done once or twice.

Regarding the .generate function: thanks for the link, I will look into this more! Yes, absolutely!

AayushSameerShah commented 1 year ago

> Hi @mfuntowicz @stas00, is this a known issue with GPT2 as well? Please let me know if there is a workaround.
>
> I was considering converting gpt2 or gpt2-medium to ONNX using the notebook provided here.
>
> On executing the line of code below:
>
> convert(framework="pt", model="gpt2-medium", output=Path("onnx/gpt2-medium.onnx"), opset=11)
>
> I get this error:
>
>    1115             for i, x in enumerate(value):
>    1116                 if not isinstance(x, int):
> -> 1117                     raise ValueError("The type of axis index is expected to be an integer")
>    1118                 if x in value_dict:
>    1119                     warnings.warn('Duplicate dynamic axis index {} was provided for input {}.'
>
> ValueError: The type of axis index is expected to be an integer

Hello @mriganktiwari, any update on this? I am still facing the issue with GPT-2, using the same code as yours. Please advise, thanks!

mahita2104 commented 1 year ago

Can I work on this? Please assign it to me.

ArthurZucker commented 1 year ago

Hey! This is no longer something handled at the transformers level, so I will close this issue. Sorry for the inconvenience!

The way to handle it now is through optimum! See this documentation page for more information: ONNX exporter
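For example, a minimal sketch of the optimum route (assuming a recent optimum version with onnxruntime support installed; check the linked docs for the exact API):

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

# Export facebook/bart-large to ONNX on the fly and save the result
model = ORTModelForSeq2SeqLM.from_pretrained("facebook/bart-large", export=True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model.save_pretrained("bart_onnx/")
tokenizer.save_pretrained("bart_onnx/")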