swoook / KoBART

Korean BART

Request a feature to export KoBART for sequence classification to ONNX Runtime (ORT) #1

Closed swoook closed 2 years ago

swoook commented 2 years ago

🚀 Feature request

I'd like to export KoBART for sequence classification to ONNX, but there is a dependency conflict:

  1. SKT-AI/KoBART requires transformers==4.3.3
  2. transformers>=4.9.0 supports exporting BART to ONNX

Motivation

I considered the inference engines below:

  1. TensorRT (TRT)
  2. ONNX Runtime (ORT)
  3. OpenVINO

Comparing TRT and ORT:

| Highlights | Which one is better or worse? | Notes |
| --- | --- | --- |
| performance | TRT ≥ ORT | ORT is sometimes on par with TRT |
| hardware | TRT ≤ ORT | TRT only supports NVIDIA GPUs; ORT supports NVIDIA GPUs and Intel CPUs; I cannot find any document about AMD for ORT 🙈 |
| compatibility | TRT << ORT | TRT performs device-specific optimizations [1, 2]; for example, an execution engine built for an NVIDIA A100 GPU will not work on an NVIDIA T4 GPU 🙃 |
| difficulty | TRT ≥ ORT | |
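As a rough illustration of the hardware row above, ORT selects execution providers per session; a minimal sketch (the model path is a placeholder):

import onnxruntime as ort

# Prefer the NVIDIA GPU provider and fall back to CPU when CUDA is unavailable.
session = ort.InferenceSession(
    'model.onnx',  # placeholder path
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)
print(session.get_providers())  # the providers actually registered for this session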

Your contribution

swoook commented 2 years ago
$ python -m transformers.convert_graph_to_onnx --help
usage: ONNX Converter [-h]
                      [--pipeline {feature-extraction,ner,sentiment-analysis,fill-mask,question-answering,text-generation,translation_en_to_fr,translation_en_to_de,translation_en_to_ro}]
                      --model MODEL [--tokenizer TOKENIZER] [--framework {pt,tf}] [--opset OPSET] [--check-loading] [--use-external-format]
                      [--quantize]
                      output

positional arguments:
  output

optional arguments:
  -h, --help            show this help message and exit
  --pipeline {feature-extraction,ner,sentiment-analysis,fill-mask,question-answering,text-generation,translation_en_to_fr,translation_en_to_de,translation_en_to_ro}
  --model MODEL         Model's id or path (ex: bert-base-cased)
  --tokenizer TOKENIZER
                        Tokenizer's id or path (ex: bert-base-cased)
  --framework {pt,tf}   Framework for loading the model
  --opset OPSET         ONNX opset to use
  --check-loading       Check ONNX is able to load the model
  --use-external-format
                        Allow exporting model >= than 2Gb
  --quantize            Quantize the neural network to be run with int8
  1. --model:

    • Hugging Face saves a model into two files:
    1. config.json, which stores the configuration of the model
    2. pytorch_model.bin, which is the PyTorch checkpoint
    • We can pass the directory in which they exist
    • It also accepts a model's id
    • For example, skt/kobert-base-v1 is a model's id
  2. --framework (see the example below)

    • --framework pt for PyTorch
    • --framework tf for TensorFlow
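For instance, a minimal invocation might look like this (a sketch; bert-base-cased is the id from the help text above, and the output path is a placeholder):

$ python -m transformers.convert_graph_to_onnx --framework pt \
--model bert-base-cased \
onnx/bert-base-cased.onnx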
The snippet below loads the fine-tuned PyTorch Lightning checkpoint and saves the wrapped Hugging Face model so the directory can be passed via --model:

from dmp_kobart import KoBARTClassification

# Placeholder paths: the Lightning checkpoint, its hparams YAML,
# and the output directory for the Hugging Face-format model.
paths = dict()
paths['ckpt'] = $CKPT_PATH
paths['yaml'] = $YAML_PATH
paths['huggingface'] = $OUTPUT_DIR

pytorch_lightning_model_wrapper = KoBARTClassification.load_from_checkpoint(
    checkpoint_path=paths['ckpt'],
    hparams_file=paths['yaml'],
    map_location=None,
)

# .model is the underlying Hugging Face model inside the Lightning module.
pytorch_lightning_model_wrapper.model.save_pretrained(paths['huggingface'])
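To sanity-check the directory written above, we can reload it (a minimal sketch; BartForSequenceClassification is my assumption for the class wrapped by KoBARTClassification):

from transformers import BartForSequenceClassification

# paths['huggingface'] is the directory written by save_pretrained() above.
model = BartForSequenceClassification.from_pretrained(paths['huggingface'])
model.eval()
print(model.config.num_labels)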
swoook commented 2 years ago
$ python -m transformers.convert_graph_to_onnx --framework pt \
--model $MODEL_DIR \
$ONNX_PATH
====== Converting model to ONNX ======
ONNX opset version set to: 11
Loading pipeline (model: $MODEL_DIR, tokenizer: $MODEL_DIR)
Error while converting the model: Can't load tokenizer for '$MODEL_DIR'. Make sure that:

- '$MODEL_DIR' is a correct model identifier listed on 'https://huggingface.co/models'

- or '$MODEL_DIR' is the correct path to a directory containing relevant tokenizer files
...
tokenizer = {
    'url':
    'https://kobert.blob.core.windows.net/models/kobart/kobart_base_tokenizer_cased_cf74400bce.zip',
    'fname': 'kobart_base_tokenizer_cased_cf74400bce.zip',
    'chksum': 'cf74400bce'
}
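This dict is an excerpt from the kobart package, which downloads and caches that archive; per the SKT-AI/KoBART README, the tokenizer is exposed roughly like this (a sketch, not verified here against transformers==4.3.3):

from kobart import get_kobart_tokenizer

# Downloads kobart_base_tokenizer_cased_cf74400bce.zip on first use and caches it.
kobart_tokenizer = get_kobart_tokenizer()
print(kobart_tokenizer.tokenize('안녕하세요.'))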
...
$ python -m transformers.convert_graph_to_onnx --framework pt \
--model $MODEL_DIR \
--tokenizer $TOKENIZER_DIR \
$ONNX_PATH

====== Converting model to ONNX ======
ONNX opset version set to: 11
Loading pipeline (model: $MODEL_DIR, tokenizer: $TOKENIZER_DIR)
Error while converting the model: Can't load tokenizer for '$TOKENIZER_DIR'. Make sure that:

- '$TOKENIZER_DIR' is a correct model identifier listed on 'https://huggingface.co/models'

- or '$TOKENIZER_DIR' is the correct path to a directory containing relevant tokenizer files
  1. added_tokens.json [example]
  2. special_tokens_map.json [example]
  3. tokenizer_config.json [example]
  4. tokenizer.json [example]
...
additional_files_names = {
                    "added_tokens_file": ADDED_TOKENS_FILE,
                    "special_tokens_map_file": SPECIAL_TOKENS_MAP_FILE,
                    "tokenizer_config_file": TOKENIZER_CONFIG_FILE,
                    "tokenizer_file": FULL_TOKENIZER_FILE,
                }
...
  1. model.json

    • It has the keys which also exist in the example of tokenizer.json
    • I.e., it seems model.json is the tokenizer.json
  2. emji_tokenizer-vocab.json

    • It looks like a vocab.json
    • However, model.json also includes the vocab
  The files below are still missing (see the sketch after this list):
  1. added_tokens.json [example]
  2. tokenizer_config.json [example]
  3. special_tokens_map.json [example]
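Since model.json looks like a tokenizer.json, one plausible workaround (a sketch, untested; the input path is assumed from the archive contents) is to load it with PreTrainedTokenizerFast and let save_pretrained() generate the missing files:

from transformers import PreTrainedTokenizerFast

# model.json appears to play the role of tokenizer.json (see above).
tokenizer = PreTrainedTokenizerFast(tokenizer_file='emji_tokenizer/model.json')
# save_pretrained() writes tokenizer.json, tokenizer_config.json,
# special_tokens_map.json, etc. into the target directory.
tokenizer.save_pretrained('TOKENIZER_DIR')  # placeholder output directory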
swoook commented 2 years ago
====== Converting model to ONNX ======
ONNX opset version set to: 11
Loading pipeline (model: $MODEL_DIR, tokenizer: $TOKENIZER_DIR)
Using framework PyTorch: 1.7.1
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_7 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
decoder_input_ids is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']
/data/swook/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/utils.py:1111: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input output_1
  warnings.warn('No names were found for specified dynamic axes of provided input.'
Error while converting the model: The type of axis index is expected to be an integer
  Related issues:
  1. #9803 in huggingface/transformers
  2. #11786 in huggingface/transformers

  Recall the conflict:
  1. SKT-AI/KoBART requires transformers==4.3.3
  2. transformers should be >=4.9.0 to export BART to ONNX
swoook commented 2 years ago
  First, I tried exporting these BART models:
  1. facebook/bart-base
  2. ynie/bart-large-snli_mnli_fever_anli_R1_R2_R3-nli
swoook commented 2 years ago
$ python -m transformers.onnx \
--model $MODEL_DIR \
$ONNX_PATH
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1359, in from_pretrain
ed
    state_dict = torch.load(resolved_archive_file, map_location="cpu")
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 51, in main
    model = FeaturesManager.get_model_from_feature(args.feature, args.model)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/features.py", line 125, in get_model_from_
feature
    return FeaturesManager._TASKS_TO_AUTOMODELS[task].from_pretrained(model)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 419, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1364, in from_pretrained
    raise OSError(
OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run `git lfs install` followed by `git lfs pull` in the folder you cloned.
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 62, in main
    onnx_inputs, onnx_outputs = export(tokenizer, model, onnx_config, args.opset, args.output)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/convert.py", line 90, in export
    raise AssertionError(f"Unsupported PyTorch version, minimum required is 1.8.0, got: {torch_version}")
AssertionError: Unsupported PyTorch version, minimum required is 1.8.0, got: 1.7.1
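The first traceback means the model directory was cloned without git-lfs (the error itself suggests git lfs install and git lfs pull in the cloned folder); the second only needs a newer PyTorch in this environment:

$ pip install --upgrade "torch>=1.8.0"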
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:90: UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
  warnings.warn("'enable_onnx_checker' is deprecated and ignored. It will be removed in "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:103: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
  warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:221: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:252: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:879: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 64, in main
    validate_model_outputs(onnx_config, tokenizer, model, args.output, onnx_outputs, args.atol)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/convert.py", line 142, in validate_model_outputs
    from onnxruntime import InferenceSession, SessionOptions
ModuleNotFoundError: No module named 'onnxruntime'
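Validation imports onnxruntime, so installing it resolves this; the later logs show CUDAExecutionProvider, which comes with the GPU build:

$ pip install onnxruntime-gpu  # or plain onnxruntime for the CPU-only build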
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:90: UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
  warnings.warn("'enable_onnx_checker' is deprecated and ignored. It will be removed in "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:103: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
  warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:221: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:252: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:879: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
Validating ONNX model...
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:350: UserWarning: Deprecation warning. This ORT build has ['CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. The next release (ORT 1.10) will require explicitly setting the providers parameter (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.For example, onnxruntime.InferenceSession(..., providers=["CUDAExecutionProvider"], ...)
  warnings.warn("Deprecation warning. This ORT build has {} enabled. ".format(available_providers) +
        -[✓] ONNX model outputs' name match reference model ({'last_hidden_state', 'encoder_last_hidden_state'}
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[✓] all values close (atol: 0.0001)
        - Validating ONNX Model output "encoder_last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[✓] all values close (atol: 0.0001)
All good, model saved at: /data/swook/models/huggingface/facebook/bart-base/onnx/model.onnx
  1. 1st warning

    UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
    • It seems we can safely disregard this warning
  2. 2nd warning

    UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
    • The flag only matters for models larger than 2GB
    • BART is smaller than 2GB
    • It seems we can safely disregard this warning
  3. 3rd warning

    /data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
    • A maintainer of huggingface/transformers says we can disregard these warnings [details]
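Since the export itself succeeded, a quick smoke test of the saved file might look like this (a sketch; the model path matches the log above, and output order is taken from session metadata rather than assumed):

from onnxruntime import InferenceSession
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('facebook/bart-base')
session = InferenceSession(
    '/data/swook/models/huggingface/facebook/bart-base/onnx/model.onnx',
    providers=['CPUExecutionProvider'],
)
# The exported graph takes input_ids and attention_mask (see the conversion log).
encodings = tokenizer('Hello, world!', return_tensors='np')
outputs = session.run(None, dict(encodings))
for meta, value in zip(session.get_outputs(), outputs):
    print(meta.name, value.shape)  # e.g. last_hidden_state (1, 6, 768)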
swoook commented 2 years ago
  • However, I found the dependency conflict below:
  1. SKT-AI/KoBART requires transformers==4.3.3
  2. transformers>=4.9.0 supports exporting BART to ONNX

  So I use two separate environments, one per step:
  1. transformers==4.3.3
  2. transformers>=4.12.5
swoook commented 2 years ago
$ python -m transformers.onnx \
> --model=$PYTORCH_MODEL_DIR \
> --feature sequence-classification \
> $ONNX_MODEL_DIR
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 52, in main
    model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=args.feature)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/features.py", line 153, in check_supported_model_or_raise
    raise ValueError(
ValueError: bart doesn't support feature sequence-classification. Supported values are: ['default']
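Since this transformers version only supports the default feature for bart, one workaround is exporting BartForSequenceClassification with torch.onnx.export directly. A minimal sketch ($PYTORCH_MODEL_DIR is the placeholder from the command above, and the axis names mirror the earlier logs):

import torch
from transformers import BartForSequenceClassification, PreTrainedTokenizerFast

model = BartForSequenceClassification.from_pretrained('$PYTORCH_MODEL_DIR')
tokenizer = PreTrainedTokenizerFast.from_pretrained('$PYTORCH_MODEL_DIR')
model.config.use_cache = False  # mirrors "Overriding 1 configuration item(s)" above
model.eval()

sample = tokenizer('배송이 정말 빨라요.', return_tensors='pt')
torch.onnx.export(
    model,
    (sample['input_ids'], sample['attention_mask']),
    'kobart_sequence_classification.onnx',
    input_names=['input_ids', 'attention_mask'],
    output_names=['logits'],
    dynamic_axes={
        'input_ids': {0: 'batch', 1: 'sequence'},
        'attention_mask': {0: 'batch', 1: 'sequence'},
        'logits': {0: 'batch'},
    },
    opset_version=11,
)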
swoook commented 2 years ago
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/draft/kobart/export2onnx.py", line 74, in <module>
    main()
  File "/data/swook/draft/kobart/export2onnx.py", line 65, in main
    onnx_inputs, onnx_outputs = export(tokenizer, model, onnx_config, args.opset, args.output)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/convert.py", line 111, in export
    raise ValueError("Model and config inputs doesn't match")
ValueError: Model and config inputs doesn't match
  The model's forward() expects:
  1. input_ids
  2. attention_mask

  But the tokenizer's encodings contain (see the workaround sketched below):
  1. input_ids
  2. attention_mask
  3. token_type_ids
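A simple workaround (a sketch; model and tokenizer are the objects loaded in the export script) is to drop the extra key before feeding the encodings to the model:

encodings = tokenizer('배송이 정말 빨라요.', return_tensors='pt')
# BART's forward() has no token_type_ids parameter; remove it so the
# model inputs and the ONNX config inputs match.
encodings.pop('token_type_ids', None)
outputs = model(**encodings)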
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. 
The class this function is called from is 'BartTokenizer'.
{...
"tokenizer_class": "PreTrainedTokenizerFast"
...}
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. 
The class this function is called from is 'BartTokenizer'.
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:90: UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
  warnings.warn("'enable_onnx_checker' is deprecated and ignored. It will be removed in "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:103: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
  warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:221: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:252: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:879: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
Validating ONNX model...
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:350: UserWarning: Deprecation warning. This ORT build has ['CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. The next release (ORT 1.10) will require explicitly setting the providers parameter (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.For example, onnxruntime.InferenceSession(..., providers=["CUDAExecutionProvider"], ...)
  warnings.warn("Deprecation warning. This ORT build has {} enabled. ".format(available_providers) +
        -[✓] ONNX model outputs' name match reference model ({'encoder_last_hidden_state', 'last_hidden_state'}
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[x] values not close enough (atol: 0.0001)
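If the mismatch is small, validation can be retried with a looser tolerance: the script forwards an atol argument to validate_model_outputs (see the earlier traceback). A hedged sketch, where the variable names are assumed from that script and whether a looser tolerance is acceptable depends on the task:

from transformers.onnx import validate_model_outputs

# onnx_config, tokenizer, model, onnx_path, and onnx_outputs are assumed to be
# the objects from the export script; 1e-3 is a hypothetical tolerance, looser
# than the default 1e-4 shown in the log.
validate_model_outputs(onnx_config, tokenizer, model, onnx_path, onnx_outputs, atol=1e-3)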