cc @lewtun @michaelbenayoun
Hi @farzanehnakhaee70 and thank you for raising this issue!
FYI we recently merged a major overhaul of the ONNX export for BART in #14700 which we've tested for various topologies / tasks, e.g. this works:
```bash
# Install from source with extra ONNX dependencies
pip install 'git+https://github.com/huggingface/transformers#egg=transformers[onnx]'

# Export model with default features (i.e. just `BartModel`)
python -m transformers.onnx --model=valhalla/distilbart-mnli-12-1 onnx/
```
Does installing from `master` solve your problem? If not, can you please provide the exact command you are using to export the model?
Thanks a lot for looking into this! I can see the major changes to the configuration, which greatly improve usability for other tasks. However, this error will not go away without changing the code, as I mentioned.
The main issue is that, although the conversion script declares `dynamic_axes`, the broadcasting inside this function causes its output to be fixed to the batch size of the dummy input. As a result, running the converted model with a batch size different from the dummy input's batch size raises this error.
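For reference, the function I am referring to looks roughly like this in `modeling_bart.py` (paraphrased; exact details may vary between versions):

```python
import torch

def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int, decoder_start_token_id: int):
    """Shift input ids one token to the right to build decoder inputs."""
    shifted_input_ids = input_ids.new_zeros(input_ids.shape)
    shifted_input_ids[:, 1:] = input_ids[:, :-1].clone()
    # This scalar assignment broadcasts over the batch dimension; it appears to be
    # the broadcasting step that gets traced with a fixed batch size during export.
    shifted_input_ids[:, 0] = decoder_start_token_id
    # Replace possible -100 values (used for ignored labels) with the pad token
    shifted_input_ids.masked_fill_(shifted_input_ids == -100, pad_token_id)
    return shifted_input_ids
```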
Thank you for the extra context about the batch size :)
However, I am not able to reproduce the problem you reported. For example, suppose we export the model using the command I used in my previous comment:
```bash
# Export model with default features (i.e. just `BartModel`)
python -m transformers.onnx --model=valhalla/distilbart-mnli-12-1 onnx/
```
We can then load this model into an ONNX Runtime `InferenceSession` as follows:
```python
import onnxruntime as ort
from transformers import AutoTokenizer

model_ckpt = "valhalla/distilbart-mnli-12-1"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

bs = 16  # batch size
ort_session = ort.InferenceSession("onnx/model.onnx")
onnx_named_outputs = ["last_hidden_state"]

inputs = tokenizer(["Hello, my name is Lewis"] * bs, return_tensors="np")
decoder_inputs = tokenizer(["Hello"] * bs, return_tensors="np")
all_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "decoder_input_ids": decoder_inputs["input_ids"],
    "decoder_attention_mask": decoder_inputs["attention_mask"],
}
onnx_outputs = ort_session.run(onnx_named_outputs, all_inputs)
```
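As a quick sanity check, the leading dimension of the returned array matches the batch size we passed in rather than the batch size of the dummy inputs used at export time:

```python
# onnx_outputs[0] is `last_hidden_state`; its first axis should equal bs (16 here)
print(onnx_outputs[0].shape)
```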
This runs without error using the source install of `transformers`. For comparison, we can find the batch size used for the dummy inputs during the conversion as follows:
```python
from transformers import TensorType
from transformers.models.bart import BartConfig, BartOnnxConfig

config = BartConfig.from_pretrained(model_ckpt)
onnx_config = BartOnnxConfig(config)
dummy_inputs = onnx_config.generate_dummy_inputs(tokenizer, framework=TensorType.NUMPY)

# Returns (batch_size, seq_len) = (2, 8)
dummy_inputs["input_ids"].shape
```
So you can see that the dummy inputs have a batch size of 2, while the inference example I created uses a batch size of 16.
Could you please share a minimal reproducible example with the problem you're facing (e.g. a Colab notebook)?
Thanks a lot for looking into this so thoroughly.
I converted a model for the sequence-classification task, and it does not have `decoder_input_ids` or `decoder_attention_mask` as inputs. The only inputs are `input_ids` and `attention_mask`, as shown by Netron.
If those inputs were available to the model, there would be no problem, because the `shift_tokens_right` function would no longer be used.
Could you please tell me how to convert my model so that these two inputs are also defined (the same as what you have done)?
Ah, now I am able to reproduce the problem - the missing step was to explicitly specify the `sequence-classification` feature.
For example, the following fails:
```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Export the model with the `sequence-classification` topology
model_ckpt = "valhalla/distilbart-mnli-12-1"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
onnx_path = "onnx/bart-large-clf/"
!python -m transformers.onnx --model={model_ckpt} --feature="sequence-classification" {onnx_path}

# Run with ONNX Runtime
ort_session = ort.InferenceSession(f"{onnx_path}model.onnx")
# Note we have `logits` for sequence classification heads
onnx_named_outputs = ["logits"]

# This works because the dummy inputs have batch_size=2
inputs = tokenizer(["I loved this movie!"] * 2, return_tensors="np")
onnx_outputs = ort_session.run(onnx_named_outputs, dict(inputs))

# This fails - stack trace below
inputs = tokenizer(["I loved this movie!"] * 3, return_tensors="np")
onnx_outputs = ort_session.run(onnx_named_outputs, dict(inputs))
```
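If it helps with debugging on your side, you can also inspect the graph inputs of the exported file to check whether the batch axis was exported as dynamic (a small sketch using the `onnx` package, pointing at the export path from above):

```python
import onnx

onnx_model = onnx.load("onnx/bart-large-clf/model.onnx")
for graph_input in onnx_model.graph.input:
    # dim_param holds a symbolic name for dynamic axes; dim_value holds a fixed size
    dims = [d.dim_param or d.dim_value for d in graph_input.type.tensor_type.shape.dim]
    print(graph_input.name, dims)
```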
And great detective work in figuring out that `shift_tokens_right()` was the source of the problem! I think your proposal makes sense, and I was able to verify that including your change fixes the problem with the export.
What do you think @michaelbenayoun? If there are no negative consequences to changing `shift_tokens_right()`, my suggestion is to ask @farzanehnakhaee70 to open a PR to fix the issue.
Great. If there is anything I can help with from my side, I would be happy to do it.
Hi @farzanehnakhaee70, @lewtun, Great catch @farzanehnakhaee70 !! I would say that if you have a working solution you can definitely open a PR!
Hi @farzanehnakhaee70, before we open a PR, can you please share your environment details by running the command `transformers-cli env` and copy-and-pasting its output here? I'd like to know which version of `transformers` this affects, the type of OS, etc.
Hi @lewtun Sorry for the delay. Here it is:
- `transformers` version: 4.15.0
- Platform: Linux-4.15.0-154-generic-x86_64-with-glibc2.29
- Python version: 3.8.7
- PyTorch version (GPU?): 1.10.1+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Thanks for sharing your environment @farzanehnakhaee70!
I did a fresh install with `pip install transformers[onnxruntime]==4.15` and found that I am no longer able to reproduce the error (here's a Colab notebook if you want to verify). This suggests that the error I saw (and possibly in your case too) was a symptom of a problematic environment.
Would you mind doing a fresh install or providing a Colab notebook that reproduces the error? I'd like to be certain that the error is reproducible before we make any changes to the `transformers` codebase. Thank you!
Sure.
Hi, really sorry for the late response. Today I went to test the model again, but during the test this error occurred:
```
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 22, in <module>
    from .features import FeaturesManager
  File "/usr/lib/python3.8/site-packages/transformers/onnx/features.py", line 71, in <module>
    class FeaturesManager:
  File "/usr/lib/python3.8/site-packages/transformers/onnx/features.py", line 273, in FeaturesManager
    def get_model_from_feature(feature: str, model: str) -> PreTrainedModel:
NameError: name 'PreTrainedModel' is not defined
```
Do you also face this issue?
Hi @farzanehnakhaee70 I am unfortunately not able to reproduce your error - by the looks of it, it could be a problem with your environment. Did you run a fresh install in a clean virtual env with the command I shared above?
Thanks for your reply @lewtun. I installed it inside a fresh container with the command you provided. I will test it once more and let you know the outcome.
Hi @lewtun, I tested it once more with a fresh install. As you said, there isn't any problem. Thanks a lot for your help.
Thanks for double-checking @farzanehnakhaee70 ! Does this mean we can close this issue?
Hi @lewtun Thanks a lot. For sure.
After converting `distilbart-mnli-12-1` to ONNX, I get an error while testing the ONNX model. After a lot of investigation, I found that the problem lies in the `shift_tokens_right` function in `modeling_bart.py`. After editing that function, the problem was completely solved. The issue is that the ONNX converter does not handle the broadcasting in this function correctly.
Is it possible to edit the repository and merge these changes into yours?