huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Error while trying to run InferenceSession of onnxruntime. ValueError: Required inputs (['decoder_input_ids']) are missing from input feed (['input_ids', 'attention_mask']). #26718

Closed: Burakabdi closed this issue 11 months ago

Burakabdi commented 11 months ago

I am a beginner. I received the error ValueError: Required inputs (['decoder_input_ids']) are missing from input feed (['input_ids', 'attention_mask']) while trying to run inference.

Model insights:

I exported my fine-tuned PyTorch model to ONNX by following this guide, using the following command: python -m transformers.onnx --model=mt5-base-finetuned-info-extraction onnx/

After the export, I have the exported files (including model.onnx) in the onnx folder.

The fine-tuned PyTorch model works fine and generates the expected output. However, after exporting to ONNX, I receive the error mentioned above when I run inference. I run inference with the following code:

from transformers import AutoTokenizer
from onnxruntime import InferenceSession

tokenizer = AutoTokenizer.from_pretrained("onnx")
session = InferenceSession("onnx/model.onnx", providers=['AzureExecutionProvider', 'CPUExecutionProvider'])

text = "This is an example Arabic text"

inputs = tokenizer(text, return_tensors="np")
outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))

After receiving the error ValueError: Required inputs (['decoder_input_ids']) are missing from input feed (['input_ids', 'attention_mask']), I tried adding decoder_input_ids to the input feed with this code:

from transformers import AutoTokenizer
from onnxruntime import InferenceSession
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("onnx")
session = InferenceSession("onnx/model.onnx", providers=['AzureExecutionProvider', 'CPUExecutionProvider'])

text = "This is an example Arabic text"
inputs = tokenizer(text, return_tensors="np")
decoder_start_token = tokenizer.pad_token_id
decoder_input_ids = np.full((1, 1), decoder_start_token, dtype=np.int64)

inputs["input_ids"] = inputs["input_ids"].astype(np.int64)
inputs["attention_mask"] = inputs["attention_mask"].astype(np.int64)

input_feed = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "decoder_input_ids": decoder_input_ids
}
outputs = session.run(output_names=["last_hidden_state"], input_feed=input_feed)
logits = outputs[0]

predicted_token_id = np.argmax(logits)  # argmax over the flattened array: a single token id for the whole output
decoded_output = tokenizer.decode(predicted_token_id, skip_special_tokens=True)
print(decoded_output)

This way I did get an output from the ONNX model, but the output is not meaningful and not at all what I expected.

So my question is: is my problem related to the export or to running inference? How do I make the ONNX model generate proper outputs like the PyTorch model does?

Any help will be highly appreciated, thanks in advance.

LysandreJik commented 11 months ago

WDYT @fxmarty ?

fxmarty commented 11 months ago

Thank you for the details @Burakabdi! The up-to-date documentation about the ONNX export is here: https://huggingface.co/docs/transformers/v4.34.0/en/serialization

A working code snippet, matching Transformers generation, could be:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "tsmatz/mt5_summarize_japanese"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

text = "サッカーのワールドカップカタール大会、世界ランキング24位でグループEに属する日本は、23日の1次リーグ初戦において、世界11位で過去4回の優勝を誇るドイツと対戦しました。試合は前半、ドイツの一方的なペースではじまりましたが、後半、日本の森保監督は攻撃的な選手を積極的に動員して流れを変えました。結局、日本は前半に1点を奪われましたが、途中出場の堂安律選手と浅野拓磨選手が後半にゴールを決め、2対1で逆転勝ちしました。ゲームの流れをつかんだ森保采配が功を奏しました。"

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=15)
print(tokenizer.batch_decode(outputs))
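
A possible follow-up, assuming you want to avoid re-exporting on every run: the converted model can be saved with save_pretrained and reloaded later (the directory name below is just an example):

# Persist the ONNX export for reuse (directory name is only an example)
model.save_pretrained("mt5_onnx")
tokenizer.save_pretrained("mt5_onnx")

# Later, reload the already-exported model directly, without export=True
model = ORTModelForSeq2SeqLM.from_pretrained("mt5_onnx")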

You can find out more in the Optimum documentation: https://huggingface.co/docs/optimum/main/en/exporters/onnx/overview and https://huggingface.co/docs/optimum/main/en/onnxruntime/overview

If you'd like, you could also recode a generate method in pure numpy and/or with the ORT C++ API, for example along the lines of the sketch below.
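
As a rough illustration, a minimal greedy-decoding loop in pure numpy could look like this. It assumes the model was exported with a language-modeling head (e.g. with --feature=seq2seq-lm, so the session exposes a logits output; the default export used above only outputs last_hidden_state), and the output name, shapes and stopping condition are assumptions to verify against your own export:

from transformers import AutoTokenizer
from onnxruntime import InferenceSession
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("onnx")
session = InferenceSession("onnx/model.onnx", providers=["CPUExecutionProvider"])

inputs = tokenizer("This is an example text", return_tensors="np")
encoder_feed = {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
}

# mT5 starts decoding from the pad token
decoder_input_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int64)

for _ in range(20):  # max_new_tokens
    # One full forward pass per generated token (no past key/values cache here)
    logits = session.run(
        ["logits"],
        {**encoder_feed, "decoder_input_ids": decoder_input_ids},
    )[0]
    # Greedy pick: highest-probability token at the last decoder position
    next_token = logits[:, -1, :].argmax(axis=-1).astype(np.int64).reshape(1, 1)
    decoder_input_ids = np.concatenate([decoder_input_ids, next_token], axis=-1)
    if next_token.item() == tokenizer.eos_token_id:
        break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))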

Burakabdi commented 11 months ago

Thank you so much @fxmarty and @LysandreJik . Problem solved :)