huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

No ONNX support for BERT models when `token_type_ids` is not provided #2062

Open tomaarsen opened 1 week ago

tomaarsen commented 1 week ago

System Info

optimum==1.23.1
transformers==4.43.4
onnxruntime-gpu==1.19.2
sentence-transformers==3.2.0

Windows
Python 3.11.6

Who can help?

@michaelbenayoun

Reproduction (minimal, reproducible, runnable)

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoModel, AutoTokenizer

model_id = "intfloat/multilingual-e5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

provider = "CPUExecutionProvider"
# provider = "CUDAExecutionProvider"
onnx_model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True, provider=provider)

inputs = tokenizer("This is my test sentence", return_tensors="pt")
print(inputs.keys())
# => dict_keys(['input_ids', 'attention_mask'])
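#    note: this tokenizer produces no 'token_type_ids' for this model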

outputs = model(**inputs)
print(outputs[0].shape)
# => torch.Size([1, 7, 384])
onnx_outputs = onnx_model(**inputs)
print(onnx_outputs[0].shape)
# If CPUExecutionProvider => AttributeError: 'NoneType' object has no attribute 'numpy'
# If CUDAExecutionProvider => KeyError: 'token_type_ids'
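
As a workaround sketch (not part of the report above, and assuming the ORT model's forward accepts a token_type_ids keyword), filling token_type_ids with zeros manually sidesteps both errors:

import torch

# Hypothetical workaround: supply the token_type_ids input that the exported
# BERT graph expects but this tokenizer does not produce.
inputs = tokenizer("This is my test sentence", return_tensors="pt")
inputs["token_type_ids"] = torch.zeros_like(inputs["input_ids"])
onnx_outputs = onnx_model(**inputs)
print(onnx_outputs[0].shape)
# expected to match the transformers output above: torch.Size([1, 7, 384])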

Expected behavior

I would expect optimum to mirror the transformers behaviour, where token_type_ids is set to torch.zeros(input_ids.shape, ...) if it is not explicitly provided. See here for that implementation in transformers: https://github.com/huggingface/transformers/blob/4de1bdbf637fe6411c104c62ab385f660bfb1064/src/transformers/models/bert/modeling_bert.py#L1070-L1076

This is currently preventing the following from working:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small", backend="onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

See also https://github.com/UKPLab/sentence-transformers/issues/2983

echarlaix commented 6 days ago

Thanks for reporting @tomaarsen! This is something that we already do for OpenVINO models (https://github.com/huggingface/optimum-intel/blob/f7b5b547c167cb6a9f20fa77d493ee2dde3c3034/optimum/intel/openvino/modeling.py#L395) but never added for ONNX models; I will take care of adding it!
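
For reference, a minimal sketch of what that fix could look like on the ONNX side (hypothetical; modeled on the linked OpenVINO logic, with fill_token_type_ids as an invented helper name and input_names standing in for the exported graph's declared inputs):

import torch

def fill_token_type_ids(input_names, input_ids, token_type_ids=None):
    # Default token_type_ids to zeros when the exported graph declares that
    # input but the caller did not provide it, mirroring BertModel in
    # transformers.
    if "token_type_ids" in input_names and token_type_ids is None:
        token_type_ids = torch.zeros_like(input_ids)
    return token_type_ids

Calling something like this at the top of the model's forward would make both the CPU and CUDA execution providers behave like transformers when token_type_ids is omitted.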