huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

How to decrease inference time of LayoutXLM and LiLT models through Optimum? #1024

Open piegu opened 1 year ago

piegu commented 1 year ago

System Info

Latest versions of the transformers and Optimum libraries.

Who can help?

@JingyaHuang, @echarlaix, @mi

Information

Tasks

Reproduction

Example with LiLT model:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-paragraphlevel-ml512"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, device_map="auto")

from optimum.bettertransformer import BetterTransformer
model = BetterTransformer.transform(model, keep_original_model=False)
```

Error message

```
NotImplementedError: The model type lilt is not yet supported to be used with BetterTransformer. Feel free to open
an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported.
Currently supported models are: dict_keys(['albert', 'bart', 'bert', 'bert-generation', 'blenderbot', 'camembert',
'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo',
'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert',
'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn',
'wav2vec2', 'whisper', 'xlm-roberta', 'yolos']).
```

Expected behavior

Hi,

I'm using Hugging Face libraries to run LayoutXLM and LiLT models. How can I decrease their inference time through Optimum? Which code should I use?

I've already tested BetterTransformer (Optimum) and ONNX export, but neither of them accepts LayoutXLM or LiLT models.

Could you update the Optimum library so that BetterTransformer() and/or ONNX export works on LayoutXLM and LiLT models? Thank you.
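For reference, this is roughly the Optimum ONNX Runtime usage I had in mind. A minimal sketch: the model id is a placeholder for an already-supported architecture, the export flag is export=True or from_transformers=True depending on the Optimum version, and the export step currently fails for LiLT/LayoutXLM:

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer

# Placeholder id: any architecture that already has an ONNX export config works here,
# but LiLT / LayoutXLM currently fail at the export step.
model_id = "a-supported-token-classification-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
ort_model = ORTModelForTokenClassification.from_pretrained(model_id, export=True)

inputs = tokenizer("example text", return_tensors="pt")
outputs = ort_model(**inputs)  # logits from the ONNX Runtime session
```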

IlyasMoutawwakil commented 1 year ago

Regarding BetterTransformer support for LiLT: as you can see here (from the HF docs), the LiLT model does not use vanilla scaled dot-product attention (it uses two coupled attention mechanisms), so it cannot be supported by BetterTransformer encoders or attention layers (@fxmarty @younesbelkada).

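To make the coupling concrete, here is a rough sketch (simplified, not the actual transformers implementation) of how the text and layout attention streams exchange their scores before the softmax, which is why neither stream is a plain scaled dot-product attention:

```python
import math
import torch

def lilt_style_coupled_scores(text_q, text_k, layout_q, layout_k):
    """Illustrative only: coupled text/layout attention scores in the LiLT style."""
    text_scores = torch.matmul(text_q, text_k.transpose(-1, -2)) / math.sqrt(text_q.size(-1))
    layout_scores = torch.matmul(layout_q, layout_k.transpose(-1, -2)) / math.sqrt(layout_q.size(-1))
    # Each stream receives the other's scores before softmax (the layout stream uses
    # detached text scores), so a fused SDPA kernel cannot express this coupling.
    coupled_text = text_scores + layout_scores
    coupled_layout = layout_scores + text_scores.detach()
    return torch.softmax(coupled_text, dim=-1), torch.softmax(coupled_layout, dim=-1)
```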

piegu commented 1 year ago

Thanks @IlyasMoutawwakil. Indeed, I can't use BetterTransformer() for LiLT. What else can I use to decrease its inference time? Thank you.

IlyasMoutawwakil commented 1 year ago

As for LayoutLMv2 (the architecture behind LayoutXLM), it adds relative position and spatial position embeddings to the attention scores:

https://github.com/huggingface/transformers/blob/a3975f94f3a090a54ed4ec78ab736ce6aaee6742/src/transformers/models/layoutlmv2/modeling_layoutlmv2.py#L176-L179

This can't be done with a BetterTransformer encoder, because we have no access to the attention scores there, but it can be modeled in a BetterTransformer attention layer, as is done here:

https://github.com/huggingface/optimum/blob/01b4898bff2e4a69773546f518f292b19d9d46ef/optimum/bettertransformer/models/attention.py#L374-L411
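For illustration, a rough sketch (simplified, not the actual modeling code) of this additive-bias pattern, which shows why the raw attention scores need to be exposed:

```python
import torch

def layoutlmv2_style_scores(query, key, rel_pos_bias, rel_2d_pos_bias):
    """Illustrative only: attention scores with additive relative 1D/2D position biases.

    query, key: (batch, heads, seq_len, head_dim)
    rel_pos_bias, rel_2d_pos_bias: (batch, heads, seq_len, seq_len) precomputed biases
    """
    scores = torch.matmul(query, key.transpose(-1, -2)) / (query.size(-1) ** 0.5)
    # The relative position and spatial biases are added to the raw scores before
    # softmax, so a fused kernel must expose the scores (or accept an additive mask).
    scores = scores + rel_pos_bias + rel_2d_pos_bias
    return torch.softmax(scores, dim=-1)
```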

mariababich commented 1 year ago

Hi @piegu!

I ran into a similar issue trying to speed up the LiLT model. Meanwhile, I've opened pull requests to both optimum and transformers to add support for LiLT ONNX export; here is how you can convert LiLT to ONNX:

```python
from pathlib import Path

from transformers import AutoConfig, AutoModel, AutoTokenizer
from transformers.onnx import export

model_id = "your_lilt_model"  # path or hub id of your LiLT checkpoint
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# LiltOnnxConfig comes from the pull request mentioned above
onnx_config = LiltOnnxConfig(config, task="sequence-classification")
onnx_path = Path("model.onnx")
onnx_inputs, onnx_outputs = export(tokenizer, model, onnx_config,
                                   onnx_config.default_onnx_opset, onnx_path)
```
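Once exported, you can run the model with ONNX Runtime for inference. A minimal sketch, assuming the exported graph takes input_ids, attention_mask and bbox (the actual input names depend on the export config) and reusing the tokenizer from the snippet above:

```python
import numpy as np
import onnxruntime as ort

# CPU session; swap in "CUDAExecutionProvider" when running on GPU.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

inputs = dict(tokenizer("example text", return_tensors="np"))
# LiLT also expects bounding boxes; dummy zeros here, real layout boxes in practice.
inputs["bbox"] = np.zeros((1, inputs["input_ids"].shape[1], 4), dtype=np.int64)

input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in inputs.items() if k in input_names})
```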