huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

How to decrease inference time of LayoutXLM and LiLT models through Optimum? #1024

Open piegu opened 1 year ago

piegu commented 1 year ago

System Info

Latest versions of the transformers and Optimum libraries.

Who can help?

@JingyaHuang, @echarlaix, @mi

Information

Tasks

Reproduction

Example with LiLT model:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-paragraphlevel-ml512"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, device_map="auto")

from optimum.bettertransformer import BetterTransformer
model = BetterTransformer.transform(model, keep_original_model=False)
```

Error message

```
NotImplementedError: The model type lilt is not yet supported to be used with BetterTransformer. Feel free to open
an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported.
Currently supported models are: dict_keys(['albert', 'bart', 'bert', 'bert-generation', 'blenderbot', 'camembert',
'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo',
'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert',
'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn',
'wav2vec2', 'whisper', 'xlm-roberta', 'yolos']).
```

Expected behavior

Hi,

I'm using Hugging Face libraries to run LayoutXLM and LiLT models. How can I decrease their inference time through Optimum? Which code should I use?

I've already tested BetterTransformer (Optimum) and ONNX export, but neither of them accepts LayoutXLM or LiLT models.

Could you update the Optimum library so that BetterTransformer() and/or ONNX export works on LayoutXLM and LiLT models? Thank you.
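For reference, this is roughly the Optimum ONNX Runtime usage I had in mind. A minimal sketch: the model id is a placeholder for an already-supported architecture, the export flag is export=True or from_transformers=True depending on the Optimum version, and the export step currently fails for LiLT/LayoutXLM:

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer

# Placeholder id: any architecture that already has an ONNX export config works here,
# but LiLT / LayoutXLM currently fail at the export step.
model_id = "a-supported-token-classification-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
ort_model = ORTModelForTokenClassification.from_pretrained(model_id, export=True)

inputs = tokenizer("example text", return_tensors="pt")
outputs = ort_model(**inputs)  # logits from the ONNX Runtime session
```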

IlyasMoutawwakil commented 1 year ago

Regarding BetterTransformer support for LiLT: as you can see here (from the HF docs), the LiLT model does not use vanilla scaled dot-product attention (it uses two coupled attention mechanisms), so it cannot be supported by BetterTransformer encoders or attention layers (@fxmarty @younesbelkada).

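To make the coupling concrete, here is a rough sketch (simplified, not the actual transformers implementation) of how the text and layout attention streams exchange their scores before the softmax, which is why neither stream is a plain scaled dot-product attention:

```python
import math
import torch

def lilt_style_coupled_scores(text_q, text_k, layout_q, layout_k):
    """Illustrative only: coupled text/layout attention scores in the LiLT style."""
    text_scores = torch.matmul(text_q, text_k.transpose(-1, -2)) / math.sqrt(text_q.size(-1))
    layout_scores = torch.matmul(layout_q, layout_k.transpose(-1, -2)) / math.sqrt(layout_q.size(-1))
    # Each stream receives the other's scores before softmax (the layout stream uses
    # detached text scores), so a fused SDPA kernel cannot express this coupling.
    coupled_text = text_scores + layout_scores
    coupled_layout = layout_scores + text_scores.detach()
    return torch.softmax(coupled_text, dim=-1), torch.softmax(coupled_layout, dim=-1)
```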

piegu commented 1 year ago

Thanks @IlyasMoutawwakil. Indeed, I can't use BetterTransformer() for LiLT. What else can I use to decrease its inference time? Thank you.

IlyasMoutawwakil commented 1 year ago

As for LayoutLMv2 (the architecture behind LayoutXLM), it adds relative position and spatial position embeddings to the attention scores:

https://github.com/huggingface/transformers/blob/a3975f94f3a090a54ed4ec78ab736ce6aaee6742/src/transformers/models/layoutlmv2/modeling_layoutlmv2.py#L176-L179

This can't be done with a BetterTransformer encoder, because we have no access to the attention scores there, but it can be modeled in a BetterTransformer attention layer, as is done here:

https://github.com/huggingface/optimum/blob/01b4898bff2e4a69773546f518f292b19d9d46ef/optimum/bettertransformer/models/attention.py#L374-L411
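For illustration, a rough sketch (simplified, not the actual modeling code) of this additive-bias pattern, which shows why the raw attention scores need to be exposed:

```python
import torch

def layoutlmv2_style_scores(query, key, rel_pos_bias, rel_2d_pos_bias):
    """Illustrative only: attention scores with additive relative 1D/2D position biases.

    query, key: (batch, heads, seq_len, head_dim)
    rel_pos_bias, rel_2d_pos_bias: (batch, heads, seq_len, seq_len) precomputed biases
    """
    scores = torch.matmul(query, key.transpose(-1, -2)) / (query.size(-1) ** 0.5)
    # The relative position and spatial biases are added to the raw scores before
    # softmax, so a fused kernel must expose the scores (or accept an additive mask).
    scores = scores + rel_pos_bias + rel_2d_pos_bias
    return torch.softmax(scores, dim=-1)
```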

mariababich commented 1 year ago

Hi @piegu!

I ran into a similar issue trying to speed up the LiLT model. Meanwhile, I've opened pull requests to both optimum and transformers to add support for LiLT ONNX export; here is how you can convert LiLT to ONNX:

```python
from pathlib import Path

from transformers import AutoConfig, AutoModel, AutoTokenizer
from transformers.onnx import export

model_id = "your_lilt_model"  # path or hub id of your LiLT checkpoint
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# LiltOnnxConfig comes from the pull request mentioned above
onnx_config = LiltOnnxConfig(config, task="sequence-classification")
onnx_path = Path("model.onnx")
onnx_inputs, onnx_outputs = export(tokenizer, model, onnx_config,
                                   onnx_config.default_onnx_opset, onnx_path)
```
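Once exported, you can run the model with ONNX Runtime for inference. A minimal sketch, assuming the exported graph takes input_ids, attention_mask and bbox (the actual input names depend on the export config) and reusing the tokenizer from the snippet above:

```python
import numpy as np
import onnxruntime as ort

# CPU session; swap in "CUDAExecutionProvider" when running on GPU.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

inputs = dict(tokenizer("example text", return_tensors="np"))
# LiLT also expects bounding boxes; dummy zeros here, real layout boxes in practice.
inputs["bbox"] = np.zeros((1, inputs["input_ids"].shape[1], 4), dtype=np.int64)

input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in inputs.items() if k in input_names})
```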