Open · piegu opened this issue 1 year ago
For LiLT BetterTransformer support: you can see here (from the HF docs) that the LiLT model doesn't use vanilla scaled dot-product attention (it uses two coupled attention mechanisms), which can't be supported by BetterTransformer encoders or attentions (@fxmarty @younesbelkada).
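To make the incompatibility concrete, here is a minimal sketch of LiLT-style coupled attention, under heavy simplifying assumptions (single head, no masking, and ignoring the detach LiLT applies to the shared scores during pretraining). The key point is that the text and layout streams exchange raw scores before softmax, a step a fused scaled dot-product kernel never exposes:

```python
import math
import torch

# Minimal sketch of LiLT-style coupled attention (a simplification, not the
# actual implementation): two streams, text (t) and layout (l), each compute
# raw scores, then add the other's scores *before* softmax.
def lilt_coupled_attention(q_t, k_t, v_t, q_l, k_l, v_l):
    scores_t = q_t @ k_t.transpose(-1, -2) / math.sqrt(q_t.size(-1))
    scores_l = q_l @ k_l.transpose(-1, -2) / math.sqrt(q_l.size(-1))
    # The cross-stream coupling happens pre-softmax, which is exactly the
    # point a fused scaled dot-product kernel gives no access to.
    probs_t = torch.softmax(scores_t + scores_l, dim=-1)
    probs_l = torch.softmax(scores_l + scores_t, dim=-1)
    return probs_t @ v_t, probs_l @ v_l
```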
Thanks @IlyasMoutawwakil. Indeed, I can't use BetterTransformer() for LiLT.
What else can I use to decrease its inference time?
Thank you.
For LayoutLMv2, it adds relative position and spatial embeddings to the attention scores: this can't be done with a BetterTransformer encoder, because we have no access to the attention scores there, but it can be modeled in a BetterTransformer attention, as is done here:
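As an illustration of the mechanism described above (a simplified sketch, not LayoutLMv2's actual code: shapes reduced, no masking or dropout), the biases are added to the raw scores before softmax, which is the step a fused encoder fastpath hides:

```python
import math
import torch

# Sketch of attention with additive score biases, in the spirit of
# LayoutLMv2's relative-position and spatial terms (simplified).
# q, k, v: (batch, heads, seq, head_dim)
# rel_pos_bias, spatial_bias: broadcastable to (batch, heads, seq, seq)
def spatial_aware_attention(q, k, v, rel_pos_bias, spatial_bias):
    scores = q @ k.transpose(-1, -2) / math.sqrt(q.size(-1))
    # These additive biases are why a fused encoder fastpath, which never
    # materializes `scores`, cannot run this attention as-is.
    scores = scores + rel_pos_bias + spatial_bias
    return torch.softmax(scores, dim=-1) @ v
```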
Hi @piegu!
I ran into a similar issue trying to speed up the LiLT model. Meanwhile, I've opened pull requests both to optimum and to huggingface to add support for LiLT model ONNX export. Here is how you can convert LiLT to ONNX:
```python
from pathlib import Path

from transformers import AutoConfig, AutoModel, AutoTokenizer
from transformers.onnx import export
# LiltOnnxConfig comes from the pull requests mentioned above; the exact
# import path may differ depending on where the PR lands.
from transformers.models.lilt import LiltOnnxConfig

model_id = "your_lilt_model"  # placeholder: your LiLT checkpoint
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

onnx_config = LiltOnnxConfig(config, task="sequence-classification")
onnx_path = Path("model.onnx")
onnx_inputs, onnx_outputs = export(
    tokenizer, model, onnx_config, onnx_config.default_onnx_opset, onnx_path
)
```
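Once exported, the graph can be run with ONNX Runtime for faster inference. A hedged sketch (assuming the exported model exposes input_ids, bbox and attention_mask inputs, and reusing the tokenizer from above):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
encoding = tokenizer("hello world", return_tensors="np")
# Dummy layout boxes for illustration only; use your real (batch, seq, 4) boxes.
bbox = np.zeros((1, encoding["input_ids"].shape[1], 4), dtype=np.int64)
outputs = session.run(
    None,  # fetch all outputs
    {
        "input_ids": encoding["input_ids"].astype(np.int64),
        "bbox": bbox,
        "attention_mask": encoding["attention_mask"].astype(np.int64),
    },
)
```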
System Info
Who can help?
@JingyaHuang, @echarlaix, @mi
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Example with LiLT model:
Error message
Expected behavior
Hi,
I'm using Hugging Face libraries to run the LayoutXLM and LiLT models. How can I decrease their inference time through Optimum? Which code should I use?
I've already tested BetterTransformer (Optimum) and ONNX, but neither of them accepts the LayoutXLM and LiLT models.
Can you update the Optimum library so that BetterTransformer() and/or the ONNX export work on the LayoutXLM and LiLT models? Thank you.
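For reference, a minimal sketch of the BetterTransformer call being described (the model id is a placeholder, not a specific checkpoint from this thread):

```python
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModel

model = AutoModel.from_pretrained("your_lilt_or_layoutxlm_model")  # placeholder
# Per the discussion above, this call is what currently fails for
# LiLT/LayoutXLM with an unsupported-model error.
model = BetterTransformer.transform(model)
```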