jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

Possibility to combine lilt-only-base with roberta-large #35

Closed · JP-Leite closed this issue 3 months ago

JP-Leite commented 1 year ago

Has this been done before, to compare the results against LayoutLM (large) and ERNIE (large)?

If anyone has, please share the resulting checkpoint and your findings on whether it improves over the standard benchmark datasets.

logan-markewich commented 1 year ago

This was briefly mentioned here: https://github.com/jpWang/LiLT/issues/27

It looks like some internal sizes will mismatch. I'm not sure whether that means some config values need to be adjusted or whether a larger pre-trained lilt-only-base checkpoint is needed; I haven't had time to investigate yet.
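
For anyone who wants to see the size mismatch concretely, here is a minimal sketch that compares the relevant config fields with Hugging Face `transformers`. The checkpoint names are assumptions for illustration only; the original jpWang/LiLT repo ships its own config files.

```python
# Minimal sketch: compare config fields of a LiLT-base-style checkpoint and
# roberta-large. Checkpoint names below are illustrative assumptions.
from transformers import AutoConfig

lilt_cfg = AutoConfig.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base")
roberta_large_cfg = AutoConfig.from_pretrained("roberta-large")

for name, cfg in [("lilt-base", lilt_cfg), ("roberta-large", roberta_large_cfg)]:
    print(
        f"{name}: layers={cfg.num_hidden_layers}, "
        f"heads={cfg.num_attention_heads}, hidden={cfg.hidden_size}"
    )

# Expected output (roughly):
#   lilt-base:     layers=12, heads=12, hidden=768
#   roberta-large: layers=24, heads=16, hidden=1024
```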

jpWang commented 3 months ago

Hi, thanks for your attention to LiLT. In principle, LiLT provides its attention matrices from 12 layers to the text stream, and the two streams must use the same num_attention_heads. Unfortunately, that is 12 for LiLT-base and 16 for RoBERTa-large. In this case, a LiLT-large model would need to be pre-trained to cooperate with large text models. Since the newer LiLT models have been deployed in commercial applications, the public checkpoints have not been updated.
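
To make the constraint explicit: the layout stream's per-layer attention scores are shared with the text stream, so the head dimension of the two score tensors must line up. Below is a shape-only sketch (not code from this repo) of where pairing lilt-only-base with roberta-large breaks, assuming standard self-attention score tensors of shape (batch, num_attention_heads, seq_len, seq_len).

```python
import torch

# Shape-only illustration of the head-count mismatch described above.
batch, seq_len = 2, 128
text_scores = torch.randn(batch, 16, seq_len, seq_len)    # roberta-large: 16 heads
layout_scores = torch.randn(batch, 12, seq_len, seq_len)  # lilt-only-base: 12 heads

try:
    fused = text_scores + layout_scores  # LiLT-style score sharing step
except RuntimeError as err:
    print("head-count mismatch:", err)   # 16 vs 12 at dim 1, so fusion fails
```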