jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
MIT License

question about LiLT-base #21

Closed ZHEGG closed 1 year ago

ZHEGG commented 1 year ago

Thanks for this amazing code! I have a question about LiLT-base: does it mean that the text stream is not combined with any pre-trained language model, and is instead trained from scratch together with the layout stream?

jpWang commented 1 year ago

Hi, the provided LiLT-base checkpoint is trained as described in the original paper. It should be combined with an off-the-shelf plain-text pre-trained model for fine-tuning.
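
For reference, a minimal sketch of what this combination step might look like, assuming a Hugging Face RoBERTa checkpoint and a local LiLT state dict whose text-stream keys share the `roberta.` prefix; the file paths and key prefixes below are illustrative assumptions, not the repo's actual fusion script:

```python
# Sketch: fuse an off-the-shelf plain-text model with the LiLT checkpoint
# before fine-tuning. Paths and state-dict key prefixes are assumptions.
import torch
from transformers import AutoModel

# Off-the-shelf plain-text pre-trained model (any RoBERTa-like model).
text_model = AutoModel.from_pretrained("roberta-base")
text_state = {f"roberta.{k}": v for k, v in text_model.state_dict().items()}

# Pre-trained LiLT checkpoint providing the layout-stream weights
# (hypothetical local path).
lilt_state = torch.load("lilt-base/pytorch_model.bin", map_location="cpu")

# Keep LiLT's layout-stream weights, overwrite the text-stream weights
# with the chosen plain-text model, then save the fused state dict.
fused_state = {**lilt_state, **text_state}
torch.save(fused_state, "lilt-roberta-base-fused.bin")
```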

ZHEGG commented 1 year ago

I went over the paper again. So LiLT-base initializes the text flow from the existing pre-trained English RoBERTa-BASE, and when fine-tuning it needs to load RoBERTa-BASE again, as the paper says: "combine LiLT-BASE with a new pre-trained RoBERTa-BASE for fine-tuning". If I am right, why reload RoBERTa-BASE again when fine-tuning in English? It seems a little redundant.

jpWang commented 1 year ago

It's a nice question. As you said, the pre-trained LiLT can be applied directly to English without re-loading the text-part weights, and it can get a better result that way. However, we reload the English RoBERTa-BASE for a consistent comparison across different languages.

ZHEGG commented 1 year ago

OK, I see. Thanks for your reply!