jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

Invoice Extraction #1

Closed vibeeshan025 closed 2 years ago

vibeeshan025 commented 2 years ago

First of all, great work! I appreciate the novel concept of targeting multiple languages via fine-tuning.

Do you think lilt-infoxlm-base is sufficient to use as a base for training a model to extract basic information from invoices, or is a completely fresh pretrained model trained on around 1 million samples required?

How long did it take you to create the pretrained model, and what hardware was used?

For fine-tuning, how many annotated invoices do you think are required (is around 5000 sufficient)? How long does the fine-tuned model need to be trained, and on what hardware?

Thanks in advance. Additionally, I have access to a lot of invoices; if this is successful, I can share the final model here.

jpWang commented 2 years ago

Thanks for your attention to our work.

Q1: Are these invoices monolingual or multilingual? For resource-rich languages such as English, LiLT+English-Roberta often performs better than LiLT+InfoXLM. Furthermore, you don't need to train a completely fresh model from scratch. For example, if your invoices are in English, you can load the pre-trained LiLT+English-Roberta weights and continue pre-training on your 1 million unlabeled samples for a while.
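For reference, a minimal sketch of loading the pre-trained LiLT + English RoBERTa weights through the Hugging Face Transformers integration and running a forward pass on a toy invoice snippet might look like the following. The checkpoint name, the example words, and the boxes are illustrative assumptions, and this is not this repo's own training script:

```python
import torch
from transformers import AutoTokenizer, LiltModel

# Assumed Hugging Face Hub checkpoint for LiLT + English RoBERTa.
checkpoint = "SCUT-DLVCLab/lilt-roberta-en-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, add_prefix_space=True)
model = LiltModel.from_pretrained(checkpoint)

# Toy invoice snippet: OCR words plus boxes normalized to a 0-1000 page grid.
words = ["Invoice", "No.", "12345", "Total", "1000.00"]
word_boxes = [[60, 40, 160, 60], [165, 40, 210, 60], [215, 40, 290, 60],
              [60, 700, 120, 720], [540, 700, 660, 720]]

# Tokenize the pre-split words and map every sub-word token to its word's box.
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
token_boxes = [
    word_boxes[word_id] if word_id is not None else [0, 0, 0, 0]
    for word_id in encoding.word_ids(batch_index=0)
]

outputs = model(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    bbox=torch.tensor([token_boxes]),
)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```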

Q2: Less than a week for the experimental setup described in our paper.

Q3: You can refer to the layout diversity (task difficulty), number of samples, and SOTA performance of public academic datasets such as FUNSD, CORD, SROIE, EPHOIE, and XFUND. Generally speaking, compared with these datasets, 5000 is already a relatively sufficient number. You can also refer to the fine-tuning strategies we provide and the experimental setup described in our paper.
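As a rough, non-authoritative sketch, a fine-tuning setup framed as BIO token classification (the same framing used for FUNSD/SROIE/CORD) could look like this. The checkpoint name, label set, hyper-parameters, and the build_invoice_dataset() helper are illustrative assumptions rather than values from our paper:

```python
from transformers import (AutoTokenizer, LiltForTokenClassification,
                          Trainer, TrainingArguments)

# Illustrative BIO label set for a few basic invoice fields.
labels = ["O", "B-INVOICE_NO", "I-INVOICE_NO", "B-DATE", "I-DATE", "B-TOTAL", "I-TOTAL"]

checkpoint = "SCUT-DLVCLab/lilt-roberta-en-base"  # assumed Hub checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint, add_prefix_space=True)
model = LiltForTokenClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# build_invoice_dataset() is a hypothetical helper that converts OCR output plus
# annotations into examples with input_ids, bbox, attention_mask, and per-token
# labels, padded to a fixed length.
train_dataset = build_invoice_dataset(tokenizer, split="train")
eval_dataset = build_invoice_dataset(tokenizer, split="eval")

training_args = TrainingArguments(
    output_dir="lilt-invoice",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    num_train_epochs=20,
)

Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
).train()
```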

vibeeshan025 commented 2 years ago

Thanks a lot. Currently the invoices are monolingual, but this may be extended to other languages; I believe the LiLT+LN-Roberta approach is better suited to my specific task. Is it really necessary to pre-train again on my own data to achieve better results, or is fine-tuning alone sufficient?

I first want to see a few results before spending time and money on pre-training; that's why I am asking.

jpWang commented 2 years ago

Generally, fine-tuning alone can achieve satisfactory results. But if you really want to utilize the unlabeled "in-domain" samples, or you want to further improve performance, you can try the strategy of continued pre-training.
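As a very rough illustration only, continued pre-training could be approximated with a plain masked-language-modeling objective over the text stream, as sketched below. Our actual pre-training also uses the layout objectives described in the paper (KPL and CAI), and the checkpoint name and LM head here are assumptions, so treat this as a simplified starting point rather than our recipe:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, LiltModel

checkpoint = "SCUT-DLVCLab/lilt-roberta-en-base"  # assumed Hub checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
backbone = LiltModel.from_pretrained(checkpoint)

# Simple LM head over the text stream (an assumption; not part of this repo).
lm_head = nn.Linear(backbone.config.hidden_size, backbone.config.vocab_size)
optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(lm_head.parameters()), lr=5e-5
)
loss_fn = nn.CrossEntropyLoss()  # ignores positions labeled -100

def mlm_step(input_ids, bbox, attention_mask, mask_prob=0.15):
    """One update: mask a random subset of tokens and predict them back."""
    labels = input_ids.clone()
    # Pick positions to mask (for brevity this may also hit special tokens).
    masked = (torch.rand(input_ids.shape, device=input_ids.device) < mask_prob)
    masked &= attention_mask.bool()
    labels[~masked] = -100                       # score only the masked positions
    corrupted = input_ids.clone()
    corrupted[masked] = tokenizer.mask_token_id  # replace them with <mask>

    hidden = backbone(input_ids=corrupted, bbox=bbox,
                      attention_mask=attention_mask).last_hidden_state
    logits = lm_head(hidden)
    loss = loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```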

Bhageshwarsingh commented 2 years ago

@vincentAGNES Hi, I recently came across your project. I have a few doubts, and it would be very helpful if you could make some time to help me out.