jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
MIT License

How to pretrain on our own documents #10

Closed · vibeeshan025 closed this 1 year ago

vibeeshan025 commented 2 years ago

Let's say I have thousands of domain-specific documents in English. How can I pretrain on them on top of the existing roberta-en + LiLT checkpoint?

jpWang commented 2 years ago

Hi, maybe you can refer to Test-Time Adaptation for Visual Document Understanding.
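
For readers landing on this thread: the repo itself does not ship a pretraining script, so below is a minimal, hypothetical sketch of what MVLM-style continued pretraining on in-domain documents could look like using the `transformers` port of LiLT. The checkpoint name (`SCUT-DLVCLab/lilt-roberta-en-base`), the linear LM head, and the 15% masking recipe are illustrative assumptions, not the authors' released pretraining setup.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, LiltModel

# Assumed checkpoint: the HuggingFace port of the fused roberta-en + LiLT
# weights. Swap in your own fused checkpoint if you have one.
ckpt = "SCUT-DLVCLab/lilt-roberta-en-base"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
backbone = LiltModel.from_pretrained(ckpt)

# Assumed LM head: transformers ships no LiltForMaskedLM, so we attach a
# plain linear head over the vocabulary for the MVLM-style objective.
lm_head = nn.Linear(backbone.config.hidden_size, tokenizer.vocab_size)
optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(lm_head.parameters()), lr=5e-5
)

def mvlm_step(input_ids, bbox, attention_mask, mask_prob=0.15):
    """One masked-visual-language-modeling step: corrupt a fraction of the
    text tokens, keep every layout box intact, predict the originals."""
    labels = input_ids.clone()
    special = torch.tensor(
        [tokenizer.get_special_tokens_mask(ids.tolist(),
                                           already_has_special_tokens=True)
         for ids in input_ids],
        dtype=torch.bool,
    )
    masked = (
        torch.bernoulli(torch.full(labels.shape, mask_prob)).bool()
        & attention_mask.bool()
        & ~special
    )
    labels[~masked] = -100                       # loss only on masked positions
    corrupted = input_ids.clone()
    corrupted[masked] = tokenizer.mask_token_id  # bounding boxes stay untouched
    hidden = backbone(
        input_ids=corrupted, bbox=bbox, attention_mask=attention_mask
    ).last_hidden_state
    loss = nn.functional.cross_entropy(
        lm_head(hidden).view(-1, tokenizer.vocab_size), labels.view(-1)
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Usage with dummy layout boxes (real runs need OCR boxes normalized to 0-1000):
enc = tokenizer("INVOICE No. 1042  Total due: $120.00", return_tensors="pt")
bbox = torch.zeros(1, enc.input_ids.shape[1], 4, dtype=torch.long)
print(mvlm_step(enc.input_ids, bbox, enc.attention_mask))
```

Note that this covers only the masked-text objective; the paper's full pretraining also uses key point location and cross-modal alignment identification, and a real run would feed OCR-derived bounding boxes rather than zeros.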