jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
MIT License

Use LiLT / an alternative model with more than 512 tokens #46

Closed · coding-kt closed this issue 3 months ago

coding-kt commented 10 months ago

Hi,

LiLT processes a maximum of 512 tokens.

Is there a good option for a comparable, commercially usable model that can process more tokens?

It is of course possible to split longer inputs into 512-token chunks, but this comes with its own disadvantages and difficulties.
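For illustration, a rough sliding-window sketch of what such chunking could look like (the helper and parameter names are hypothetical, not from this repo; a Hugging Face tokenizer is assumed):

```python
# Hypothetical sketch: split pre-extracted words and their boxes into overlapping
# windows that each fit into LiLT's 512-token budget. Not part of this repo.
def chunk_words(words, boxes, tokenizer, max_tokens=512, overlap=50):
    budget = max_tokens - tokenizer.num_special_tokens_to_add()
    chunks, start = [], 0
    while start < len(words):
        end, used = start, 0
        while end < len(words):
            n = len(tokenizer.tokenize(words[end]))
            if used + n > budget:
                break
            used, end = used + n, end + 1
        if end == start:          # a single word longer than the budget: keep it anyway
            end = start + 1
        chunks.append((words[start:end], boxes[start:end]))
        if end == len(words):
            break
        start = max(end - overlap, start + 1)  # step back so neighbouring chunks overlap
    return chunks
```

Predictions on the overlapping words then have to be merged across chunks, which is one of the difficulties mentioned above.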

jpWang commented 3 months ago

Hi, LiLT uses a token length of 512 during the pre-training phase. The most direct way to process a long document is to split it into chunks of length 512. Alternatively, you can consider linearly resizing the position embeddings to a longer maximum length and then fine-tuning the model.
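A rough sketch of the second option, assuming the Hugging Face `LiltForTokenClassification` layout (the attribute paths are assumptions and may differ in this repo's fork; the layout branch's 1-D box position embeddings would need the same treatment):

```python
# Hypothetical sketch (not from this repo): linearly interpolate the learned text
# position embeddings to a longer maximum length, then fine-tune on the target task.
import torch.nn as nn
import torch.nn.functional as F
from transformers import LiltForTokenClassification

def resize_position_embeddings(model, new_max_positions):
    old_emb = model.lilt.embeddings.position_embeddings   # assumed attribute path
    old_len, hidden = old_emb.weight.shape

    # (old_len, hidden) -> (1, hidden, old_len) -> interpolate -> (new_len, hidden)
    w = old_emb.weight.data.T.unsqueeze(0)
    w = F.interpolate(w, size=new_max_positions, mode="linear", align_corners=True)
    new_weight = w.squeeze(0).T.contiguous()

    new_emb = nn.Embedding(new_max_positions, hidden, padding_idx=old_emb.padding_idx)
    new_emb.weight.data.copy_(new_weight)
    model.lilt.embeddings.position_embeddings = new_emb
    model.config.max_position_embeddings = new_max_positions
    # Note: the layout branch's box position embeddings and any cached position_ids
    # buffer would likely need the same resizing, depending on the implementation.
    return model

model = LiltForTokenClassification.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base")
# RoBERTa-style checkpoints reserve 2 extra position slots, so 1024 usable tokens -> 1026
model = resize_position_embeddings(model, new_max_positions=1026)
```

After resizing, the model still needs fine-tuning on sequences longer than 512 tokens, since the interpolated positions were never seen during pre-training.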