jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

A small doubt regarding the implementation of the model #25

Closed uakarsh closed 1 year ago

uakarsh commented 1 year ago

Hi there, thanks for your work and releasing the code.

I was trying to implement the code from the paper. However, on page 4, section 3.1, the paper says: "In this BASE setting, LiLT has a 12-layer encoder with 192 hidden size, 768 feed-forward filter size, and 12 attention heads." Can you tell me how the 192 hidden size is used in the implementation? I looked at the Hugging Face configuration of LiLT, and it is as follows:

Regards, Akarsh

jpWang commented 1 year ago

Hi, in the LiLT config we set "channel_shrink_ratio" to 4 to shrink the layout flow size. That means, for the layout flow, intermediate_size 3072 --> 768 and hidden_size 768 --> 192.
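
For reference, here is a minimal sketch of how those numbers fall out of the Hugging Face config, assuming the default LiltConfig values (hidden_size=768, intermediate_size=3072, channel_shrink_ratio=4):

```python
from transformers import LiltConfig

# Default BASE-style config: hidden_size=768, intermediate_size=3072,
# channel_shrink_ratio=4 (values assumed from the Transformers defaults).
config = LiltConfig()

# The text flow uses the full sizes from the config.
text_hidden = config.hidden_size             # 768
text_ffn = config.intermediate_size          # 3072

# The layout flow divides them by channel_shrink_ratio, giving the
# 192 hidden size and 768 feed-forward size quoted in the paper.
layout_hidden = config.hidden_size // config.channel_shrink_ratio     # 192
layout_ffn = config.intermediate_size // config.channel_shrink_ratio  # 768

print(text_hidden, text_ffn, layout_hidden, layout_ffn)  # 768 3072 192 768
```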

uakarsh commented 1 year ago

Got it, thanks!!