Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
A small doubt regarding the implementation of the model #25
Hi there, thanks for your work and for releasing the code.
I was trying to implement the model from the paper. On page 4, section 3.1, the paper says: "In this BASE setting, LiLT has a 12-layer encoder with 192 hidden size, 768 feed-forward filter size, and 12 attention heads." Can you tell me how the 192 hidden size is used in the implementation? I looked at the Hugging Face configuration of LiLT, which lists:
intermediate_size (int, optional, defaults to 3072)
hidden_size (int, optional, defaults to 768)
I didn't see the 192 hidden size anywhere.
Regards, Akarsh
Hi,
in the LiLT config, we set "channel_shrink_ratio" to 4 to shrink the layout-flow size. That means intermediate_size 3072 --> 768 and hidden_size 768 --> 192.
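To make the arithmetic concrete, here is a minimal sketch (plain Python, using the default BASE config values named above) of how the layout-flow sizes are derived from the text-flow sizes via channel_shrink_ratio; the variable names are illustrative, not the actual config attribute names beyond those mentioned:

```python
# Default LiLT-BASE text-flow sizes from the Hugging Face config
hidden_size = 768          # text-flow hidden size
intermediate_size = 3072   # text-flow feed-forward size
channel_shrink_ratio = 4   # shrink factor applied to the layout flow

# Layout-flow sizes matching the paper (page 4, section 3.1)
layout_hidden_size = hidden_size // channel_shrink_ratio              # 768 // 4 = 192
layout_intermediate_size = intermediate_size // channel_shrink_ratio  # 3072 // 4 = 768

print(layout_hidden_size, layout_intermediate_size)  # 192 768
```

So the 192 from the paper never appears literally in the config; it is the text-flow hidden_size of 768 divided by the shrink ratio of 4.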