jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

A small doubt regarding the implementation of the model #25

Closed uakarsh closed 1 year ago

uakarsh commented 1 year ago

Hi there, thanks for your work and releasing the code.

I was trying to implement the code from the paper. However, on page 4, section 3.1, the paper says: "In this BASE setting, LiLT has a 12-layer encoder with 192 hidden size, 768 feed-forward filter size, and 12 attention heads." Can you tell me how the 192 hidden size is used in the implementation? I looked at the Hugging Face configuration of LiLT, and it is as follows:

Regards, Akarsh

jpWang commented 1 year ago

Hi, in the LiLT config we set "channel_shrink_ratio" to 4 to shrink the layout flow size. That means, for the layout flow, intermediate_size 3072 --> 768 and hidden_size 768 --> 192.
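
For reference, here is a minimal sketch of how those numbers fall out of the Hugging Face config, assuming the default LiltConfig values (hidden_size=768, intermediate_size=3072, channel_shrink_ratio=4):

```python
from transformers import LiltConfig

# Default BASE-style config: hidden_size=768, intermediate_size=3072,
# channel_shrink_ratio=4 (values assumed from the Transformers defaults).
config = LiltConfig()

# The text flow uses the full sizes from the config.
text_hidden = config.hidden_size             # 768
text_ffn = config.intermediate_size          # 3072

# The layout flow divides them by channel_shrink_ratio, giving the
# 192 hidden size and 768 feed-forward size quoted in the paper.
layout_hidden = config.hidden_size // config.channel_shrink_ratio     # 192
layout_ffn = config.intermediate_size // config.channel_shrink_ratio  # 768

print(text_hidden, text_ffn, layout_hidden, layout_ffn)  # 768 3072 192 768
```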

uakarsh commented 1 year ago

Got it, thanks!!