JP-Leite closed this issue 3 months ago
This was briefly mentioned here: https://github.com/jpWang/LiLT/issues/27
It looks like some internal sizes will mismatch. I'm not sure whether that means some config values need to be adjusted or whether we need a larger pre-trained lilt-only-base checkpoint; I haven't had time to investigate yet.
Hi,
thanks for your attention to LiLT.
In principle, LiLT needs to provide the attention matrices from its 12 layers to the text stream, so the two streams must have the same num_attention_heads. Unfortunately, that is 12 for LiLT-base but 16 for Roberta-large, so a LiLT-large would need to be trained to cooperate with large text models. Since the newer LiLT models have been deployed in commercial applications, the public checkpoints have not been updated.
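The constraint described above can be sketched as a simple config check. The head counts below are the ones quoted in this thread (12 for LiLT-base, 16 for Roberta-large); the helper function is hypothetical and not part of the LiLT codebase:

```python
# Minimal sketch of the compatibility constraint, assuming the head
# counts quoted in this thread. heads_compatible() is a hypothetical
# helper, not a function from the LiLT repository.

def heads_compatible(layout_heads: int, text_heads: int) -> bool:
    """LiLT shares its per-layer attention matrices with the text
    stream, so both streams must use the same number of heads."""
    return layout_heads == text_heads

LILT_BASE_HEADS = 12      # lilt-only-base
ROBERTA_BASE_HEADS = 12   # roberta-base: pairs fine with LiLT-base
ROBERTA_LARGE_HEADS = 16  # roberta-large: mismatch, needs a LiLT-large

print(heads_compatible(LILT_BASE_HEADS, ROBERTA_BASE_HEADS))   # True
print(heads_compatible(LILT_BASE_HEADS, ROBERTA_LARGE_HEADS))  # False
```

This is why pairing the released LiLT-base with a large text model fails: the attention tensors handed across streams are shaped per head, so the counts must match exactly.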
Has this been done before, to compare the results against LayoutLM-large and ERNIE-Large?
If anyone has, please share the relevant checkpoint and findings on the standard datasets.