microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

Loss trends of pre-training LayoutLMv3 #952

Open kash203 opened 1 year ago

kash203 commented 1 year ago

Describe

Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutLMv3

I pre-trained the LayoutLMv3 base model, but the losses seem to converge at high values: the MLM and MIM losses each plateau at about 6~7, and the WPA loss at about 0.5, so the total loss is around 12~15. These values seem high relative to typical MLM losses (for comparison, a uniform guess over a ~50k-token vocabulary already gives a cross-entropy of ln(50000) ≈ 10.8), which I find strange. Indeed, when I run inference with this model, the predicted word tokens do not form sentences, and the reconstructed image is mostly white with some noise.
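For reference, this is a minimal sketch of how I combine the three objectives into the total loss; the equal weighting and the use of plain cross-entropy for each head are my reading of the paper, not released code:

```python
import torch.nn.functional as F

def layoutlmv3_pretraining_loss(mlm_logits, mlm_labels,
                                mim_logits, mim_labels,
                                wpa_logits, wpa_labels):
    """Total pre-training loss = MLM + MIM + WPA (equal weights assumed)."""
    # MLM: cross-entropy over the text vocabulary at masked word positions;
    # unmasked positions carry label -100 and are ignored.
    mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                          mlm_labels.view(-1), ignore_index=-100)
    # MIM: cross-entropy over discrete visual tokens (BEiT-style image
    # tokenizer) at masked image patches.
    mim = F.cross_entropy(mim_logits.view(-1, mim_logits.size(-1)),
                          mim_labels.view(-1), ignore_index=-100)
    # WPA: binary classification of whether the image patch covering an
    # unmasked word token is masked, i.e. aligned vs. unaligned.
    wpa = F.cross_entropy(wpa_logits.view(-1, wpa_logits.size(-1)),
                          wpa_labels.view(-1), ignore_index=-100)
    return mlm + mim + wpa
```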

Could you provide the pre-training logs for the LayoutLMv3 base model?

There is one more thing I find strange: when training MLM with span masking, the model seems unable to recover the top-1 tokens for positions beyond the length of the input after masking. (I observed this when deliberately overfitting on a small dataset; note that the length of the attention_mask is aligned with the data before masking.)

for example:

original data        : A B C D E F
masked data          : A [MASK] F
model output (top 1) : A B C F F F

↑ When the masked data length is 3, the model can only infer 3 tokens correctly.

I would like to check whether the above behavior is correct.
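To make the question concrete, here is a simplified sketch of the two masking variants; the helper names are mine, and the first variant (collapsing the whole span into a single [MASK]) is the one that produces the shortened input above:

```python
MASK = "[MASK]"

def collapse_span_mask(tokens, start, end):
    # Replace the whole span [start, end) with a single [MASK].
    # This shortens the sequence: 6 tokens become 3 in the example above.
    return tokens[:start] + [MASK] + tokens[end:]

def per_token_span_mask(tokens, start, end):
    # Replace every token inside the span with its own [MASK], keeping
    # the sequence length (and hence the attention_mask) aligned with
    # the data before masking.
    return tokens[:start] + [MASK] * (end - start) + tokens[end:]

tokens = ["A", "B", "C", "D", "E", "F"]
print(collapse_span_mask(tokens, 1, 5))
# ['A', '[MASK]', 'F']
print(per_token_span_mask(tokens, 1, 5))
# ['A', '[MASK]', '[MASK]', '[MASK]', '[MASK]', 'F']
```

Since an encoder-only MLM head emits exactly one prediction per input position, a 3-token input can only ever yield 3 predictions, so the behavior above would be expected if spans are collapsed as in the first variant.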

Conditions:

yash0307 commented 1 year ago

Hi, where is the code for pre-training?

kash203 commented 1 year ago

Hi @yash0307, I don't think it has been released, so I'm implementing it myself based on the papers. That's why I want to check the loss trends, as a way of checking my answers against a reference.

vanpersie32 commented 1 year ago

Hello, could you please make your code public?

hieutt196 commented 1 year ago

Hi, could you please share some details, such as the number of images, the training time, etc.?