microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

Loss trends of pre-training LayoutLMv3 #952

Open kash203 opened 1 year ago

kash203 commented 1 year ago

Describe

Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutLMv3

I pre-trained the LayoutLMv3 base model, but the losses seem to converge at high values: the MLM and MIM losses each plateau at about 6~7, and the WPA loss at about 0.5, so the total loss is around 12~15. These values seem high relative to typical MLM losses (for comparison, a uniform guess over a ~50k-token vocabulary already gives a cross-entropy of ln(50000) ≈ 10.8), which I find strange. Indeed, when I run inference with this model, the predicted word tokens do not form sentences, and the reconstructed image is mostly white with some noise.
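For reference, this is a minimal sketch of how I combine the three objectives into the total loss; the equal weighting and the use of plain cross-entropy for each head are my reading of the paper, not released code:

```python
import torch.nn.functional as F

def layoutlmv3_pretraining_loss(mlm_logits, mlm_labels,
                                mim_logits, mim_labels,
                                wpa_logits, wpa_labels):
    """Total pre-training loss = MLM + MIM + WPA (equal weights assumed)."""
    # MLM: cross-entropy over the text vocabulary at masked word positions;
    # unmasked positions carry label -100 and are ignored.
    mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                          mlm_labels.view(-1), ignore_index=-100)
    # MIM: cross-entropy over discrete visual tokens (BEiT-style image
    # tokenizer) at masked image patches.
    mim = F.cross_entropy(mim_logits.view(-1, mim_logits.size(-1)),
                          mim_labels.view(-1), ignore_index=-100)
    # WPA: binary classification of whether the image patch covering an
    # unmasked word token is masked, i.e. aligned vs. unaligned.
    wpa = F.cross_entropy(wpa_logits.view(-1, wpa_logits.size(-1)),
                          wpa_labels.view(-1), ignore_index=-100)
    return mlm + mim + wpa
```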

Could you provide the pre-training logs for the LayoutLMv3 base model?

There is one more thing I find strange: when training MLM with span masking, the model seems unable to recover the top-1 tokens for positions beyond the length of the input after masking. (I observed this when deliberately overfitting on a small dataset; note that the length of the attention_mask is aligned with the data before masking.)

for example:

original data        : A B C D E F
masked data          : A [MASK] F
model output (top 1) : A B C F F F

↑ When the masked data length is 3, the model can only infer 3 tokens correctly.

I would like to check whether the above behavior is correct.
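To make the question concrete, here is a simplified sketch of the two masking variants; the helper names are mine, and the first variant (collapsing the whole span into a single [MASK]) is the one that produces the shortened input above:

```python
MASK = "[MASK]"

def collapse_span_mask(tokens, start, end):
    # Replace the whole span [start, end) with a single [MASK].
    # This shortens the sequence: 6 tokens become 3 in the example above.
    return tokens[:start] + [MASK] + tokens[end:]

def per_token_span_mask(tokens, start, end):
    # Replace every token inside the span with its own [MASK], keeping
    # the sequence length (and hence the attention_mask) aligned with
    # the data before masking.
    return tokens[:start] + [MASK] * (end - start) + tokens[end:]

tokens = ["A", "B", "C", "D", "E", "F"]
print(collapse_span_mask(tokens, 1, 5))
# ['A', '[MASK]', 'F']
print(per_token_span_mask(tokens, 1, 5))
# ['A', '[MASK]', '[MASK]', '[MASK]', '[MASK]', 'F']
```

Since an encoder-only MLM head emits exactly one prediction per input position, a 3-token input can only ever yield 3 predictions, so the behavior above would be expected if spans are collapsed as in the first variant.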

Conditions:

yash0307 commented 1 year ago

Hi, where is the code for pre-training?

kash203 commented 1 year ago

Hi @yash0307, I don't think it has been released, so I'm implementing it myself based on the papers. That's why I want to check the loss trends, as a way of checking my answers against a reference.

vanpersie32 commented 1 year ago

Hello, could you please make your code public?

hieutt196 commented 1 year ago

Hi, could you please share some details, such as the number of images, the training time, etc.?