kash203 opened 1 year ago
Hi, where is the code for pre-training?
Hi @yash0307, I think it has not been released, so I'm implementing it myself by referring to the papers. That is why I want to check the loss trends against a reference, like checking answers.
Hello, could you please make your code public?
Hi, could you please share some information, such as the number of images, the training time, ...?
Describe
Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutLMv3
I pre-trained the LayoutLMv3 base model, and the losses seem to converge at high values: the MLM and MIM losses each converge at about 6-7, and the WPA loss at about 0.5, so the total loss is about 12-15. These values seem high relative to typical MLM losses, which strikes me as strange. In fact, when I run inference with this model, the predicted word tokens do not form sentences, and the reconstructed image is mostly white with some noise.
Could you provide the pre-training log for the LayoutLMv3 base model?
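For context, this is roughly how I combine the three objectives in my re-implementation; the sum of these gives the total of about 12-15 mentioned above. It is a minimal sketch: the function and tensor names are my own, and the equal weighting is my assumption from reading the paper, not the official code.

```python
import torch
import torch.nn as nn

def pretraining_loss(mlm_logits, mlm_labels,   # (B, T, vocab), (B, T) with -100 at unmasked positions
                     mim_logits, mim_labels,   # (B, P, image_vocab), (B, P) with -100 at unmasked patches
                     wpa_logits, wpa_labels):  # (B, T, 2), (B, T) binary aligned/unaligned targets
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    mlm_loss = ce(mlm_logits.reshape(-1, mlm_logits.size(-1)), mlm_labels.reshape(-1))
    mim_loss = ce(mim_logits.reshape(-1, mim_logits.size(-1)), mim_labels.reshape(-1))
    wpa_loss = ce(wpa_logits.reshape(-1, wpa_logits.size(-1)), wpa_labels.reshape(-1))
    # With MLM/MIM each around 6-7 and WPA around 0.5, this sum lands at roughly 12-15.
    return mlm_loss + mim_loss + wpa_loss
```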
There is one more thing that I find strange: when training MLM with span masking, it seems the top-1 prediction cannot be restored for tokens beyond the length of the masked input. (I observed this when I deliberately overfit on a small dataset, and the length of `attention_mask` is aligned with the data before masking.) For example, when the masked data length is 3, the model output can only predict 3 tokens.
I would like to check whether the above behavior is correct.
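For reference, this is roughly how I build the masked input and labels in my implementation. It is a minimal sketch of my own code, not the official one; the function and parameter names are illustrative.

```python
import random
import torch

def span_mask(input_ids, mask_token_id, mask_ratio=0.3, max_span=6):
    """Minimal span-masking sketch (my own implementation, not the released code).

    The sequence length is kept unchanged: masked positions are replaced with the
    [MASK] id and labels are -100 everywhere else, so the attention_mask built
    from the original (pre-masking) tokens still lines up.
    """
    input_ids = input_ids.clone()
    labels = torch.full_like(input_ids, -100)
    seq_len = input_ids.size(0)
    num_to_mask = int(seq_len * mask_ratio)
    masked = 0
    while masked < num_to_mask:
        span = random.randint(1, max_span)
        start = random.randrange(0, max(1, seq_len - span))
        for i in range(start, min(start + span, seq_len)):
            if labels[i] == -100:          # position not masked yet
                labels[i] = input_ids[i]   # supervise only the masked positions
                input_ids[i] = mask_token_id
                masked += 1
    return input_ids, labels
```

With this setup the model always sees the full-length sequence and attention mask, and is only supervised at the masked positions.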
Conditions: