microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Discrepancy between LayoutXLM paper and LayoutXLM model #631

Open albertsokol opened 2 years ago

albertsokol commented 2 years ago

Hi all, many thanks for your great work on the LayoutXLM paper.

I understand that spatial-aware self-attention is used in this architecture, since the ablation study you performed and presented in the LayoutLMv2 paper demonstrated a good improvement in model accuracy when spatial-aware self-attention was enabled. However, the model that is publicly available in the Huggingface repo does not use spatial-aware self-attention: the `has_relative_attention_bias` and `has_spatial_attention_bias` flags are set to `false`. See here for the config.json file; consequently, these lines in the code are never reached.

Why was spatial-aware self-attention not used in the training of the LayoutXLM base model here? Did you find that the performance was worse when using it in the multi-lingual setting? Keen to learn more about this. Thank you.
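For reference, the two flags can be inspected directly in the checkpoint's config.json. A minimal sketch in plain Python; the JSON excerpt below reproduces only the two fields discussed in this issue (the real file contains many more keys):

```python
import json

# Excerpt of the relevant fields from the published config.json
# (values as quoted in this issue; the full file has many more keys).
config_excerpt = """
{
  "has_relative_attention_bias": false,
  "has_spatial_attention_bias": false
}
"""

config = json.loads(config_excerpt)
for flag in ("has_relative_attention_bias", "has_spatial_attention_bias"):
    print(flag, "=", config[flag])
```

With both flags false, the model is instantiated without the relative (1-D) and spatial (2-D) attention bias tables, i.e. plain self-attention.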

bilelomrani1 commented 2 years ago

Hi, is there any progress on this issue? As far as I'm aware, this behavior is not documented in the paper; it would be nice to understand why this choice was made for the publicly released checkpoint.

EliottZemour commented 2 years ago

Loading the model with `has_relative_attention_bias` and `has_spatial_attention_bias` explicitly set to `true` leads to the following warning:

> Some weights were not initialized from the model checkpoint at [local path] and are newly initialized: `['layoutlmv2.encoder.rel_pos_bias.weight', 'layoutlmv2.encoder.rel_pos_y_bias.weight', 'layoutlmv2.encoder.rel_pos_x_bias.weight']`
> You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

This suggests that the pre-training was indeed performed without these bias terms.
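The warning can be understood as a plain key-matching problem: enabling the two flags adds bias tables to the model that have no counterpart in the saved checkpoint, so the loader initializes them randomly. A toy illustration in plain Python (no transformers dependency; the `rel_pos_*` names mirror those in the warning, the other key names are made up for the example):

```python
# Parameter names stored in the checkpoint (illustrative subset).
checkpoint_keys = {
    "layoutlmv2.embeddings.word_embeddings.weight",
    "layoutlmv2.encoder.layer.0.attention.self.query.weight",
}

# Parameters the model expects once both bias flags are set to true:
# the same weights plus the 1-D and 2-D attention bias tables.
model_keys = checkpoint_keys | {
    "layoutlmv2.encoder.rel_pos_bias.weight",
    "layoutlmv2.encoder.rel_pos_x_bias.weight",
    "layoutlmv2.encoder.rel_pos_y_bias.weight",
}

# Keys the model needs but the checkpoint cannot supply — these are the
# "newly initialized" weights the warning lists.
newly_initialized = sorted(model_keys - checkpoint_keys)
print(newly_initialized)
```

Since the bias tables are absent from the checkpoint, the most likely explanation is the one above: the released weights were pre-trained with both flags off.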