microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Question about the dimension of max_2d_position_embedding #269

Open tymanman opened 3 years ago

tymanman commented 3 years ago

Describe Model I am using (UniLM, MiniLM, LayoutLM ...): I'm confused about the dimension of max_2d_position_embedding, which is set to 1024 in the code. Each x, y (or w, h) coordinate is a single scalar, so how can a vector of dimension 1024 describe a point?

viantirreau commented 3 years ago

I think it's because there is one vector of dimension hidden_size for each possible x, y, w, and h value. Thus, each embedding layer has shape (max_2d_position_embeddings, hidden_size) = (1024, hidden_size). Although it's possible to represent each of those positions with a single scalar, the model will likely generalize better if you use a (learnable) embedding layer instead :wink:
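To make the shapes concrete, here is a minimal NumPy sketch of the idea (not the actual LayoutLM code, which uses learnable `nn.Embedding` layers in PyTorch): each coordinate value in 0..1023 indexes a row of a (1024, hidden_size) lookup table, and the rows for the box's coordinates are summed into one hidden_size vector. The `embed_bbox` helper and the random tables are illustrative assumptions, not the repo's API.

```python
import numpy as np

hidden_size = 768                  # hidden size used by the base model
max_2d_position_embeddings = 1024  # value from the config discussed above

# One lookup table per coordinate axis. In LayoutLM these are learnable
# embedding layers; plain random arrays are enough to show the shapes.
rng = np.random.default_rng(0)
x_emb = rng.normal(size=(max_2d_position_embeddings, hidden_size))
y_emb = rng.normal(size=(max_2d_position_embeddings, hidden_size))

def embed_bbox(x0, y0, x1, y1):
    """Map a bounding box (coordinates normalized to 0..1023) to a single
    hidden_size vector by looking up and summing per-coordinate rows.
    Hypothetical helper for illustration only."""
    return x_emb[x0] + y_emb[y0] + x_emb[x1] + y_emb[y1]

vec = embed_bbox(100, 200, 300, 250)
print(vec.shape)  # (768,)
```

So a scalar coordinate such as x = 100 is never fed to the model directly; it selects row 100 of a (1024, hidden_size) table, which is why the config value 1024 is the number of distinct positions, not the embedding dimension.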