microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
LayoutReader - Decoder #1487

Open ai-learner-00 opened 6 months ago

ai-learner-00 commented 6 months ago

Describe
Model I am using: LayoutReader. (Is this a good place to ask questions about papers?)

[image]

In Figure 3 of the paper, I am having trouble understanding the decoder. If the encoder alone can give us the probability that token x_k belongs at index i in reading order, can't we treat this as a classification problem, mapping each token to its order index (x_1 -> 1, x_2 -> 3, x_3 -> 2)?
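To make the question concrete, here is a minimal sketch of the classification framing I have in mind. The probability matrix `probs` and the helper names `classify_positions` and `is_valid_order` are made up for illustration; none of them come from the paper or the LayoutReader code.

```python
# Hypothetical encoder output (values are invented for illustration):
# probs[k][i] is the assumed probability that token x_{k+1} belongs at
# reading-order position i+1.
probs = [
    [0.8, 0.1, 0.1],  # x_1
    [0.1, 0.2, 0.7],  # x_2
    [0.1, 0.7, 0.2],  # x_3
]


def classify_positions(probs):
    """Pick the most likely reading-order position for each token independently."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in probs]


def is_valid_order(positions):
    """Check that every position 1..n is assigned to exactly one token."""
    return sorted(positions) == list(range(1, len(positions) + 1))


order = classify_positions(probs)
for k, pos in enumerate(order, start=1):
    print(f"x_{k} -> {pos}")
print("valid permutation:", is_valid_order(order))
```

With these made-up numbers the per-token picks give x_1 -> 1, x_2 -> 3, x_3 -> 2. Independent picks are not guaranteed to form a valid order in general (two tokens can choose the same position), which is why the sketch includes the `is_valid_order` check.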

What are the input and output of the decoder?