microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

Should we append another FC layer or dropout layer after the sequence outputs? #261

Open ghost opened 4 years ago

ghost commented 4 years ago

Hello~ I am curious whether we should append another FC layer or a dropout layer after the sequence outputs of BERT for LayoutLM pre-training.

wolfshow commented 3 years ago

@sunshine9409 Can you please elaborate on that in more detail?

ghost commented 3 years ago

> @sunshine9409 Can you please elaborate on that in more detail?

That is, what does the head (sub-net) on top of the sequence outputs look like during pre-training?
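For reference, a minimal sketch of the kind of prediction head BERT-style models typically place on top of the sequence outputs for masked-LM pre-training: a dense transform, activation, LayerNorm, and a decoder back to the vocabulary. The class name `MaskedLMHead`, the hidden size 768, and the vocab size 30522 are illustrative assumptions; whether LayoutLM's pre-training head matches this exactly should be confirmed against the released code in this repo.

```python
import torch
import torch.nn as nn


class MaskedLMHead(nn.Module):
    """BERT-style masked-LM head: dense -> GELU -> LayerNorm -> vocab decoder.

    Sizes and structure follow the standard BERT MLM head and are an
    illustration, not a verbatim copy of LayoutLM's implementation.
    """

    def __init__(self, hidden_size: int = 768, vocab_size: int = 30522):
        super().__init__()
        # Per-token transform applied to the encoder's sequence output.
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.layer_norm = nn.LayerNorm(hidden_size, eps=1e-12)
        # Projects back to the vocabulary; in BERT this weight is usually
        # tied to the input word-embedding matrix.
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, sequence_output: torch.Tensor) -> torch.Tensor:
        # sequence_output: (batch, seq_len, hidden_size) from the encoder
        x = self.dense(sequence_output)
        x = self.activation(x)
        x = self.layer_norm(x)
        return self.decoder(x)  # (batch, seq_len, vocab_size) logits


if __name__ == "__main__":
    head = MaskedLMHead()
    dummy = torch.randn(2, 128, 768)  # stand-in for the encoder's sequence output
    logits = head(dummy)
    print(logits.shape)  # torch.Size([2, 128, 30522])
```

In this common setup there is no extra user-added FC or dropout layer between the encoder and the MLM head beyond what the head itself contains.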