I am using LayoutLMv2 and LayoutLMv3 for key information extraction. Since the output annotations are normalized, it's difficult to get token-level annotations.
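If the difficulty is mainly going from normalized boxes and sub-word tokens back to word-level annotations, something along these lines may help. A minimal sketch, assuming the 0-1000 box normalization used by the LayoutLM family; `encoding` is a fast-tokenizer `BatchEncoding` and `token_predictions` is a hypothetical per-token label list:

```python
# Minimal sketch: map LayoutLM-style 0-1000 normalized boxes back to pixel
# coordinates and collapse sub-token predictions to word level.
def denormalize_box(box, page_width, page_height):
    """Invert the 0-1000 normalization used by the LayoutLM family."""
    x0, y0, x1, y1 = box
    return [
        x0 * page_width / 1000,
        y0 * page_height / 1000,
        x1 * page_width / 1000,
        y1 * page_height / 1000,
    ]

def tokens_to_words(encoding, token_predictions):
    """Keep the first sub-token's prediction per original word (a common heuristic)."""
    word_labels = {}
    for idx, word_id in enumerate(encoding.word_ids(batch_index=0)):
        if word_id is not None and word_id not in word_labels:
            word_labels[word_id] = token_predictions[idx]
    return word_labels
```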
@pzdkn LayoutLM can be used as a general-purpose encoder for downstream tasks. For language generation tasks, you would need to design a decoder on top of it yourself, with generation or copy operations.
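To make the "general-purpose encoder plus a generation decoder" idea concrete, here is a minimal sketch, not a tested recipe: LayoutLMv3 encodes the page, and a causal BERT decoder with freshly initialized cross-attention attends to its hidden states. The checkpoint names and the `encoder_inputs` dict (e.g. produced by a `LayoutLMv3Processor`) are assumptions:

```python
# Sketch: LayoutLMv3 as the document encoder, a causal BERT decoder on top.
import torch
from transformers import LayoutLMv3Model, BertConfig, BertLMHeadModel

encoder = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")

decoder_config = BertConfig.from_pretrained("bert-base-uncased")
decoder_config.is_decoder = True
decoder_config.add_cross_attention = True  # cross-attention weights start untrained
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=decoder_config)

def generate_step(encoder_inputs, decoder_input_ids):
    # Encode the page (tokens + boxes + image patches) once.
    encoder_outputs = encoder(**encoder_inputs)
    # Decode autoregressively, cross-attending to the document representation.
    return decoder(
        input_ids=decoder_input_ids,
        encoder_hidden_states=encoder_outputs.last_hidden_state,
    ).logits
```

Both bases use a 768-dimensional hidden size, so the cross-attention shapes line up, but the decoder and its cross-attention would still need to be trained on the target task.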
I thought about rephrasing such tasks as a language generation problem instead, similar to Townsend et al., "Doc2Dict: Information Extraction as Text Generation". However, is LayoutLM even capable of, or good at, language generation?
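For context, the Doc2Dict-style reframing means training the decoder to emit a serialized record directly instead of tagging tokens. A minimal sketch of what the target side could look like; the tag format here is purely illustrative:

```python
# Sketch: serialize key/value annotations into a flat seq2seq target string
# and parse the generated string back into a dict.
import re

def to_target(fields: dict[str, str]) -> str:
    return " ".join(f"<{k}> {v} </{k}>" for k, v in fields.items())

def from_target(text: str) -> dict[str, str]:
    return dict(re.findall(r"<(\w+)> (.*?) </\1>", text))

target = to_target({"invoice_no": "12345", "total": "99.00"})
assert from_target(target) == {"invoice_no": "12345", "total": "99.00"}
```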