I am using LayoutLMv2 and LayoutLMv3 for key information extraction. Since the output annotations are normalized, it's difficult to get token-level annotations.
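If the difficulty is mainly going from normalized boxes and sub-word tokens back to word-level annotations, something along these lines may help. A minimal sketch, assuming the 0-1000 box normalization used by the LayoutLM family; `encoding` is a fast-tokenizer `BatchEncoding` and `token_predictions` is a hypothetical per-token label list:

```python
# Minimal sketch: map LayoutLM-style 0-1000 normalized boxes back to pixel
# coordinates and collapse sub-token predictions to word level.
def denormalize_box(box, page_width, page_height):
    """Invert the 0-1000 normalization used by the LayoutLM family."""
    x0, y0, x1, y1 = box
    return [
        x0 * page_width / 1000,
        y0 * page_height / 1000,
        x1 * page_width / 1000,
        y1 * page_height / 1000,
    ]

def tokens_to_words(encoding, token_predictions):
    """Keep the first sub-token's prediction per original word (a common heuristic)."""
    word_labels = {}
    for idx, word_id in enumerate(encoding.word_ids(batch_index=0)):
        if word_id is not None and word_id not in word_labels:
            word_labels[word_id] = token_predictions[idx]
    return word_labels
```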
@pzdkn LayoutLM can be used as a general-purpose encoder for downstream tasks. For language generation tasks, you would need to design a decoder on top of it yourself, with generation or copy operations.
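To make the "general-purpose encoder plus a generation decoder" idea concrete, here is a minimal sketch, not a tested recipe: LayoutLMv3 encodes the page, and a causal BERT decoder with freshly initialized cross-attention attends to its hidden states. The checkpoint names and the `encoder_inputs` dict (e.g. produced by a `LayoutLMv3Processor`) are assumptions:

```python
# Sketch: LayoutLMv3 as the document encoder, a causal BERT decoder on top.
import torch
from transformers import LayoutLMv3Model, BertConfig, BertLMHeadModel

encoder = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")

decoder_config = BertConfig.from_pretrained("bert-base-uncased")
decoder_config.is_decoder = True
decoder_config.add_cross_attention = True  # cross-attention weights start untrained
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=decoder_config)

def generate_step(encoder_inputs, decoder_input_ids):
    # Encode the page (tokens + boxes + image patches) once.
    encoder_outputs = encoder(**encoder_inputs)
    # Decode autoregressively, cross-attending to the document representation.
    return decoder(
        input_ids=decoder_input_ids,
        encoder_hidden_states=encoder_outputs.last_hidden_state,
    ).logits
```

Both bases use a 768-dimensional hidden size, so the cross-attention shapes line up, but the decoder and its cross-attention would still need to be trained on the target task.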
I thought about rephrasing such tasks as a language generation problem instead, similar to Townsend et al., "Doc2Dict: Information Extraction as Text Generation". However, is LayoutLM even capable of, or good at, language generation?
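For context, the Doc2Dict-style reframing means training the decoder to emit a serialized record directly instead of tagging tokens. A minimal sketch of what the target side could look like; the tag format here is purely illustrative:

```python
# Sketch: serialize key/value annotations into a flat seq2seq target string
# and parse the generated string back into a dict.
import re

def to_target(fields: dict[str, str]) -> str:
    return " ".join(f"<{k}> {v} </{k}>" for k, v in fields.items())

def from_target(text: str) -> dict[str, str]:
    return dict(re.findall(r"<(\w+)> (.*?) </\1>", text))

target = to_target({"invoice_no": "12345", "total": "99.00"})
assert from_target(target) == {"invoice_no": "12345", "total": "99.00"}
```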