microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

How to Get Image Embeddings in LayoutLM to Reproduce the Experiments in the Paper #165

Closed: 578123043 closed this issue 4 years ago

578123043 commented 4 years ago

Describe Model I am using: LayoutLM. For the tasks in the paper (form understanding, receipt understanding), how do you get the image embeddings? Do you use ResNet-50 or another backbone?

In particular, how do you get the embeddings for subtokens? Subtokens of different lengths may correspond to image regions of different widths. Did you use some pooling method in your experiments?

wolfshow commented 4 years ago

@578123043 For image embeddings, we use ResNet-101 as the backbone. Every token gets its image embedding from the same backbone, using the height and width of its OCR bounding box.
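The reply does not say how a box of arbitrary width becomes a fixed-size vector, so here is one simplified way to do it. It assumes the following: the backbone (for example ResNet-101) runs once on the page image and produces a feature map; each OCR word box is projected onto that map using the backbone's stride; the covered cells are mean-pooled, so boxes of different widths all yield vectors of the same length; and subtokens reuse their parent word's box. The mean pooling stands in for RoI pooling and may not match what the authors did. The function names and the nested-list feature-map layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: fmap[row][col] is a list of C floats (one backbone cell).
FeatureMap = List[List[List[float]]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels, from OCR


def box_to_cells(box: Box, stride: int, height: int, width: int) -> Tuple[int, int, int, int]:
    """Map a pixel-space OCR box onto feature-map cells as (r0, c0, r1, c1), end-exclusive."""
    x0, y0, x1, y1 = box
    c0 = min(max(x0 // stride, 0), width - 1)
    r0 = min(max(y0 // stride, 0), height - 1)
    # Ceil division for the far edge; always keep at least one cell.
    c1 = min(max(-(-x1 // stride), c0 + 1), width)
    r1 = min(max(-(-y1 // stride), r0 + 1), height)
    return r0, c0, r1, c1


def region_embedding(fmap: FeatureMap, box: Box, stride: int) -> List[float]:
    """Mean-pool every feature cell covered by the box into one C-dim vector."""
    height, width = len(fmap), len(fmap[0])
    r0, c0, r1, c1 = box_to_cells(box, stride, height, width)
    channels = len(fmap[0][0])
    total = [0.0] * channels
    count = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            for k in range(channels):
                total[k] += fmap[r][c][k]
            count += 1
    return [v / count for v in total]


def token_image_embeddings(
    fmap: FeatureMap,
    word_boxes: Sequence[Box],
    subtokens_per_word: Sequence[int],
    stride: int,
) -> List[List[float]]:
    """Subtokens inherit their word's OCR box, so they share one pooled vector."""
    out: List[List[float]] = []
    for box, n_sub in zip(word_boxes, subtokens_per_word):
        emb = region_embedding(fmap, box, stride)
        out.extend([emb] * n_sub)
    return out


# Toy usage: a 4x4, 1-channel feature map for a 32x32 page (stride 8).
fmap = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
boxes = [(0, 0, 16, 8), (16, 16, 32, 32)]  # two OCR words of different sizes
embs = token_image_embeddings(fmap, boxes, [2, 1], stride=8)
print(embs)  # [[0.5], [0.5], [12.5]]
```

The first word is split into two subtokens, and both receive the same vector because they share one box. The two words cover regions of different sizes, yet each still gets a single C-dimensional embedding. With a real ResNet the final-stage stride is 32 and C is much larger, but the projection-then-pool logic is unchanged.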