Describe
Model I am using: LayoutLM. For the tasks in the paper (Form Understanding, Receipt Understanding), how do you get the image embedding? ResNet-50 or something else?
Especially for the embeddings of subtokens: subtokens of different lengths may cover image regions of different widths. Did you use a pooling method in your experiments?
@578123043 For image embeddings, we use ResNet-101 as the backbone. Every token gets its image embedding from the same backbone, using the height and width of its OCR bounding box.
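For readers wondering how regions of different widths can end up with embeddings of the same size, here is a minimal sketch. It is not the authors' code: the thread does not say which pooling operator was used, so adaptive average pooling is swapped in purely for illustration. The function name `pool_box`, the 2x2 output size, and the toy feature map are all assumptions.

```python
from typing import List, Tuple

Grid = List[List[float]]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def pool_box(feature_map: Grid, box: Tuple[int, int, int, int],
             out_h: int = 2, out_w: int = 2) -> Grid:
    """Average-pool one box of a 2D feature map to a fixed out_h x out_w grid.

    `box` is (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    Hypothetical helper for illustration only.
    """
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    if h <= 0 or w <= 0:
        raise ValueError("box must have positive height and width")

    pooled: Grid = []
    for i in range(out_h):
        # Adaptive-pooling bin: rows floor(i*h/out_h) .. ceil((i+1)*h/out_h)
        r0 = y0 + (i * h) // out_h
        r1 = y0 + _ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = x0 + _ceil_div((j + 1) * w, out_w)
            cells = [feature_map[r][c]
                     for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Toy 4x6 "feature map" standing in for one channel of the backbone output.
feature_map = [[float(r * 6 + c) for c in range(6)] for r in range(4)]

narrow = pool_box(feature_map, (0, 0, 2, 4))  # box 2 cells wide
wide = pool_box(feature_map, (0, 0, 6, 4))    # box 6 cells wide
print(narrow)
print(wide)
```

Both boxes come out as 2x2 grids even though one is three times wider than the other, which is the property a per-token image embedding needs. A real pipeline would apply this per channel on the backbone's feature map, typically with a library ROI pooling operator rather than hand-written loops.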