microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Image embedding in LayoutLM #243

Open kkissmart opened 4 years ago

kkissmart commented 4 years ago

Model I am using: LayoutLM ...

Did you freeze all the Faster R-CNN parameters when fine-tuning on downstream tasks, or were those parameters also updated?

Thanks a lot!

wolfshow commented 4 years ago

@kkissmart The parameters are also updated during fine-tuning.

kkissmart commented 4 years ago

Thanks for the fast reply. Did you use an ROI embedding for each token? Is the ROI generated by the RPN (anchors) or taken from the ground-truth bounding boxes (gt_bbox) produced by OCR? Thanks a lot!

wolfshow commented 4 years ago

@kkissmart The ROI is from the OCR bbox.
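For readers following along, here is a minimal sketch of what "ROI from the OCR bbox" could look like: each token's OCR box is mapped onto a feature map and the covered cells are pooled into one vector per token. This is not the LayoutLM code. The name `roi_average_pool`, the toy feature map, and the plain average pooling are all assumptions made for illustration; a real setup would take the feature map from a CNN backbone and would more likely use an ROI pooling or ROI align operator.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page pixels, as reported by OCR
FeatureMap = List[List[List[float]]]     # [channels][height][width]


def roi_average_pool(features: FeatureMap, box: Box,
                     page_size: Tuple[float, float]) -> List[float]:
    """Hypothetical helper: average the feature cells covered by one OCR box."""
    page_w, page_h = page_size
    feat_h, feat_w = len(features[0]), len(features[0][0])
    x0, y0, x1, y1 = box
    # Map page coordinates onto the feature grid, keeping at least one cell.
    col0 = min(int(x0 / page_w * feat_w), feat_w - 1)
    row0 = min(int(y0 / page_h * feat_h), feat_h - 1)
    col1 = max(col0 + 1, min(math.ceil(x1 / page_w * feat_w), feat_w))
    row1 = max(row0 + 1, min(math.ceil(y1 / page_h * feat_h), feat_h))
    pooled = []
    for channel in features:
        cells = [channel[r][c] for r in range(row0, row1) for c in range(col0, col1)]
        pooled.append(sum(cells) / len(cells))
    return pooled


# Toy 2-channel, 4x4 feature map standing in for a backbone output (assumption).
features = [
    [[float(r * 4 + c) for c in range(4)] for r in range(4)],  # channel 0: 0..15
    [[1.0] * 4 for _ in range(4)],                             # channel 1: constant
]
# Two made-up OCR token boxes on a 100x100 page: no region proposal network involved.
ocr_boxes = [(0, 0, 50, 50), (50, 50, 100, 100)]
token_embeddings = [roi_average_pool(features, b, (100, 100)) for b in ocr_boxes]
print(token_embeddings)  # [[2.5, 1.0], [12.5, 1.0]]
```

The point of the sketch is only that the regions come straight from the OCR boxes, one per token, so no anchors or proposals are needed to decide where to pool.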

mineshmathew commented 4 years ago

@wolfshow Does the current version of the LayoutLM code use image embeddings? I don't see Faster R-CNN used anywhere in the model.