Some ideas for developing Mask Language Modeling, Mask Image Modeling and Word-Patch Alignment for LayoutLMv3

14H034160212 commented 1 year ago

Hi, for anyone who is interested in implementing LayoutLMv3 pre-training: Transformers has updated its code for masked image modeling, and that code is based on DeiT. You can inherit from it to implement Masked Image Modeling for LayoutLMv3, and you can likewise inherit the code from RoBERTa to implement masked language modeling. For word-patch alignment, I am still working on it. Feel free to start a discussion. Here are the links: RoBERTa mask language modeling example

DEIT mask image modeling example

More ideas for developing word patch alignment
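To make the masked language modeling part concrete, here is a minimal sketch of the generic BERT/RoBERTa-style token masking step (select roughly 15% of tokens, then 80% become the mask token, 10% a random token, 10% stay unchanged). It is plain Python so it runs without any model; the function name `mask_tokens` and all of its parameters are made up for illustration, and this is not claimed to be the exact masking scheme used in LayoutLMv3 pre-training.

```python
import random

IGNORE_INDEX = -100  # label value conventionally skipped by cross-entropy losses


def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids=(),
                mlm_probability=0.15, rng=None):
    """Return (masked_ids, labels) for one sequence of token ids.

    Hypothetical helper: names and defaults are assumptions, not library API.
    """
    rng = rng or random.Random()
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, tok in enumerate(input_ids):
        # Never select special tokens; select others with mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = mask_token_id            # 80%: replace with mask token
        elif roll < 0.9:
            masked[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return masked, labels
```

The labels list can then be fed to a cross-entropy loss that ignores `-100`, which is the same convention the masked-LM heads in Transformers use.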

dariuszlee commented 1 year ago

Hi, I just want to add that https://github.com/dandelin/ViLT/blob/master/vilt/modules/vilt_module.py is worth a look if you are looking for inspiration. 'objectives.compute_itm_wpa' is their implementation. I need to adapt this for a closed-source project, but I hope we can build something out here.
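For the LayoutLMv3 flavour of word-patch alignment, the part that can be written down without a model is the label construction. As I read the LayoutLMv3 paper, an unmasked text token is labeled "aligned" when its corresponding image patch is unmasked and "unaligned" otherwise, and masked text tokens are left out of the WPA loss. Below is a rough sketch of that rule. The helper names, the 14x14 patch grid, the 0-1000 box coordinates, and the choice of mapping a word to the patch under its box center are all my assumptions, not taken from the official code.

```python
IGNORE_INDEX = -100  # masked text tokens are excluded from the WPA loss


def patch_index(box, grid=14, coord_max=1000):
    """Map a word box (x0, y0, x1, y1) to the flat index of the patch
    under its center. Assumes boxes normalized to [0, coord_max]."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2
    cy = (y0 + y1) / 2
    col = min(int(cx * grid / coord_max), grid - 1)
    row = min(int(cy * grid / coord_max), grid - 1)
    return row * grid + col


def wpa_labels(boxes, text_masked, patch_masked, grid=14):
    """Per-token labels: 1 = aligned, 0 = unaligned, -100 = ignored."""
    labels = []
    for box, is_text_masked in zip(boxes, text_masked):
        if is_text_masked:
            labels.append(IGNORE_INDEX)
        else:
            # Unaligned if the patch this word falls on was masked for MIM.
            labels.append(0 if patch_masked[patch_index(box, grid)] else 1)
    return labels


# Three tokens: one over a masked patch, one over a visible patch,
# and one whose text was masked for MLM.
patch_masked = [False] * (14 * 14)
patch_masked[0] = True
boxes = [(0, 0, 100, 100), (900, 900, 1000, 1000), (500, 500, 520, 520)]
print(wpa_labels(boxes, [False, False, True], patch_masked))  # [0, 1, -100]
```

A binary classification head over the text-token hidden states could then be trained against these labels; a word that spans several patches would need a different rule than the center-point shortcut used here.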

suresh1505 commented 1 year ago

I am using LayoutLMv3 for object detection, but I am not able to get input_ids, bbox and attention_mask; I am only getting images. Can you help?

microsoft / unilm

Some ideas for developing Mask Language Modeling, Mask Image Modeling and Word-Patch Alignment for LayoutLMv3 #1076