microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Is the LayoutLMv3 model capable of predicting nested bounding boxes? #945

Open garvsonihertzai opened 1 year ago

garvsonihertzai commented 1 year ago

Hey there, I am using the LayoutLMv3 model for my project, and I created a dataset in which almost all images contain nested bounding boxes. I've tried to fine-tune LayoutLMv3 on this dataset in COCO format (17 classes, 10L+, and 60k+ images), but I'm not getting good AP. So my question is: is the LayoutLMv3 model powerful enough to predict nested bounding boxes? If yes, can anyone please explain how, or share a link to an example with nested bounding box prediction? @wolfshow Thank you

HYPJUDY commented 1 year ago

Could you check if the example on document layout analysis is helpful? This task is about detecting the layouts of unstructured digital documents by providing bounding boxes and categories.
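
Before comparing against that example, it may help to measure how much nesting the dataset actually contains. The following is a minimal sketch, not part of the unilm repository: it assumes the standard COCO annotation layout (an `annotations` list whose entries carry `id`, `image_id`, and a `bbox` in `[x, y, width, height]` form). The helper names `contains` and `nested_pairs` and the sample data are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from itertools import permutations


def contains(outer, inner, tol=0.0):
    """Return True if COCO box `outer` fully contains COCO box `inner`.

    Both boxes use the COCO convention [x, y, width, height].
    `tol` allows a small pixel slack for imprecise annotations.
    """
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return (
        ox - tol <= ix
        and oy - tol <= iy
        and ix + iw <= ox + ow + tol
        and iy + ih <= oy + oh + tol
    )


def nested_pairs(coco):
    """Yield (image_id, outer_ann_id, inner_ann_id) for every nested pair.

    Compares all ordered pairs within an image, so the cost is quadratic
    in the number of annotations per image.
    """
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[ann["image_id"]].append(ann)
    for image_id, anns in by_image.items():
        for a, b in permutations(anns, 2):
            # Skip exact duplicates so they are not reported in both directions.
            if a["bbox"] != b["bbox"] and contains(a["bbox"], b["bbox"]):
                yield image_id, a["id"], b["id"]


# Hypothetical hand-made sample: box 2 lies inside box 1, box 3 only overlaps it.
sample = {
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 100, 100]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [10, 10, 20, 20]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [90, 90, 30, 30]},
    ]
}
pairs = list(nested_pairs(sample))
print(pairs)
```

For a real dataset, load the annotation file with `json.load` and pass the resulting dictionary to `nested_pairs`. Grouping the reported pairs by category could show whether low AP is concentrated in the inner or the outer boxes.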