microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

In XFUND dataset, why "B-QUESTION", "B-ANSWER", "B-HEADER", "I-ANSWER", "I-QUESTION", "I-HEADER"? #929

Open ChidanandKumarVimaan opened 1 year ago

ChidanandKumarVimaan commented 1 year ago

Describe Model I am using (UniLM, MiniLM, LayoutLM ...):

In the XFUND dataset, there are only 4 classes (QUESTION, ANSWER, HEADER, OTHER), but in
https://github.com/microsoft/unilm/blob/42100e11bdd3ac8e9ca2e9b506af8c9231a0c6d6/layoutlmft/layoutlmft/data/datasets/xfun.py#L48

there are 7 classes.

I don't understand why there are 7 classes instead of 4. Kindly help.

ChidanandKumarVimaan commented 1 year ago

@Dod-o do you have any answer to the above question? Kindly let me know.

abhibisht89 commented 1 year ago

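@ChidanandKumarVimaan, this is the 'BIO' tagging scheme (used for token classification or NER tasks). Each of the three entity classes (QUESTION, ANSWER, HEADER) gets a "B-" (Begin) tag and an "I-" (Inside) tag, while OTHER maps to a single "O" (Outside) tag. So in total there are 3 × 2 + 1 = 7 classes.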
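To make the count concrete, here is a minimal sketch of how the 4 annotation classes could expand into 7 BIO labels, and how one annotated entity could be turned into per-token tags. The names `ENTITY_CLASSES`, `build_bio_labels`, and `tag_entity` are hypothetical and only for illustration. This is not the code from xfun.py, and the label order in that file may differ.

```python
from typing import List

# Assumption: the three "real" entity classes; OTHER is represented by "O".
ENTITY_CLASSES = ["QUESTION", "ANSWER", "HEADER"]


def build_bio_labels(entity_classes: List[str]) -> List[str]:
    """Return one "O" label plus a B- and an I- label per entity class."""
    labels = ["O"]
    labels += [f"B-{name}" for name in entity_classes]
    labels += [f"I-{name}" for name in entity_classes]
    return labels


def tag_entity(tokens: List[str], label: str) -> List[str]:
    """Assign BIO tags to the tokens of a single annotated entity."""
    if not tokens:
        return []
    label = label.upper()
    if label == "OTHER":
        # Tokens outside any entity all share the single "O" tag.
        return ["O"] * len(tokens)
    # The first token begins the entity; the remaining tokens are inside it.
    return [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)


print(build_bio_labels(ENTITY_CLASSES))
print(tag_entity(["Date", "of", "birth"], "question"))
```

The B-/I- split lets the model tell where one entity ends and the next begins, for example when two QUESTION fields sit next to each other in a form.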

ChidanandKumarKS commented 1 year ago

@abhibisht89 Thanks, got it.