How to save the predicted output from LayoutLM or LayoutLMv2?

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

https://aka.ms/GeneralAI

MIT License

How to save the predicted output from LayoutLM or LayoutLMv2? #666

Open karndeepsingh opened 2 years ago

karndeepsingh commented 2 years ago

I trained LayoutLM on my own dataset and I am getting predictions at the word level. For example, in the image "ALVARO FRANCISCO MONTOYA" has the true label "party_name_1", but at prediction time "ALVARO" is tagged as "party_name_1", "FRANCISCO" is tagged as "party_name_1", and "MONTOYA" is tagged as "party_name_1". In short, I get a prediction for each word. How can I save these predictions as a single output, i.e. "ALVARO FRANCISCO MONTOYA" as "party_name_1"? Any help would be greatly appreciated. (Image: predicted output from LayoutLM.)

ysfali commented 2 years ago

@karndeepsingh did you find any way to tackle this problem? I'm also stuck on the same issue: how to join these results.

karndeepsingh commented 2 years ago

Nope! Still figuring it out. If you find anything, do let me know.

stjaco commented 2 years ago

You can use IOB-like tagging: https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)

karndeepsingh commented 2 years ago

Is there any reference code available to achieve this?

Thanks

stjaco commented 2 years ago

Look at commonly used datasets such as FUNSD / XFUND.

In your example, it boils down to training your model to recognize B-party_name_1 and I-party_name_1 instead of party_name_1, so that the tokens ALVARO FRANCISCO MONTOYA are tagged as B-party_name_1, I-party_name_1, and I-party_name_1 respectively.

(In other words, you will know that a single party_name_1 entity runs from ALVARO to MONTOYA.)
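The decoding step described above can be sketched in a few lines of Python. This is only an illustration, not code from the unilm repository: the function name `group_iob_entities` is hypothetical, and it assumes you already have one predicted tag string per word (for example after mapping sub-token predictions back to words).

```python
from typing import List, Optional, Tuple


def group_iob_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level B-/I- tags into (entity_text, label) pairs.

    Hypothetical helper: assumes tags look like "B-label", "I-label" or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal current_words, current_label
        if current_words:
            entities.append((" ".join(current_words), current_label))
        current_words, current_label = [], None

    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            # Continuation of the entity that is currently open.
            current_words.append(word)
        elif prefix in ("B", "I"):
            # "B-" always starts a new entity; a stray "I-" with no
            # matching open entity is treated as a start as well.
            flush()
            current_words, current_label = [word], label
        else:
            # "O" (or anything unrecognized) ends the open entity.
            flush()
    flush()
    return entities
```

Called with the words `["ALVARO", "FRANCISCO", "MONTOYA"]` and the tags `["B-party_name_1", "I-party_name_1", "I-party_name_1"]`, it returns a single `("ALVARO FRANCISCO MONTOYA", "party_name_1")` pair. Treating a stray `I-` tag as the start of an entity is a design choice made here for robustness against imperfect predictions; a stricter decoder could discard such tokens instead.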

ysfali commented 2 years ago

@stjaco thanks for this, I guess I'll try BILUO-based tagging. Although getting multi-line fields like addresses might still be a pain even after doing this.
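For reference, BILUO adds two tags to the IOB scheme: L- for the last token of a multi-token entity and U- for a single-token entity. A minimal sketch of converting existing IOB labels to BILUO might look like the following. The function name `iob_to_biluo` is hypothetical, and the sketch assumes well-formed input in which every entity starts with a B- tag.

```python
from typing import List


def iob_to_biluo(tags: List[str]) -> List[str]:
    """Convert well-formed IOB tags ("B-x", "I-x", "O") to BILUO tags."""
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append(tag)
            continue
        prefix, _, label = tag.partition("-")
        # The entity continues only if the next tag is "I-" with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # Single-token entities become U- (unit).
            biluo.append(f"B-{label}" if continues else f"U-{label}")
        else:
            # The final token of a multi-token entity becomes L- (last).
            biluo.append(f"I-{label}" if continues else f"L-{label}")
    return biluo
```

With the example from this thread, `["B-party_name_1", "I-party_name_1", "I-party_name_1"]` becomes `["B-party_name_1", "I-party_name_1", "L-party_name_1"]`.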

karndeepsingh commented 2 years ago

@ysfali did you find any way to achieve the required output?

jyotiyadav94 commented 2 years ago

Hi @karndeepsingh,

I am also working on something similar and would like to ask how you saved your predictions to a text/CSV file. Were you able to resolve this problem? I am using BIOES tagging, but extracting addresses is still the painful part.

Maybe you can join the outputs that belong to the same entity, or, when using pytesseract, you can preprocess the OCR output this way: https://stackoverflow.com/questions/69614122/tesseract-opencv-python-how-to-get-bounding-box-for-a-sentence-or-same-line-o
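As a rough illustration of joining outputs that belong to the same entity and saving them to CSV, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and the approach assumes plain labels without B-/I- prefixes, with "O" marking words outside any entity. Its known limitation is that two different entities with the same label that sit directly next to each other are merged into one, which is the case IOB-style tagging is meant to handle.

```python
import csv
from itertools import groupby
from typing import Iterable, List, TextIO, Tuple


def merge_same_label_runs(
    words: List[str], labels: List[str], outside_label: str = "O"
) -> List[Tuple[str, str]]:
    """Join consecutive words that share a label into (text, label) pairs."""
    merged = []
    # groupby yields one group per run of identical consecutive labels.
    for label, run in groupby(zip(words, labels), key=lambda pair: pair[1]):
        if label == outside_label:
            continue  # skip words that are not part of any entity
        merged.append((" ".join(word for word, _ in run), label))
    return merged


def write_entities_csv(entities: Iterable[Tuple[str, str]], fileobj: TextIO) -> None:
    """Write (text, label) pairs to an open text file as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["text", "label"])
    writer.writerows(entities)
```

To save to disk, open the target file with `newline=""` as the `csv` module documentation recommends, for example `with open("predictions.csv", "w", newline="") as f: write_entities_csv(entities, f)`. The file name is only a placeholder.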

May I ask which tool you used to annotate your data?

Maharshi2301 commented 1 year ago

Any update?