microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Multiple pages documents for LayoutLM #232

Open victor-ab opened 4 years ago

victor-ab commented 4 years ago

My question is about LayoutLM. I want to apply something like the receipt understanding task to multi-page documents. How should such documents be handled? Is there a recommended approach?

wolfshow commented 4 years ago

@victor-ab The easiest way is to split multi-page documents into a set of blocks, each of which can be fed into the LayoutLM model.
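One possible reading of "a set of blocks" is sketched below, under several assumptions: each page has already been OCR'd into words with pixel bounding boxes; boxes are normalized per page to the 0-1000 range LayoutLM expects; and each page is cut into windows of at most `max_words` words, chosen conservatively so that a block still fits the 512-token input limit after words are split into subword tokens. `Block`, `normalize_box`, `split_into_blocks` and `merge_predictions` are hypothetical helpers, not part of the unilm repository, and the model call itself is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass
class Block:
    """A chunk of one page, small enough to be fed to LayoutLM on its own."""
    page: int          # index of the source page
    start: int         # index of this block's first word within its page
    words: List[str]
    boxes: List[Box]   # normalized to the 0-1000 range


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Scale pixel coordinates to LayoutLM's 0-1000 coordinate range."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


def split_into_blocks(pages, max_words: int = 200) -> List[Block]:
    """Split a multi-page document into page-local blocks of at most max_words words.

    `pages` is a list of dicts: {"width": int, "height": int,
    "words": [str, ...], "boxes": [(x0, y0, x1, y1), ...]} in pixels.
    Boxes are normalized per page, so every block keeps single-page geometry.
    """
    blocks = []
    for page_idx, page in enumerate(pages):
        boxes = [normalize_box(b, page["width"], page["height"]) for b in page["boxes"]]
        for start in range(0, len(page["words"]), max_words):
            blocks.append(Block(
                page=page_idx,
                start=start,
                words=page["words"][start:start + max_words],
                boxes=boxes[start:start + max_words],
            ))
    return blocks


def merge_predictions(blocks: List[Block], block_labels: List[List[str]]):
    """Map per-block word labels back to (page, word_index, word, label) tuples."""
    merged = []
    for block, labels in zip(blocks, block_labels):
        for offset, (word, label) in enumerate(zip(block.words, labels)):
            merged.append((block.page, block.start + offset, word, label))
    return merged
```

Each block would then be tokenized and labeled independently (commonly by repeating a word's box for each of its subword tokens), and `merge_predictions` puts the per-block word labels back into document order, so the same split can be used for training and for inference.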

victor-ab commented 4 years ago

@wolfshow what do you mean by "set of blocks"? I did not get it.

I had the idea of vertically "concatenating" all the pages, but I suspect this is not the best solution, since the text becomes much denser than on a single page.
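The density concern can be made concrete with a little arithmetic. LayoutLM expects box coordinates in a 0-1000 range, so if N pages are stacked into one tall canvas and normalized over the combined height, each page gets only about 1000/N vertical units and every text line becomes roughly N times thinner. The sketch below uses a hypothetical `stack_pages_vertically` helper and only illustrates that coordinate mapping; it assumes pixel boxes per page and normalizes widths against the widest page.

```python
def stack_pages_vertically(pages):
    """Stack pages top-to-bottom and normalize boxes to 0-1000 over the whole canvas.

    `pages` is a list of dicts with "width", "height" and "boxes" (pixel coords).
    Returns the normalized boxes of all pages, page by page.
    """
    total_height = sum(p["height"] for p in pages)
    max_width = max(p["width"] for p in pages)
    y_offset = 0  # pixel offset of the current page on the stacked canvas
    out = []
    for p in pages:
        for x0, y0, x1, y1 in p["boxes"]:
            out.append((
                int(1000 * x0 / max_width),
                int(1000 * (y0 + y_offset) / total_height),
                int(1000 * x1 / max_width),
                int(1000 * (y1 + y_offset) / total_height),
            ))
        y_offset += p["height"]
    return out


if __name__ == "__main__":
    page = {"width": 1000, "height": 1000, "boxes": [(0, 0, 500, 20)]}  # one 20-px text line
    for n in (1, 4):
        x0, y0, x1, y1 = stack_pages_vertically([page] * n)[0]
        print(n, "page(s): line height =", y1 - y0)  # 20 for 1 page, 5 for 4 pages
```

Besides the loss of vertical resolution, the text of several pages will often exceed the 512-token input limit anyway, so a stacked document would still need to be split.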

khushbu-mulani commented 4 years ago

Hey @victor-ab, greetings!

Did you find an approach that comes closer to the expected accuracy on multi-page documents?

victor-ab commented 4 years ago

Hi @khushbu-mulani! Not yet. Please let me know if you have any ideas.

(Replied by email to khushbu-mulani's comment of Mon, 5 Oct 2020, 11:26.)

khushbu-mulani commented 4 years ago

Hi @wolfshow, can you suggest how we should handle multi-page documents for training and for inference?

For training, we can either get an hOCR file for each page separately, or combine all the pages of a document and get a single hOCR file. But how does this work at inference time?

Thanks in advance!
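On the hOCR question, one option that fits the per-page splitting suggested earlier in the thread is to normalize every word box against its own page, at both training and inference time, so the model always sees single-page geometry. With that approach, one hOCR file per page and one combined file containing several `ocr_page` elements lead to the same result. A minimal sketch using Python's standard `html.parser`, assuming Tesseract-style, well-formed hOCR in which `ocr_page` and `ocrx_word` elements carry `bbox x0 y0 x1 y1` in their `title` attribute; `HocrPages` and `parse_hocr` are hypothetical helpers.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are ignored for depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def bbox_from_title(title):
    """Return (x0, y0, x1, y1) from an hOCR title like 'bbox 10 20 30 40; x_wconf 95'."""
    for part in title.split(";"):
        fields = part.split()
        if len(fields) >= 5 and fields[0] == "bbox":
            return tuple(int(v) for v in fields[1:5])
    return None


class HocrPages(HTMLParser):
    """Hypothetical helper: words plus 0-1000 boxes for every ocr_page in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.pages = []          # one {"words": [...], "boxes": [...]} dict per ocr_page
        self._page_box = None    # pixel bbox of the current page
        self._depth = 0          # current element nesting depth
        self._word_depth = None  # depth of the open ocrx_word element, if any
        self._word_box = None
        self._chars = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._depth += 1
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        box = bbox_from_title(attrs.get("title") or "")
        if box is None:
            return
        if "ocr_page" in classes:
            self._page_box = box
            self.pages.append({"words": [], "boxes": []})
        elif "ocrx_word" in classes and self.pages:
            self._word_depth, self._word_box, self._chars = self._depth, box, []

    def handle_data(self, data):
        if self._word_depth is not None:
            self._chars.append(data)  # includes text inside nested <strong>/<em>

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._word_depth == self._depth:  # closing the open ocrx_word element
            text = "".join(self._chars).strip()
            if text:
                px0, py0, px1, py1 = self._page_box
                w, h = px1 - px0, py1 - py0
                x0, y0, x1, y1 = self._word_box
                page = self.pages[-1]
                page["words"].append(text)
                # Scale to LayoutLM's 0-1000 range relative to this page only.
                page["boxes"].append((1000 * (x0 - px0) // w, 1000 * (y0 - py0) // h,
                                      1000 * (x1 - px0) // w, 1000 * (y1 - py0) // h))
            self._word_depth = None
        self._depth -= 1


def parse_hocr(markup):
    """Parse hOCR markup (one page or many) into a list of per-page word/box dicts."""
    parser = HocrPages()
    parser.feed(markup)
    parser.close()
    return parser.pages
```

At inference time, the same per-page lists would go through the same splitting used for training, and the predicted labels can be mapped back by page and word index.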

lumalav commented 8 months ago

Hi everyone. Did anyone figure this out?