Add LayoutLMProcessor

gau-nernst commented 10 months ago

Feature request

Add a processor for LayoutLM. I'm not sure why v2 and v3 have their own processors while the original v1 doesn't. It should be almost identical to its v2 and v3 counterparts (apply Tesseract OCR, then call the tokenizer appropriately), except that it would not return the resized image (pixel_values), since LayoutLMv1 is text-only.
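The OCR half of such a processor could look roughly like the plain-Python sketch below. It is not the eventual implementation. It assumes the OCR engine has already run and produced word-level rows shaped like Tesseract's output fields (`text`, `left`, `top`, `width`, `height`). It also assumes that, as with the v2/v3 processors, pixel boxes are rescaled to LayoutLM's 0-1000 coordinate grid. `normalize_box` and `ocr_to_words_and_boxes` are hypothetical helper names.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical OCR row: (text, left, top, width, height) in pixels,
# mirroring the per-word fields Tesseract reports.
OcrRow = Tuple[str, int, int, int, int]


def normalize_box(box: Sequence[float], width: int, height: int) -> List[int]:
    """Scale a pixel-space (x0, y0, x1, y1) box to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def ocr_to_words_and_boxes(
    rows: Iterable[OcrRow], width: int, height: int
) -> Tuple[List[str], List[List[int]]]:
    """Turn OCR rows into parallel lists of words and normalized boxes.

    Rows with empty text (Tesseract emits these for block/line-level
    entries) are skipped, so words and boxes stay aligned one-to-one.
    """
    words: List[str] = []
    boxes: List[List[int]] = []
    for text, left, top, w, h in rows:
        if not text.strip():
            continue
        words.append(text)
        # Convert (left, top, width, height) to (x0, y0, x1, y1) before scaling.
        boxes.append(normalize_box((left, top, left + w, top + h), width, height))
    return words, boxes


if __name__ == "__main__":
    # A 200x100 "page" with two words and one empty layout-only row.
    rows = [("Invoice", 20, 10, 40, 10), ("", 0, 0, 0, 0), ("Total", 100, 50, 50, 20)]
    print(ocr_to_words_and_boxes(rows, width=200, height=100))
```

The words and boxes produced this way are what the tokenizer step would then consume. No `pixel_values` are kept, which matches the text-only nature of v1.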

This would also simplify the document-question-answering pipeline, which currently repeats this OCR + tokenization logic for LayoutLM.
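The part the pipeline repeats is giving every token a box after tokenization. The step might be sketched as follows, assuming a fast tokenizer was called on (question, words) so that per-token `sequence_ids()` and `word_ids()` are available. Context tokens inherit their word's box. `[SEP]` gets `[1000, 1000, 1000, 1000]`. Everything else (`[CLS]`, question tokens, padding) gets zeros. That convention is my understanding of the existing LayoutLM code path, not a confirmed spec. `token_boxes` is a hypothetical name, and the token ids in the example are made up.

```python
from typing import List, Optional, Sequence


def token_boxes(
    input_ids: Sequence[int],
    sequence_ids: Sequence[Optional[int]],
    word_ids: Sequence[Optional[int]],
    word_boxes: Sequence[Sequence[int]],
    sep_token_id: int,
) -> List[List[int]]:
    """Assign one 0-1000 box to every token of a (question, context) encoding.

    sequence_ids: 0 for question tokens, 1 for context tokens, None for specials.
    word_ids: index into word_boxes for each token (None for specials).
    """
    boxes: List[List[int]] = []
    for input_id, seq_id, word_id in zip(input_ids, sequence_ids, word_ids):
        if seq_id == 1 and word_id is not None:
            # Sub-word pieces of the same OCR word all share that word's box.
            boxes.append(list(word_boxes[word_id]))
        elif input_id == sep_token_id:
            boxes.append([1000] * 4)
        else:
            # [CLS], question tokens and padding carry no layout information.
            boxes.append([0] * 4)
    return boxes


if __name__ == "__main__":
    # Made-up ids: 101 = [CLS], 102 = [SEP]; question "total ?" and
    # context words "invoice" (split into two pieces) and "total".
    ids = [101, 2561, 1029, 102, 1999, 2239, 2561, 102]
    seq = [None, 0, 0, None, 1, 1, 1, None]
    wid = [None, 0, 1, None, 0, 0, 1, None]
    word_boxes = [[100, 100, 300, 200], [500, 500, 750, 700]]
    print(token_boxes(ids, seq, wid, word_boxes, sep_token_id=102))
```

A LayoutLMProcessor could own this step, so the pipeline would only need to pass the question and image through it.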

Motivation

Bring LayoutLM to feature parity with LayoutLMv2 and LayoutLMv3.

Your contribution

I can submit a PR to add LayoutLMProcessor. It should be almost identical to the v2 and v3 processors, so the task should be straightforward.

Updating the document-question-answering pipeline to use the new processor would be too complex for me, since I'm not familiar with the codebase.

ArthurZucker commented 10 months ago

cc @amyeroberts and @NielsRogge: if LayoutLM is just not as good, we should use the newest models.

gau-nernst commented 10 months ago

There are several advantages of using LayoutLMv1:

It's text-only, so it can be much more lightweight. Not depending on detectron2 is also a plus (there are no pre-built detectron2 packages for the latest versions of PyTorch/CUDA).
From what I know, v2 and v3 don't permit commercial use, while v1 does.
impira/layoutlm-document-qa is very good. I haven't found good fine-tuned v2 or v3 checkpoints for DocVQA.

ArthurZucker commented 10 months ago

Alright then! Feel free to open a PR if you have time

NielsRogge commented 8 months ago

Thanks @gau-nernst for opening this issue. Indeed, we only started defining processors with v2 and v3, but we could define one for v1 as well. Your PR already looks in great shape; let me know if you need any help.

huggingface / transformers

Add LayoutLMProcessor #27826
