gau-nernst opened this issue 10 months ago
cc @amyeroberts and @NielsRogge: if LayoutLM is just not as good, we should use the newest models.
There are several advantages in using LayoutLMv1:
Alright then! Feel free to open a PR if you have time.
Thanks @gau-nernst for opening this issue! Indeed, we only started defining processors for v2 and v3, but we could define one for v1 as well. Your PR already looks in great shape; let me know if you need any help.
Feature request
Add a processor for LayoutLM. I'm not sure why v2 and v3 have their respective processors but the original v1 doesn't. It should be almost identical to its v2 and v3 counterparts (apply Tesseract OCR, then call the tokenizer appropriately), except that it would not return the resized image (`pixel_values`), since LayoutLMv1 is text-only. This would also simplify the `document-question-answering` pipeline, which currently repeats the above logic for LayoutLM.
Motivation
Bring LayoutLM to feature parity with LayoutLMv2 and LayoutLMv3.
Your contribution
I can submit a PR to add LayoutLMProcessor. It should be almost identical to the v2 and v3 processors, so the task should be straightforward.
Updating the `document-question-answering` pipeline to use the new processor would be too complex for me, since I'm not familiar with the codebase.
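For reference, here is a rough sketch of the "call the tokenizer appropriately" step a v1 processor would need. Unlike the v2/v3 tokenizers, LayoutLM's tokenizer does not accept boxes, so each word's box has to be normalized to the 0-1000 grid and repeated for every subword token, with `[0, 0, 0, 0]` for `[CLS]` and `[1000, 1000, 1000, 1000]` for `[SEP]` (the convention used in the LayoutLM model docs example). This assumes OCR (e.g. Tesseract) has already produced words with pixel-space boxes. `encode_words_with_boxes` and `toy_wordpiece` are hypothetical names, and `toy_wordpiece` is only a stand-in for the real WordPiece tokenizer. Truncation, padding and token ids are left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = List[int]


def normalize_box(box: Sequence[int], width: int, height: int) -> Box:
    """Scale a pixel-space (x0, y0, x1, y1) box to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def toy_wordpiece(word: str, max_piece: int = 4) -> List[str]:
    """Hypothetical stand-in for a WordPiece tokenizer: lowercase the word and
    split it into chunks of at most `max_piece` chars, prefixing continuations with '##'."""
    word = word.lower()
    pieces = [word[i:i + max_piece] for i in range(0, len(word), max_piece)]
    return [p if i == 0 else "##" + p for i, p in enumerate(pieces)]


def encode_words_with_boxes(
    words: Sequence[str],
    pixel_boxes: Sequence[Sequence[int]],
    image_size: Tuple[int, int],
    tokenize: Callable[[str], List[str]] = toy_wordpiece,
) -> Dict[str, list]:
    """Hypothetical core of a LayoutLMv1 processor: subword tokens plus one box
    per token. No pixel_values are produced, since LayoutLMv1 is text-only."""
    width, height = image_size
    tokens: List[str] = ["[CLS]"]
    bbox: List[Box] = [[0, 0, 0, 0]]  # box for [CLS]
    for word, box in zip(words, pixel_boxes):
        norm = normalize_box(box, width, height)
        pieces = tokenize(word)
        tokens.extend(pieces)
        # Every subword token of a word gets a copy of that word's box.
        bbox.extend([list(norm) for _ in pieces])
    tokens.append("[SEP]")
    bbox.append([1000, 1000, 1000, 1000])  # box for [SEP]
    return {"tokens": tokens, "bbox": bbox}


if __name__ == "__main__":
    enc = encode_words_with_boxes(
        ["Invoice", "total"],
        [[10, 20, 110, 40], [120, 20, 180, 40]],
        image_size=(200, 100),
    )
    print(enc["tokens"])  # ['[CLS]', 'invo', '##ice', 'tota', '##l', '[SEP]']
    print(enc["bbox"][1])  # [50, 200, 550, 400]
```

In an actual processor, the toy tokenizer would be replaced by LayoutLM's own tokenizer, and the words and boxes would come from the OCR step shared with the v2/v3 image processors.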