allanj opened 1 year ago
There are several steps to experimenting on DocVQA with the extractive method.
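A typical extractive pipeline (my reading of the thread, not confirmed by the authors) is: run OCR, align each gold answer to the OCR words to get start/end position labels, fine-tune the model with a QA head, then decode spans and score with ANLS. The alignment step might look like this sketch; `find_answer_span` and its exact matching strategy are hypothetical:

```python
def normalize(s: str) -> str:
    # lowercase and strip punctuation so OCR tokens and gold answers compare loosely
    return "".join(ch for ch in s.lower() if ch.isalnum())

def find_answer_span(ocr_words, answer):
    """Return (start, end) word indices of `answer` inside `ocr_words`, or None.

    Matches the answer as a contiguous run of OCR words after light normalization;
    real pipelines often need fuzzy matching on top of this when OCR is noisy.
    """
    target = [normalize(w) for w in answer.split()]
    words = [normalize(w) for w in ocr_words]
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return i, i + n - 1
    return None

# usage: the answer "total amount" covers the first two OCR words
span = find_answer_span(["Total", "amount:", "$12.00"], "total amount")
# span == (0, 1)
```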
Each step leaves room for improvement, so it helps to analyze the upper bound step by step. For example, what ANLS score do you get from the answers located by your start and end positions? With perfect text from human annotations, the score should be close to 100; with good OCR results, it should exceed 95.
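For anyone running this upper-bound check, ANLS (Average Normalized Levenshtein Similarity) takes, per question, the best similarity against any gold answer, zeroing out matches whose normalized edit distance is at or above the 0.5 threshold. A minimal self-contained sketch:

```python
def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance, row by row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def anls(predictions, gold_answers, tau=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: one predicted answer string per question
    gold_answers: a list of acceptable gold strings per question
    """
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        best = 0.0
        for gold in golds:
            p, g = pred.strip().lower(), gold.strip().lower()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)
```

Feeding the spans recovered from your labeled start/end positions through `anls` against the gold answers gives the upper bound discussed above.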
Thanks. Is it possible to provide the details about how you did it for this dataset? I think this could be important to reproduce the performance and better help the open-source community.
@allanj I am trying to reproduce the result with the LayoutLMv2 model using your code, but I am getting the following error: `RuntimeError: CUDA error: device-side assert triggered`
The error occurs in the train_dataloader loop.
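A device-side assert during training is very often an out-of-range index hitting an embedding lookup. With LayoutLM-family models, a frequent culprit is bounding boxes outside the [0, 1000] range the 2-D position embeddings expect, or start/end labels beyond the sequence length. A hedged sanity check you could run on your features before moving them to the GPU (the batch layout and default limits here are assumptions; adjust to your own feature dict):

```python
def validate_features(batch, max_coord=1000, max_seq_len=512, vocab_size=50265):
    """Flag values that would trip CUDA device-side asserts in embedding lookups.

    Assumed layout: each example is a dict with `input_ids`, `bbox`
    (a list of [x0, y0, x1, y1]), and `start_positions`/`end_positions`.
    """
    errors = []
    for i, ex in enumerate(batch):
        if any(t < 0 or t >= vocab_size for t in ex["input_ids"]):
            errors.append((i, "input_ids outside [0, vocab_size)"))
        # LayoutLM 2-D position embeddings only cover coordinates in [0, 1000]
        if any(c < 0 or c > max_coord for box in ex["bbox"] for c in box):
            errors.append((i, "bbox coordinate outside [0, 1000]"))
        for key in ("start_positions", "end_positions"):
            if not (0 <= ex[key] < max_seq_len):
                errors.append((i, f"{key} out of sequence range"))
    return errors
```

Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` (or on CPU) also tends to surface the actual failing operation instead of a deferred assert.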
I tried my best to reproduce the results reported in the paper (about 78% test-set ANLS), but all I get is 74% on the test set (73% on the validation set), which is still well below the reported number.
Can we get more details about how to obtain the reported number?
My repo: https://github.com/allanj/LayoutLMv3-DocVQA
Model I'm using: LayoutLMv3-base
OCR I use: Microsoft READ API, with the latest model version.
The best performance I can get with LayoutLMv3-base is about 73.3% on the validation set.
I also referred to the following issues, as I can't find a public codebase that reproduces the DocVQA results.
I would appreciate it if the authors could give more suggestions/details about the experiments.