layoutlm_CORD

Evaluation of the LayoutLM model on the CORD dataset

Introduction

This repo is an implementation of the LayoutLM model (see [1]), built from the original source code (as I didn't manage to make it work with the HuggingFace implementation) and benchmarked on the CORD dataset (see [2]).
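
For illustration, here is a minimal sketch of feeding words and bounding boxes to LayoutLM Large for token classification, assuming the HuggingFace LayoutLM classes; this is not the pipeline used in the notebooks, which rely on the original unilm source code, and the checkpoint name, label count, words and boxes are placeholders.

```python
# A minimal sketch, assuming the HuggingFace LayoutLM classes (NOT the
# pipeline used in these notebooks, which use the original unilm source code).
# Checkpoint name, label count, words and boxes are placeholders.
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-large-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-large-uncased",
    num_labels=30,  # placeholder: depends on the CORD label map you build
)

# One OCR word per entry, with its box normalized to a 0-1000 grid (x0, y0, x1, y1).
words = ["1x", "Latte", "4,500"]
boxes = [[100, 120, 160, 140], [170, 120, 260, 140], [600, 120, 700, 140]]

# Repeat each word's box for every sub-token the tokenizer splits it into.
tokens, token_boxes = [], []
for word, box in zip(words, boxes):
    word_tokens = tokenizer.tokenize(word)
    tokens.extend(word_tokens)
    token_boxes.extend([box] * len(word_tokens))

# Add [CLS]/[SEP] with their conventional boxes, then run a forward pass.
input_ids = tokenizer.convert_tokens_to_ids([tokenizer.cls_token] + tokens + [tokenizer.sep_token])
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

outputs = model(input_ids=torch.tensor([input_ids]), bbox=torch.tensor([token_boxes]))
predicted_labels = outputs.logits.argmax(-1)
```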

Results

I compare the performance of LayoutLM Large, pre-trained on the IIT-CDIP dataset, with BERT Large.

Validation Set

| Model | F1 Score | Precision | Recall |
|---|---|---|---|
| LayoutLM Large | 0.9562 | 0.9577 | 0.9546 |
| BERT Large | 0.9474 | 0.9466 | 0.9481 |

Test Set

| Model | F1 Score | Precision | Recall |
|---|---|---|---|
| LayoutLM Large | 0.9843 | 0.9845 | 0.9841 |
| BERT Large | 0.9859 | 0.9861 | 0.9856 |

On the validation set, LayoutLM outperformed BERT, but this was not the case on the test set; more investigation is needed. Nevertheless, BERT took 11 minutes to finish training (4 epochs) while LayoutLM needed only 3 minutes (same environment and setup).
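
For reference, scores of the kind reported above (entity-level F1, precision and recall over IOB tags) can be computed with seqeval. The sketch below is only an illustration under that assumption; the notebooks may use a different evaluation utility, and the tag names are made-up CORD-style labels.

```python
# A minimal sketch, assuming seqeval and IOB-style tags; the tag names below
# are illustrative CORD-style labels, not the exact label map of the notebooks.
from seqeval.metrics import f1_score, precision_score, recall_score

y_true = [["B-MENU.NM", "I-MENU.NM", "B-MENU.PRICE", "O"]]
y_pred = [["B-MENU.NM", "I-MENU.NM", "O", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
```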

Important files

I am using LayoutLM Large; the files of the pre-trained model can be found at these links: OneDrive / GoogleDrive. Other resources can be found in the original repository: Official Layoutlm.

TODO

I will soon add a script for the training; in the meantime, you can always check my notebooks. I will also give more details about the dataset, the notebooks' structure, etc.
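
Until the training script is published, the sketch below shows roughly what a fine-tuning loop looks like. It assumes a PyTorch DataLoader named train_loader yielding input_ids, bbox, attention_mask and labels tensors, plus a model as in the Introduction sketch; it is not the actual notebook code.

```python
# A rough fine-tuning loop sketch; `model` and `train_loader` are assumptions
# (see the note above), not objects defined in this repository.
import torch
from torch.optim import AdamW

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(4):  # the notebooks train for 4 epochs
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)  # with `labels` in the batch, a loss is returned
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```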

References

[1] Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou (2019), LayoutLM: Pre-training of Text and Layout for Document Image Understanding, https://arxiv.org/abs/1912.13318, https://github.com/microsoft/unilm/tree/master/layoutlm

[2] Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Seo, and Hwalsuk Lee (2019), CORD: A Consolidated Receipt Dataset for Post-OCR Parsing, Document Intelligence Workshop at NeurIPS, https://github.com/clovaai/cord