This project implements MTL-TabNet (Multi-task Learning based Model for Image-based Table Recognition). It is built on the TableMASTER-mmocr repository (many thanks to its authors for their excellent work).
The proposed model consists of one shared encoder, one shared decoder, and three separate decoders for three sub-tasks of the table recognition problem as shown in Fig. 1. The shared encoder encodes the input table image as a sequence of features. The sequence of features is passed to the shared decoder and then the structure decoder to predict a sequence of HTML tags that represent the structure of the table. When the structure decoder produces the HTML tag representing a new cell (‘
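The cell-level routing described above can be sketched in plain Python. This is a hypothetical illustration, not the project's code. The real cell tags are defined by the project's structure alphabet, so `CELL_TOKENS` here is an assumed vocabulary. Plain strings stand in for the shared decoder's per-step feature tensors. In the model, the selected per-cell features, together with the shared encoder output, would feed the other two task decoders.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical structure tokens that open a new cell (assumption for this sketch).
CELL_TOKENS = frozenset({"<td></td>", "<td"})


def select_cell_steps(
    structure_tokens: Sequence[str],
    shared_decoder_outputs: Sequence[T],
    cell_tokens: frozenset = CELL_TOKENS,
) -> List[Tuple[int, T]]:
    """Return (step, output) pairs for every decoding step that emitted a cell tag."""
    if len(structure_tokens) != len(shared_decoder_outputs):
        raise ValueError("expected one shared-decoder output per structure token")
    return [
        (step, output)
        for step, (token, output) in enumerate(zip(structure_tokens, shared_decoder_outputs))
        if token in cell_tokens
    ]


# Toy example: strings stand in for the shared decoder's per-step features.
tokens = ["<tr>", "<td></td>", "<td", ' colspan="2"', ">", "</td>", "</tr>"]
features = [f"h{i}" for i in range(len(tokens))]
print(select_cell_steps(tokens, features))
```

Only the steps whose token opens a cell are kept, so the downstream per-cell decoders run once per cell rather than once per structure token.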
Create a conda environment for MTL-TabNet (optional).
# Create an environment with Python 3.8.
conda create -n myenv python=3.8
conda activate myenv
# Install PyTorch 1.9.0 built for CUDA 11.1.
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# Install cuDNN if necessary.
conda install cudnn -c conda-forge
Install mmdetection (see the official mmdetection installation guide for details).
# The mmdetection-2.11.0 source code is bundled with this project.
# cd into it and install it from source (recommended).
cd ./mmdetection-2.11.0
pip install -v -e .
Install mmocr (see the official mmocr installation guide for details).
# Install mmocr from the MTL-TabNet root directory.
cd {Path to MTL-TabNet}
pip install -v -e .
Install mmcv-full 1.3.4 (see the official mmcv installation guide for details). The general command is:
pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
# For example, install mmcv-full 1.3.4 for torch 1.9.0 and CUDA 11.1:
pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
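After these installation steps, a quick check can confirm which versions actually ended up in the environment. The sketch below is an illustration, not part of the repository. The distribution names (`torch`, `torchvision`, `mmcv-full`, `mmdet`, `mmocr`) are assumptions based on the commands above, and `mmcv_index_url` only reproduces the URL pattern of the `pip install mmcv-full` command for PyTorch version strings of the form `1.9.0+cu111`.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names assumed from the install steps above.
DEFAULT_PACKAGES = ("torch", "torchvision", "mmcv-full", "mmdet", "mmocr")


def installed_versions(names: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


def mmcv_index_url(torch_version: str) -> str:
    """Build the mmcv-full wheel index URL from a version string such as '1.9.0+cu111'."""
    match = re.fullmatch(r"(\d+\.\d+\.\d+)\+(cu\d+|cpu)", torch_version)
    if match is None:
        raise ValueError(f"unexpected torch version string: {torch_version!r}")
    torch_ver, cu_ver = match.groups()
    return f"https://download.openmmlab.com/mmcv/dist/{cu_ver}/torch{torch_ver}/index.html"


if __name__ == "__main__":
    versions = installed_versions()
    for pkg, ver in versions.items():
        print(f"{pkg:<12} {ver or 'NOT INSTALLED'}")
    torch_ver = versions.get("torch")
    if torch_ver:
        try:
            print("mmcv-full index:", mmcv_index_url(torch_ver))
        except ValueError as err:
            print("cannot derive mmcv-full index:", err)
```

For the environment above, `mmcv_index_url("1.9.0+cu111")` yields the same index URL as the concrete `pip install mmcv-full==1.3.4` command.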
Run data_preprocess.py to generate the training data. Before running it, set the 'raw_img_root' and 'save_root' properties of PubtabnetParser to your own paths.
python ./table_recognition/data_preprocess.py
Parsing the 500,777 training files takes about 8 hours. After the training set has been parsed, set the 'split' property of PubtabnetParser to 'val' and run the script again to produce the formatted validation data.
The directory structure of the parsed training data is:
.
├── StructureLabelAddEmptyBbox_train
│ ├── PMC1064074_007_00.txt
│ ├── PMC1064076_003_00.txt
│ ├── PMC1064076_004_00.txt
│ └── ...
├── recognition_train_img
│ ├── 0
│ │ ├── PMC1064100_007_00_0.png
│ │ ├── PMC1064100_007_00_10.png
│ │ ├── ...
│ │ └── PMC1064100_007_00_108.png
│ ├── 1
│ ├── ...
│ └── 15
├── recognition_train_txt
│ ├── 0.txt
│ ├── 1.txt
│ ├── ...
│ └── 15.txt
├── structure_alphabet.txt
└── textline_recognition_alphabet.txt
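Since parsing takes hours, it can help to verify the output before starting training. The hypothetical helpers below check for the entries shown in the tree. Treating the shard count as a fixed 16 (folders 0 to 15 and files 0.txt to 15.txt) is an assumption read off this example layout, and the function names are illustrative only.

```python
from pathlib import Path
from typing import List

# Assumed shard count, taken from the example tree (0..15).
NUM_SHARDS = 16


def missing_train_outputs(save_root: str, num_shards: int = NUM_SHARDS) -> List[str]:
    """List expected entries of the parsed training data that do not exist."""
    root = Path(save_root)
    expected = [
        root / "StructureLabelAddEmptyBbox_train",
        root / "structure_alphabet.txt",
        root / "textline_recognition_alphabet.txt",
    ]
    for shard in range(num_shards):
        expected.append(root / "recognition_train_img" / str(shard))
        expected.append(root / "recognition_train_txt" / f"{shard}.txt")
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


def count_structure_labels(save_root: str) -> int:
    """Count the per-table structure label files (presumably one .txt per table image)."""
    label_dir = Path(save_root) / "StructureLabelAddEmptyBbox_train"
    return sum(1 for _ in label_dir.glob("*.txt"))
```

An empty list from `missing_train_outputs(save_root)` means every entry in the tree is present. `count_structure_labels` can then be compared against the expected number of training tables.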
Train the multi-task table recognition model (MTL-TabNet):
sh ./table_recognition/expr/table_recognition_dist_train.sh
To get the final results, run inference:
python ./table_recognition/run_table_inference.py
run_table_inference.py calls table_inference.py and runs model inference on multiple GPU devices. Before running this script, change the value of cfg in table_inference.py.
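The multi-GPU step can be pictured as splitting the image list into one shard per GPU and restricting each worker to its own device. The sketch below is a hypothetical illustration of that idea. It does not reproduce the logic of run_table_inference.py or table_inference.py. `shard_round_robin` and `worker_env` are illustrative names; only the standard `CUDA_VISIBLE_DEVICES` environment variable is real.

```python
import os
from typing import Dict, List, Sequence


def shard_round_robin(items: Sequence[str], num_gpus: int) -> Dict[int, List[str]]:
    """Assign item i to GPU i % num_gpus, so shard sizes differ by at most one."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    shards: Dict[int, List[str]] = {gpu: [] for gpu in range(num_gpus)}
    for index, item in enumerate(items):
        shards[index % num_gpus].append(item)
    return shards


def worker_env(gpu_id: int) -> Dict[str, str]:
    """Return a copy of the current environment that exposes only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env
```

Each shard could then be processed by a separate worker process started with `env=worker_env(gpu_id)`, for example via `subprocess.Popen`. This is a common pattern for data-parallel inference, where each process writes its own result file.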
The directory structure of the table recognition results is:
# If you use 8 GPU devices for inference, you will get 8 detection result pickle files, one end2end_result pickle file, and 8 structure recognition result pickle files.
.
├── structure_master_caches
│ ├── structure_master_results_0.pkl
│ ├── structure_master_results_1.pkl
│ ├── ...
│ └── structure_master_results_7.pkl
Install the requirements for TEDS evaluation:
pip install -r ./table_recognition/PubTabNet-master/src/requirements.txt
Generate the ground-truth file gtVal.json:
python ./table_recognition/get_val_gt.py
Calculate the TEDS score. Before running this script, set the prediction file path and the ground-truth file path in mmocr_teds_acc_mp.py.
python ./table_recognition/PubTabNet-master/src/mmocr_teds_acc_mp.py
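For reference, TEDS (Tree-Edit-Distance-based Similarity) was introduced with the PubTabNet dataset. The predicted table $T_a$ and the ground-truth table $T_b$ are represented as HTML trees and compared as follows (this summary follows the PubTabNet definition; mmocr_teds_acc_mp.py is the authority on the exact computation used here):

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

Here $\mathrm{EditDist}$ is the tree edit distance and $|T|$ is the number of nodes in tree $T$, so a score of 1 means the two trees are identical. TEDS-struct. applies the same measure to the table structure only, ignoring cell content.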
TEDS scores:
| Dataset | TEDS (%) | TEDS-struct. (%) |
|---|---|---|
| FinTabNet | - | 98.79 |
| PubTabNet | 96.67 | 97.88 |
Pretrained models for PubTabNet and FinTabNet are available for download. When using a pretrained model, use master_decoder_old20220923.py instead of master_decoder.py.
To run the demo on a single table image (the input file and checkpoint file can be changed in demo.py):
python ./table_recognition/demo/demo.py
This project is licensed under the MIT License. See LICENSE for more details.
@inproceedings{visapp23namly,
title={An End-to-End Multi-Task Learning Model for Image-based Table Recognition},
author={Nam Tuan Ly and Atsuhiro Takasu},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP},
year={2023},
pages={626--634},
publisher={SciTePress},
doi={10.5220/0011685000003417},
}
Nam Ly (namly@nii.ac.jp, namlytuan@gmail.com)
Atsuhiro Takasu (takasu@nii.ac.jp)