Xue, Wenyuan, Qingyong Li, and Dacheng Tao. "ReS2TIM: Reconstruct Syntactic Structures from Table Images." 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019.
Tables often represent densely packed but structured data. Understanding table semantics is vital for effective information retrieval and data mining. Unlike web tables, whose semantics are readable directly from markup language and contents, the full analysis of tables published as images requires the conversion of discrete data into structured information. This paper presents a novel framework to convert a table image into its syntactic representation through the relationships between its cells. In order to reconstruct the syntactic structures of a table, we build a cell relationship network to predict the neighbors of each cell in four directions. During the training stage, a distance-based sample weight is proposed to handle the class imbalance problem. According to the detected relationships, the table is represented by a weighted graph that is then employed to infer the basic syntactic table structure. Experimental evaluation of the proposed framework using two datasets demonstrates the effectiveness of our model for cell relationship detection and table structure inference.
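To make the last step of the abstract concrete, here is a minimal sketch of how row and column indices could be read off detected neighbor relationships. It is a simplified illustration, not the inference procedure from the paper: it assumes unit edge weights and an acyclic set of relations, and takes the longest path from the table border as the index. The function name `infer_start_indices` and the edge format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def infer_start_indices(num_cells: int,
                        neighbor_edges: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Return the longest-path depth of every cell in a DAG of neighbor edges.

    Each edge (a, b) means "b is the right neighbor of a" (for columns)
    or "b is the neighbor below a" (for rows). Assumes the edges are acyclic.
    """
    successors = defaultdict(list)
    indegree = [0] * num_cells
    for src, dst in neighbor_edges:
        successors[src].append(dst)
        indegree[dst] += 1

    index = {cell: 0 for cell in range(num_cells)}
    # Cells with no incoming edge sit on the left (or top) border.
    ready = [cell for cell in range(num_cells) if indegree[cell] == 0]
    while ready:
        cell = ready.pop()
        for nxt in successors[cell]:
            # A cell starts one step after its farthest predecessor.
            index[nxt] = max(index[nxt], index[cell] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return index


# A 2x2 table with cells numbered 0 1 / 2 3.
columns = infer_start_indices(4, [(0, 1), (2, 3)])  # left-to-right relations
rows = infer_start_indices(4, [(0, 2), (1, 3)])     # top-to-bottom relations
```

Running the function once on the horizontal relations and once on the vertical ones gives a start column and a start row per cell. Spanning cells and noisy predictions need the weighted treatment described in the paper.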
Create the environment from the `environment.yml` file:
conda env create --file environment.yml
or install the software needed in your environment independently.
dependencies:
  - python=3.7
  - torchvision==0.6.0
  - pytorch==1.5.0
  - pip:
    - dominate==2.5.2
    - opencv-python==4.4.0.42
    - pandas==1.1.1
    - tqdm==4.48.2
    - scipy==1.5.2
    - visdom==0.1.8
cd ./datasets
tar -zxvf cmdd.tar.gz
tar -zxvf icdar13table.tar.gz
## The './datasets/' folder should look like:
cd ./datasets/cmdd
python prepare.py
cd ../icdar13table
python prepare.py
cd ..
rm cmdd.tar.gz
rm icdar13table.tar.gz
# train on the cmdd dataset
python train.py --dataroot ./datasets/cmdd --gpu_ids 2 --model res2tim --dataset_mode cell_rel --lr 0.0005 --pair_batch 10000 --niter 5 --niter_decay 95 --use_mask --name res2tim_cmdd
mkdir ./checkpoints/icdar13table
cp ./checkpoints/cmdd/best_net_Res2Tim.pth ./checkpoints/icdar13table/best_net_Res2Tim.pth
python train.py --dataroot ./datasets/icdar13table --gpu_ids 2 --model res2tim --dataset_mode cell_rel --lr 0.0005 --pair_batch 10000 --niter 5 --niter_decay 95 --use_mask --name res2tim_icdar13table --continue_train --epoch prt
2. Evaluation for neighbor relationship detection and cell location inference. Use your own trained models, or download our pretrained models and put them under './checkpoints/res2tim_cmdd/' and './checkpoints/res2tim_icdar13table/', respectively. CMDD pretrained model: [Google Drive](https://drive.google.com/file/d/1fEE-05_EAzbbRnlF6mMbxhFH9kjgnkeZ/view?usp=sharing), [Baidu Netdisk (extraction code: b7pt)](https://pan.baidu.com/s/1M32SW4fwAHz6yV9fpXsMTQ). ICDAR13Table pretrained model: [Google Drive](https://drive.google.com/file/d/1fRdt4eEvVFqXVXM5mG_MJc4wVHmB_beE/view?usp=sharing), [Baidu Netdisk (extraction code: 2grp)](https://pan.baidu.com/s/1ImA_WKw27RuSAo369ic8FQ).
python test.py --dataroot ./datasets/cmdd --gpu_ids 5 --model res2tim --dataset_mode cell_rel --pair_batch 10000 --use_mask --name res2tim_cmdd --epoch best
python test.py --dataroot ./datasets/icdar13table --gpu_ids 5 --model res2tim --dataset_mode cell_rel --pair_batch 10000 --use_mask --name res2tim_icdar13table --epoch best
3. Key options.
- Only one GPU is supported for training and evaluation because of the RoI_align. If you need multiple GPUs for a very large dataset, you will have to modify the code to support distributed training.
- GPU memory requirements rise dramatically as the number of table cells increases, so we split the pair relations in a table into multiple batches. In each training or testing iteration, only one table image is sent to the model, and the number of pair relations per batch is at most `--pair_batch`.
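The batching idea in the last option can be sketched as follows. This is an illustration only, assuming the candidate relations are all unordered cell pairs of one table. The repository may build its pairs differently, and `iter_pair_batches` is a hypothetical helper, not a function from the code base.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def iter_pair_batches(num_cells: int,
                      pair_batch: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield lists of (i, j) cell-index pairs, each with at most pair_batch items."""
    if pair_batch < 1:
        raise ValueError("pair_batch must be positive")
    batch: List[Tuple[int, int]] = []
    # Assumption: every unordered pair of cells is a candidate relation.
    for pair in combinations(range(num_cells), 2):
        batch.append(pair)
        if len(batch) == pair_batch:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# A table with 5 cells has 10 pairs; with pair_batch=4 they arrive in 3 batches.
batch_sizes = [len(b) for b in iter_pair_batches(5, 4)]
```

Because the number of pairs grows quadratically with the number of cells, capping the batch size keeps peak memory bounded regardless of table size.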
## Experiment Results
1. Results of neighbor relationship detection
<table>
<tr>
<td> </td>
<td colspan="2">CMDD</td>
<td colspan="2">ICDAR 2013 Dataset</td>
</tr>
<tr>
<td> </td>
<td>Precision</td>
<td>Recall</td>
<td>Precision</td>
<td>Recall</td>
</tr>
<tr>
<td>The paper reports</td>
<td>0.999</td>
<td>0.997</td>
<td>0.926</td>
<td>0.447</td>
</tr>
<tr>
<td>This implementation</td>
<td>0.999</td>
<td>0.996</td>
<td>0.866</td>
<td>0.841</td>
</tr>
</table>
2. Results of cell location inference
<table>
<tr>
<td colspan="6">CMDD</td>
</tr>
<tr>
<td> </td>
<td>cell_loc</td>
<td>row1</td>
<td>row2</td>
<td>col1</td>
<td>col2</td>
</tr>
<tr>
<td>The paper reports</td>
<td>0.999</td>
<td>0.999</td>
<td>0.999</td>
<td>0.999</td>
<td>0.999</td>
</tr>
<tr>
<td>This implementation</td>
<td>0.996</td>
<td>0.999</td>
<td>0.997</td>
<td>0.999</td>
<td>0.999</td>
</tr>
</table>
<table>
<tr>
<td colspan="6">ICDAR 2013 Dataset</td>
</tr>
<tr>
<td> </td>
<td>cell_loc</td>
<td>row1</td>
<td>row2</td>
<td>col1</td>
<td>col2</td>
</tr>
<tr>
<td>The paper reports</td>
<td>0.015</td>
<td>0.053</td>
<td>0.064</td>
<td>0.166</td>
<td>0.163</td>
</tr>
<tr>
<td>This implementation</td>
<td>0.174</td>
<td>0.306</td>
<td>0.264</td>
<td>0.576</td>
<td>0.492</td>
</tr>
</table>
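For readers who want to check the neighbor relationship numbers on their own predictions, precision and recall can be computed as in the sketch below. It assumes each relation is encoded as a `(cell_a, cell_b, direction)` tuple, which is a hypothetical format chosen for illustration. The evaluation script in this repository may represent relations differently.

```python
from typing import Set, Tuple

# Hypothetical encoding: (cell_a, cell_b, direction).
Relation = Tuple[int, int, str]


def precision_recall(predicted: Set[Relation],
                     ground_truth: Set[Relation]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted relations against the ground truth."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy 2x2 table: four true relations, three predictions, two of them correct.
truth = {(0, 1, "right"), (2, 3, "right"), (0, 2, "down"), (1, 3, "down")}
pred = {(0, 1, "right"), (0, 2, "down"), (1, 2, "right")}
precision, recall = precision_recall(pred, truth)
```

In the toy example, two of three predictions are correct and two of four true relations are found, so precision is 2/3 and recall is 1/2.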
## Custom Dataset Preparation
Refer to `./datasets/cmdd/prepare.py` and `./datasets/icdar13table/prepare.py` when preparing your own dataset.
## Citation
Please consider citing this work in your publications if it helps your research.
@inproceedings{xue2019res2tim,
title={ReS2TIM: Reconstruct Syntactic Structures from Table Images},
author={Xue, Wenyuan and Li, Qingyong and Tao, Dacheng},
booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},
pages={749--755},
year={2019},
organization={IEEE}
}