mxin262 / ESTextSpotter

(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

This is the PyTorch implementation of the paper "ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer" (ICCV 2023). The paper is available at this link.

News

2024.04.09 We released Bridge Text Spotting, a new text spotting pipeline that combines the advantages of end-to-end and two-step text spotting. Code

2023.07.21 Code is available.

Getting Started

Data Preparation

Please download Total-Text, CTW1500, MLT, ICDAR2013, ICDAR2015, and CurvedSynText150k following the guide provided by SPTS v2 (README.md).

Please download the MLT 2019 dataset (Images / Annotations).

Extract all the datasets and organize them as follows:

- datasets
  | - CTW1500
  |   | - annotations
  |   | - ctwtest_text_image
  |   | - ctwtrain_text_image
  | - totaltext (or icdar2015)
  |   | - test_images
  |   | - train_images
  |   | - test.json
  |   | - train.json
  | - mlt2017 (or syntext1, syntext2)
      | - annotations
      | - images
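Before launching training, it can help to confirm that the extracted folders match the tree above. The sketch below is a hypothetical helper, not part of the repository: `EXPECTED_LAYOUT` and `missing_entries` are illustrative names, and the expected entries are copied only from the tree shown here (Total-Text and ICDAR2015 share one layout; MLT 2017 and the SynText sets share another).

```python
import os
import tempfile

# Hypothetical helper: entries expected inside each dataset folder, copied from the tree above.
SPLIT_LAYOUT = ["test_images", "train_images", "test.json", "train.json"]
ANNOTATED_LAYOUT = ["annotations", "images"]
EXPECTED_LAYOUT = {
    "CTW1500": ["annotations", "ctwtest_text_image", "ctwtrain_text_image"],
    "totaltext": SPLIT_LAYOUT,
    "icdar2015": SPLIT_LAYOUT,
    "mlt2017": ANNOTATED_LAYOUT,
    "syntext1": ANNOTATED_LAYOUT,
    "syntext2": ANNOTATED_LAYOUT,
}


def missing_entries(root, datasets):
    """Return every expected path under `root` that does not exist, for the named datasets."""
    missing = []
    for name in datasets:
        for entry in EXPECTED_LAYOUT[name]:
            path = os.path.join(root, name, entry)
            if not os.path.exists(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Build a deliberately incomplete fake tree in a temporary directory to show the check.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "totaltext", "train_images"))
        open(os.path.join(root, "totaltext", "train.json"), "w").close()
        for path in missing_entries(root, ["totaltext"]):
            print("missing:", os.path.relpath(path, root))
```

On a real setup you would call something like `missing_entries("datasets", ["CTW1500", "totaltext"])` and expect an empty list.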

Model Zoo

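Dataset      Det-P  Det-R  Det-F1  E2E-None  E2E-Full  Weights
Pretrain     90.7   85.3   87.9    73.8      85.5      OneDrive
Total-Text   91.8   88.2   90.0    80.9      87.1      OneDrive
CTW1500      91.3   88.6   89.9    65.0      83.9      OneDrive

Dataset      Det-P  Det-R  Det-F1  E2E-S  E2E-W  E2E-G  Weights
ICDAR2015    95.1   88.0   91.4    88.5   83.1   78.1   OneDrive

Dataset      H-mean  Weights
VinText      73.6    OneDrive

Dataset           Det-P  Det-R  Det-H  1-NED  Weights
ICDAR 2019 ReCTS  94.1   91.3   92.7   78.1   OneDrive

Dataset  R     P      H      AP     Arabic  Latin  Chinese  Japanese  Korean  Bangla  Hindi  Weights
MLT      75.5  83.37  79.24  72.52  52.00   77.34  48.20    48.42     63.56   38.26   50.83  OneDrive

Det-P / Det-R / Det-F1 (Det-H): detection precision, recall, and F1 (H-mean). E2E-None / E2E-Full: end-to-end spotting without a lexicon and with the full lexicon. E2E-S / E2E-W / E2E-G: end-to-end spotting with the strong, weak, and generic lexicons. 1-NED: one minus the normalized edit distance. For MLT, R / P / H are recall, precision, and H-mean.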
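The Det-F1, Det-H, and H columns are the harmonic mean of precision and recall, and the ReCTS 1-NED column is one minus the mean normalized edit distance. The sketch below illustrates both. `f1`, `levenshtein`, and `one_minus_ned` are illustrative helpers, not repository code. `one_minus_ned` assumes the ground-truth and predicted strings are already paired, whereas the official ReCTS protocol first matches detections to ground truth, so it will not reproduce the official score by itself. Recomputing F1 from the published precision and recall does match the table (for example, 90.7 and 85.3 give 87.9).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def levenshtein(a, b):
    """Dynamic-programming edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def one_minus_ned(gts, preds):
    """1 - mean of ED(gt, pred) / max(len(gt), len(pred)) over pre-matched string pairs."""
    if len(gts) != len(preds):
        raise ValueError("gts and preds must be paired one-to-one")
    if not gts:
        raise ValueError("need at least one pair")
    total = 0.0
    for gt, pred in zip(gts, preds):
        denom = max(len(gt), len(pred))
        total += levenshtein(gt, pred) / denom if denom else 0.0
    return 1 - total / len(gts)


if __name__ == "__main__":
    # Precision / recall pairs copied from the Model Zoo tables above.
    rows = {"Pretrain": (90.7, 85.3), "Total-Text": (91.8, 88.2), "CTW1500": (91.3, 88.6)}
    for name, (p, r) in rows.items():
        print(f"{name}: F1 = {f1(p, r):.1f}")
    print("1-NED:", one_minus_ned(["text"], ["test"]))
```

Normalizing each edit distance by the longer string keeps every per-pair term in [0, 1], so a single badly wrong long transcription cannot dominate the average.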

Training

By default, training uses 8 GPUs with 2 images per GPU.

  1. Pretrain

    bash scripts/Pretrain.sh /path/to/your/dataset

  2. Fine-tune the model on the mixed real dataset

    bash scripts/Joint_train.sh /path/to/your/dataset

  3. Fine-tune the model on the target dataset (Total-Text shown here)

    bash scripts/TT_finetune.sh /path/to/your/dataset
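The three stages run in sequence. A minimal sketch of chaining them from Python follows, assuming each script takes only the dataset root (as in the commands above) and finds the previous stage's checkpoint on its own, which this README does not state; check the scripts before relying on that. `stage_commands` and `run_all` are hypothetical helpers, and the sketch defaults to a dry run that only prints the commands. With the default 8 GPUs and 2 images per GPU, the effective batch size is 16.

```python
import shlex
import subprocess

# Stage order and script paths come from the training list above.
STAGES = [
    ("pretrain", "scripts/Pretrain.sh"),
    ("joint_train", "scripts/Joint_train.sh"),
    ("tt_finetune", "scripts/TT_finetune.sh"),
]
GPUS, IMAGES_PER_GPU = 8, 2  # defaults stated in this README


def stage_commands(dataset_root):
    """Return (stage name, argv list) for each training stage, in order."""
    return [(name, ["bash", script, dataset_root]) for name, script in STAGES]


def run_all(dataset_root, dry_run=True):
    """Print each command; unless dry_run, execute it and stop at the first failure."""
    for name, cmd in stage_commands(dataset_root):
        print(f"[{name}] {shlex.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print("effective batch size:", GPUS * IMAGES_PER_GPU)
    run_all("/path/to/your/dataset")  # dry run: nothing is executed
```

Using `check=True` means a failed stage raises immediately, so a later stage never starts without the output of the one before it.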

Evaluation

The third argument of scripts/test.sh selects the task: 0 for text detection, 1 for text spotting.

bash scripts/test.sh config/ESTS/ESTS_5scale_tt_finetune.py /path/to/your/dataset 1 /path/to/your/checkpoint your_test_dataset_name

For example, to evaluate text spotting on Total-Text:

bash scripts/test.sh config/ESTS/ESTS_5scale_tt_finetune.py ../datasets 1 totaltext_checkpoint.pth totaltext_val
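scripts/test.sh takes five positional arguments: the config file, the dataset root, the task mode, the checkpoint, and the test dataset name. The sketch below assembles that call and rejects invalid modes. `build_test_command` and `MODES` are hypothetical names, and the argument order is taken only from the commands above.

```python
import shlex

# Task modes from this README: 0 = text detection, 1 = text spotting.
MODES = {"detection": 0, "spotting": 1}


def build_test_command(config, dataset_root, task, checkpoint, test_dataset):
    """Return the scripts/test.sh invocation as a shell-quoted string."""
    if task not in MODES:
        raise ValueError(f"task must be one of {sorted(MODES)}, got {task!r}")
    args = ["bash", "scripts/test.sh", config, dataset_root,
            str(MODES[task]), checkpoint, test_dataset]
    return shlex.join(args)


if __name__ == "__main__":
    # Reproduces the Total-Text spotting example above.
    print(build_test_command("config/ESTS/ESTS_5scale_tt_finetune.py", "../datasets",
                             "spotting", "totaltext_checkpoint.pth", "totaltext_val"))
```

`shlex.join` quotes any argument containing spaces or shell metacharacters, which guards against placeholder paths that would otherwise split into extra arguments.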

Visualization

To visualize the detection and recognition results, run:

python vis.py

Example Results:

Copyright

This repository may only be used for non-commercial research purposes.

For commercial use, please contact Prof. Lianwen Jin (eelwjin@scut.edu.cn).

Copyright 2023, Deep Learning and Vision Computing Lab, South China University of Technology.

Acknowledgement

AdelaiDet, DINO, Detectron2, TESTR

Citation

If our paper helps your research, please cite it in your publications:


@InProceedings{Huang_2023_ICCV,
    author    = {Huang, Mingxin and Zhang, Jiaxin and Peng, Dezhi and Lu, Hao and Huang, Can and Liu, Yuliang and Bai, Xiang and Jin, Lianwen},
    title     = {ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {19495-19505}
}