mxin262 / SwinTextSpotter

PyTorch re-implementation of the paper SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

SwinTextSpotter

This is the PyTorch implementation of the paper SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022). The paper is available at this link.

News

2024.04.09 We release a new text spotting pipeline, Bridge Text Spotting, that combines the advantages of end-to-end and two-step text spotting. Code

2023.08.22 We release a strong text spotting model, ESTextSpotter, that achieves explicit synergy on text spotting tasks. Code

Models

SWINTS-swin-english-pretrain [config] | model_Google Drive | model_BaiduYun PW: 954t

SWINTS-swin-Total-Text [config] | model_Google Drive | model_BaiduYun PW: tf0i

SWINTS-swin-ctw [config] | model_Google Drive | model_BaiduYun PW: 4etq

SWINTS-swin-icdar2015 [config] | model_Google Drive | model_BaiduYun PW: 3n82

SWINTS-swin-ReCTS [config] | model_Google Drive | model_BaiduYun PW: a4be

SWINTS-swin-vintext [config] | model_Google Drive | model_BaiduYun PW: slmp

Installation

Steps

  1. Install the repository (we recommend using Anaconda for the installation).

    conda create -n SWINTS python=3.8 -y
    conda activate SWINTS
    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
    pip install opencv-python
    pip install scipy
    pip install shapely
    pip install rapidfuzz
    pip install timm
    pip install Polygon3
    git clone https://github.com/mxin262/SwinTextSpotter.git
    cd SwinTextSpotter
    python setup.py build develop
  2. Prepare the datasets with the following directory layout

    datasets
    |_ totaltext
    |  |_ train_images
    |  |_ test_images
    |  |_ totaltext_train.json
    |  |_ weak_voc_new.txt
    |  |_ weak_voc_pair_list.txt
    |_ mlt2017
    |  |_ train_images
    |  |_ annotations/icdar_2017_mlt.json
    .......
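Once the data is in place, the layout above can be verified with a short script. This is a minimal sketch, not part of the repository: the function name `missing_dataset_paths` is hypothetical, and the list only covers the entries shown in the tree (the elided `.......` entries are not guessed at).

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
# Extend this list for any other datasets you prepare.
EXPECTED_PATHS = [
    "totaltext/train_images",
    "totaltext/test_images",
    "totaltext/totaltext_train.json",
    "totaltext/weak_voc_new.txt",
    "totaltext/weak_voc_pair_list.txt",
    "mlt2017/train_images",
    "mlt2017/annotations/icdar_2017_mlt.json",
]


def missing_dataset_paths(root="datasets", expected=EXPECTED_PATHS):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print("Missing under datasets/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root so that the relative `datasets` directory resolves correctly.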

    Downloaded images

Download the labels [Google Drive] [BaiduYun] PW: 46vd

Download the lexicon [Google Drive] and place it in the corresponding dataset directory.

You can also prepare your custom dataset following the example scripts. [example scripts]
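Before training, it can help to confirm that the packages from the installation step are importable in the active environment. The sketch below is an illustration rather than part of the repository: `missing_packages` is a hypothetical helper, and the mapping assumes the usual import names (`cv2` for opencv-python, `Polygon` for Polygon3, `torch` for pytorch).

```python
import importlib.util

# Package name as installed above -> module name used at import time.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "opencv-python": "cv2",
    "scipy": "scipy",
    "shapely": "shapely",
    "rapidfuzz": "rapidfuzz",
    "timm": "timm",
    "Polygon3": "Polygon",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose import module cannot be located."""
    return [
        pkg
        for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks the modules up without importing them, so it does not confirm that the installed versions match the pinned ones or that CUDA is usable.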

Totaltext

To evaluate on Total-Text, CTW1500, or ICDAR2015, first download the zipped annotations and unzip them.

  1. Pretrain SWINTS (e.g., with Swin-Transformer backbone)

    python projects/SWINTS/train_net.py \
      --num-gpus 8 \
      --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml
  2. Fine-tune the model on the mixed real dataset

    python projects/SWINTS/train_net.py \
      --num-gpus 8 \
      --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml
  3. Fine-tune the model on Total-Text

    python projects/SWINTS/train_net.py \
      --num-gpus 8 \
      --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml
  4. Evaluate SWINTS (e.g., with Swin-Transformer backbone)

    python projects/SWINTS/train_net.py \
      --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
      --eval-only MODEL.WEIGHTS ./output/model_final.pth
  5. Visualize the detection and recognition results (e.g., with Swin-Transformer backbone)

    python demo/demo.py \
      --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
      --input input1.jpg \
      --output ./output \
      --confidence-threshold 0.4 \
      --opts MODEL.WEIGHTS ./output/model_final.pth
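The training and evaluation commands above can also be assembled programmatically, which makes the stage order explicit. The following is a sketch under stated assumptions, not a script shipped with the repository: `train_command`, `eval_command` and `run_pipeline` are hypothetical helpers, and only flags that appear in the commands above are used. How each stage picks up the weights of the previous one is determined by the config files and is not handled here.

```python
import subprocess
import sys

TRAIN_SCRIPT = "projects/SWINTS/train_net.py"
CONFIG_DIR = "projects/SWINTS/configs"

# Stage order and config file names as listed in the steps above.
TRAIN_STAGES = [
    ("pretrain", "SWINTS-swin-pretrain.yaml"),
    ("mixtrain", "SWINTS-swin-mixtrain.yaml"),
    ("finetune", "SWINTS-swin-finetune-totaltext.yaml"),
]


def train_command(config_name, num_gpus=8):
    """Build the training command for one stage."""
    return [
        sys.executable, TRAIN_SCRIPT,
        "--num-gpus", str(num_gpus),
        "--config-file", f"{CONFIG_DIR}/{config_name}",
    ]


def eval_command(config_name, weights="./output/model_final.pth"):
    """Build the evaluation command for a trained checkpoint."""
    return [
        sys.executable, TRAIN_SCRIPT,
        "--config-file", f"{CONFIG_DIR}/{config_name}",
        "--eval-only", "MODEL.WEIGHTS", weights,
    ]


def run_pipeline(dry_run=True):
    """Print the commands in order; execute them only if dry_run is False."""
    commands = [train_command(cfg) for _, cfg in TRAIN_STAGES]
    commands.append(eval_command(TRAIN_STAGES[-1][1]))
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Stop at the first failing stage.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

With the default `dry_run=True` nothing is launched, so the printed commands can be reviewed (and the checkpoint paths in the configs checked) before starting a multi-GPU run.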

Example results:

Acknowledgement

AdelaiDet, Detectron2, ISTR, SwinT_detectron2, Focal-Transformer and MaskTextSpotterV3.

Citation

If our paper helps your research, please cite it in your publications:

@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and Yuliang Liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal = {arXiv preprint arXiv:2203.10209},
  year = {2022}
}

Copyright

For commercial use, please contact Dr. Lianwen Jin: eelwjin@scut.edu.cn

Copyright 2019, Deep Learning and Vision Computing Lab, South China University of Technology. http://www.dlvc-lab.net