This repository contains the source code for the ACM Multimedia 2023 paper "Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution" (arXiv link).
Scene text image super-resolution reconstructs high-resolution text images from low-resolution inputs. The reconstruction results of RTSRN on the TextZoom dataset are as follows:
Comparison of our model with other SOTA models on the TextZoom dataset. The values in the table are text recognition accuracies; CRNN, MORAN, and ASTER are three different text recognizers. The last three rows compare multi-stage models: our model's performance improves markedly after multi-stage training.
The architecture of our model is as follows:
Please refer to the following simple steps for installation.
git clone https://github.com/wenyu1009/RTSRN.git
cd RTSRN
conda env create -f environment.yml
conda activate rtsrn
TextZoom: change TRAIN.train_data_dir to your training data path, and change TRAIN.VAL.val_data_dir to your validation data path.
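The config edits above can be scripted. A minimal sketch using GNU sed (key names are from this README, the /data/textzoom/... paths are placeholders, and the exact YAML layout may differ, so check the result by eye):

```shell
# Point the dataset keys in the config at a local TextZoom copy.
# /data/textzoom/... are placeholder paths -- edit to match your setup.
sed -i 's|train_data_dir:.*|train_data_dir: /data/textzoom/train1|' configs/super_resolution.yaml
sed -i 's|val_data_dir:.*|val_data_dir: /data/textzoom/test|' configs/super_resolution.yaml
# Show the edited lines for a quick visual check.
grep -E "train_data_dir|val_data_dir" configs/super_resolution.yaml
```

On macOS, `sed -i` needs an explicit backup suffix (`sed -i ''`).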
Download the ASTER model from https://github.com/ayumiymk/aster.pytorch, the MORAN model from https://github.com/Canjie-Luo/MORAN_v2, and the CRNN model from https://github.com/meijieru/crnn.pytorch. Then, in ./configs/super_resolution.yaml, set TRAIN.VAL.rec_pretrained to your ASTER model path, TRAIN.VAL.moran_pretrained to your MORAN model path, and TRAIN.VAL.crnn_pretrained to your CRNN model path.
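Before training, it can help to fail fast if a recognizer checkpoint is missing. A small pre-flight check, assuming the checkpoints were saved under pretrained/ (placeholder paths; use whatever you set in the config):

```shell
# Report which recognizer checkpoints exist at the configured paths.
# pretrained/*.pth are placeholder locations -- use your own paths.
for ckpt in pretrained/aster.pth pretrained/moran.pth pretrained/crnn.pth; do
  [ -f "$ckpt" ] && echo "found: $ckpt" || echo "missing: $ckpt"
done
```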
Please refer to STT for the .pkl path in weight_ce_loss.py and the .pth path in text_focus_loss.py; change these paths to your own.
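To locate the hard-coded paths that need editing, a simple grep over the two files mentioned above can help (the matched lines depend on the actual file contents):

```shell
# Show line numbers of hard-coded .pkl / .pth paths that need editing.
grep -n "\.pkl" weight_ce_loss.py
grep -n "\.pth" text_focus_loss.py
```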
CUDA_VISIBLE_DEVICES=0 python3 main.py --arch="rtsrn" --test_model="CRNN" --batch_size=48 --STN --sr_share --gradient --use_distill --stu_iter=1 --vis_dir='test' --mask --triple_clues --text_focus --lca
Or run it in the background:
nohup sh train.sh > log/train_result 2>&1 &
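To follow the background run, tail the log written by the redirect above (the pgrep pattern is an assumption about how the process appears in the process list):

```shell
# Inspect the last lines of the training log (path matches the redirect).
tail -n 20 log/train_result
# Confirm the background job is still alive.
pgrep -f "main.py" || echo "training process not running"
```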
CUDA_VISIBLE_DEVICES=1 python3 main.py --arch="rtsrn" --test_model="CRNN" --batch_size=48 --STN --sr_share --gradient --use_distill --stu_iter=1 --vis_dir='vis/test' --mask --go_test --resume='ckpt/c3stisr-all-warmup-2ratio' --triple_clues --text_focus --lca --vis
Or run it in the background:
nohup sh test.sh > log/test_result 2>&1 &
Use this command to reproduce the CRNN results in the performance comparison table. The corresponding results for ASTER or MORAN can be obtained by replacing the test model.
CUDA_VISIBLE_DEVICES=0 python3 main.py --arch="rtsrn" --test_model="CRNN" --batch_size=48 --STN --gradient --use_distill --stu_iter=3 --vis_dir='xxx' --mask --triple_clues --text_focus --lca
For three-stage training, set --stu_iter=3 and remove --sr_share.
CUDA_VISIBLE_DEVICES=0 python3 main.py --arch="rtsrn" --test_model="CRNN" --batch_size=48 --STN --gradient --use_distill --stu_iter=3 --vis --vis_dir='vis/xxx' --mask --go_test --resume='ckpt/xxx/' --triple_clues --text_focus --lca
After obtaining the three-stage-trained model, testing keeps the same settings as training: --stu_iter=3 and no --sr_share. This command is in test_multi.sh; running it reproduces the CRNN results in the performance comparison table, and replacing --test_model yields the results of the other two text recognizers.
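The three table entries can be generated in one loop by swapping --test_model. A sketch with the flags copied from the command above (the leading echo makes this a dry run that only prints the commands; remove it to actually execute, and replace the ckpt/xxx/ placeholder with your checkpoint directory):

```shell
# Dry-run: print the three evaluation commands, one per recognizer.
# Remove the leading 'echo' to actually run them.
for rec in CRNN ASTER MORAN; do
  echo CUDA_VISIBLE_DEVICES=0 python3 main.py --arch="rtsrn" --test_model="$rec" \
    --batch_size=48 --STN --gradient --use_distill --stu_iter=3 --vis \
    --vis_dir="vis/$rec" --mask --go_test --resume='ckpt/xxx/' \
    --triple_clues --text_focus --lca
done
```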
· Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution [Paper] [Code]
· A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution [Paper] [Code]
· Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Paper] [Code]
· Text Prior Guided Scene Text Image Super-resolution [Paper] [Code]
· C3-STISR: Scene Text Image Super-resolution with Triple Clues [Paper] [Code]