osuossu8/paper-reading
[2021] Vision Transformer for Fast and Efficient Scene Text Recognition
#35 (Open)
osuossu8 opened this issue 1 year ago
osuossu8 commented 1 year ago:
https://arxiv.org/pdf/2105.08582.pdf
osuossu8 commented 1 year ago:
Train Dataset
Trained on synthetic data:
- MJSynth (MJ): 8.9M images, rendered with 1,400 different fonts
- SynthText (ST): 5.5M images

In the STR framework, each dataset contributes 50% of the total training data. Combining 100% of both datasets resulted in performance deterioration.
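A minimal sketch of the 50/50 split, assuming it is realized per batch: every batch draws a fixed share from each dataset, so the larger MJ set does not dominate. The function names, the `ratio` parameter, and the reshuffle-on-exhaustion behaviour are hypothetical illustrations, not the framework's actual code.

```python
import itertools
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_stream(data: Sequence[T], rng: random.Random) -> Iterator[T]:
    """Yield items forever, reshuffling after each full pass over `data`."""
    if not data:
        raise ValueError("dataset must be non-empty")
    idx = list(range(len(data)))
    while True:
        rng.shuffle(idx)
        for i in idx:
            yield data[i]


def mixed_batches(mj: Sequence[T], st: Sequence[T], batch_size: int,
                  ratio: float = 0.5, seed: int = 0) -> Iterator[List[T]]:
    """Hypothetical sampler: each batch takes `ratio` of its items from MJ, the rest from ST."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_mj = int(round(batch_size * ratio))
    n_st = batch_size - n_mj
    mj_it = shuffled_stream(mj, rng)
    st_it = shuffled_stream(st, rng)
    while True:
        # The smaller dataset simply wraps around (is reshuffled) more often.
        yield list(itertools.islice(mj_it, n_mj)) + list(itertools.islice(st_it, n_st))
```

For example, with a 10-item MJ stand-in and a 3-item ST stand-in, a batch of 8 still holds 4 items from each source; ST is reshuffled partway through the batch.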
osuossu8 commented 1 year ago:
Setting
- Input: images resized to 224 × 224 for the DeiT backbone
- GPU: 2080Ti
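A quick worked check of what a 224 × 224 input means for the transformer's sequence length. It assumes 16 × 16 patches (the standard DeiT setting) and a single ViT-style class token prepended to the patch tokens; both values are assumptions, not taken from the notes.

```python
IMG_SIZE = 224   # input resolution from the notes
PATCH_SIZE = 16  # assumption: 16x16 patches, as in the standard DeiT models


def num_patches(img_size: int = IMG_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of non-overlapping patch tokens for a square image."""
    if img_size % patch_size:
        raise ValueError("image size must be divisible by patch size")
    return (img_size // patch_size) ** 2


def seq_len(img_size: int = IMG_SIZE, patch_size: int = PATCH_SIZE,
            extra_tokens: int = 1) -> int:
    """Encoder sequence length; extra_tokens=1 assumes one prepended class token."""
    return num_patches(img_size, patch_size) + extra_tokens


print(num_patches(), seq_len())  # 196 197
```

So each image becomes a 14 × 14 grid of patches, giving 196 patch tokens (197 with the class token).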
Data augmentation: RandAugment
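A minimal sketch of the policy-sampling step of RandAugment: pick N operations uniformly at random (with replacement) and apply each at one shared magnitude M. The op list and the values of N and M are hypothetical placeholders; the ops actually used for STR training may differ, and applying each op to an image is left out.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical op names for illustration only.
OPS: Tuple[str, ...] = (
    "Identity", "Rotate", "ShearX", "ShearY", "TranslateX",
    "TranslateY", "Contrast", "Brightness", "Sharpness", "Posterize",
)


def rand_augment_policy(n: int, m: int, ops: Sequence[str] = OPS,
                        rng: Optional[random.Random] = None) -> List[Tuple[str, int]]:
    """Sample n ops uniformly with replacement, each paired with the shared magnitude m."""
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    rng = rng or random.Random()
    return [(rng.choice(ops), m) for _ in range(n)]


policy = rand_augment_policy(n=2, m=9, rng=random.Random(0))
print(policy)  # two (op_name, 9) pairs
```

The appeal of this design is that the whole search space collapses to two integers (N, M), instead of a learned per-op policy.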