Incidental scene text spotting is considered one of the
most difficult and valuable challenges in the document
analysis community. Most existing methods treat text
detection and recognition as separate tasks. In this work, we
propose a unified, end-to-end trainable Fast Oriented Text
Spotting (FOTS) network for simultaneous detection and
recognition, sharing computation and visual information
between the two complementary tasks. Specifically, RoIRotate
is introduced to share convolutional features between
detection and recognition. Thanks to this convolution-sharing
strategy, FOTS adds little computational overhead compared
to a baseline text detection network, and joint training
learns more generic features, so our method performs
better than two-stage methods. Experiments on the ICDAR
2015, ICDAR 2017 MLT, and ICDAR 2013 datasets show that the
proposed method significantly outperforms state-of-the-art
methods. This further allows us to develop the first
real-time oriented text spotting system, which surpasses all
previous state-of-the-art results by more than 5% on the
ICDAR 2015 text spotting task while running at 22.6 fps.
https://arxiv.org/pdf/1801.01671.pdf
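
The abstract names RoIRotate as the operator that lets detection and recognition share convolutional features, but does not spell out how it works. The sketch below shows one plausible reading, under stated assumptions: an oriented box predicted by the detector is resampled from the shared feature map into an axis-aligned patch of fixed height, with the width scaled to keep the aspect ratio, using bilinear interpolation. The function name `roi_rotate`, its parameters, the (H, W, C) layout, and the box parameterization (center, size, angle) are all hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def roi_rotate(features, center, size, angle, out_height=8):
    """Resample an oriented box from `features` into an axis-aligned patch.

    Illustrative sketch only (names and conventions are assumptions):
      features   -- shared feature map, shape (H, W, C)
      center     -- (cx, cy) of the box in feature-map pixel coordinates
      size       -- (w, h) of the box, measured along its own axes
      angle      -- rotation of the box's width axis, in radians
      out_height -- fixed height of the output patch
    """
    H, W, _ = features.shape
    cx, cy = center
    w, h = size

    # Fixed output height; width follows the box's aspect ratio.
    out_width = max(1, int(round(out_height * w / h)))

    # Sampling grid in the box's own coordinate frame.
    us = np.linspace(-w / 2.0, w / 2.0, out_width)
    vs = np.linspace(-h / 2.0, h / 2.0, out_height)
    u, v = np.meshgrid(us, vs)  # each has shape (out_height, out_width)

    # Rotate and translate the grid into feature-map coordinates,
    # clamping samples that fall outside the map to its border.
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    x = np.clip(cx + u * cos_a - v * sin_a, 0, W - 1)
    y = np.clip(cy + u * sin_a + v * cos_a, 0, H - 1)

    # Bilinear interpolation between the four neighbouring cells.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]

    top = features[y0, x0] * (1 - wx) + features[y0, x1] * wx
    bottom = features[y1, x0] * (1 - wx) + features[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Usage: a toy 6x8 feature map with 2 channels and an unrotated 2x2 box.
features = np.arange(6 * 8 * 2, dtype=float).reshape(6, 8, 2)
patch = roi_rotate(features, center=(3.0, 2.0), size=(2.0, 2.0),
                   angle=0.0, out_height=3)
# patch has shape (3, 3, 2)
```

Because the patch is read from the feature map that the detector already computed, a recognition branch can consume it without re-running the convolutional backbone on the image crop, which is the kind of sharing the abstract credits for the low overhead.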