Information regarding OCR process being used in this code

Hi, in practice we used a modified PSENet for text detection and MASTER for recognition, which were trained on task-specific data to get the best performance.

We didn't use other public methods. General-purpose tools such as Tesseract didn't meet our performance requirements.

This repo is decoupled from the OCR system. In the inference phase, it only needs the results of the OCR system. In the training phase, however, there are two ways to train.

One way is to use the human-annotated labels, which include boxes, transcripts, and the corresponding entities. This approach leaves a gap between the human-annotated labels used in training and the OCR output used at inference, because the human-annotated boxes don't match the OCR boxes exactly, owing to detection errors.

To reduce this gap, we actually train the other way: we combine the human-annotated labels with the results of the OCR system to get OCR-system-oriented IOB labels for training. We first calculate the overlap between the human-annotated boxes and the OCR results, then use a simple rule to decide the final IOB label. A box segment is considered to contain an entity if its overlap is larger than a manually set threshold. This part of the code has not been made public.

Back to your question: different OCR APIs can be used in experiments. But the final performance is determined jointly by the amount and difficulty of the data, the performance of the OCR system, and the training strategy.
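Since the overlap-based labelling code is not public, here is only a rough sketch of how such a rule might look, not the actual implementation. Everything specific in it is an assumption: the axis-aligned `(x_min, y_min, x_max, y_max)` box format, measuring overlap as intersection area divided by the OCR box area, the `0.5` threshold, the character-level tags, and the hypothetical names `overlap_ratio` and `iob_tags`.

```python
from typing import List, Tuple

# Assumed box format: axis-aligned (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def overlap_ratio(ocr_box: Box, gt_box: Box) -> float:
    """Intersection area divided by the OCR box area (an assumed overlap measure)."""
    inter_w = max(0.0, min(ocr_box[2], gt_box[2]) - max(ocr_box[0], gt_box[0]))
    inter_h = max(0.0, min(ocr_box[3], gt_box[3]) - max(ocr_box[1], gt_box[1]))
    ocr_area = (ocr_box[2] - ocr_box[0]) * (ocr_box[3] - ocr_box[1])
    return (inter_w * inter_h) / ocr_area if ocr_area > 0 else 0.0


def iob_tags(
    ocr_box: Box,
    ocr_text: str,
    annotations: List[Tuple[Box, str]],
    threshold: float = 0.5,  # hypothetical value; the real threshold is set manually
) -> List[str]:
    """Return one IOB tag per character of the OCR transcript."""
    if not ocr_text:
        return []

    # Find the human-annotated box that overlaps this OCR box the most.
    best_entity, best_ratio = None, 0.0
    for gt_box, entity in annotations:
        ratio = overlap_ratio(ocr_box, gt_box)
        if ratio > best_ratio:
            best_entity, best_ratio = entity, ratio

    # Below the threshold, the OCR box is treated as containing no entity.
    if best_entity is None or best_ratio <= threshold:
        return ["O"] * len(ocr_text)

    # Otherwise the OCR box inherits the entity of the annotated box.
    return [f"B-{best_entity}"] + [f"I-{best_entity}"] * (len(ocr_text) - 1)


# One human-annotated box labelled "total", and two OCR boxes.
annotations = [((0.0, 0.0, 100.0, 20.0), "total")]
inside = iob_tags((5.0, 0.0, 95.0, 20.0), "12.5", annotations)
outside = iob_tags((200.0, 0.0, 300.0, 20.0), "abc", annotations)
```

With these inputs, the first OCR box lies entirely within the annotated box, so its characters are tagged with the `total` entity, while the second box has no overlap and is tagged `O`. A real system would probably also need to handle rotated or polygonal boxes and annotated entities that span several OCR boxes.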

wenwenyu / PICK-pytorch

Information regarding OCR process being used in this code #67