An OCR Toolbox for Vietnamese Documents
This toolbox provides a pipeline for OCR on Vietnamese documents (such as receipts, personal IDs, and licenses).
The project is also designed to be flexible and adaptable.
:bookmark_tabs: More information:
- Report: link
- Youtube:
Invoice (from SROIE19 dataset)
Personal ID (image from internet)
Pipeline in detail:
- Detect edges with the Canny edge detector, then find contours.
- Extract the receipt from the image and normalize it.
- Use Pixel Aggregation Network (PAN) to detect text regions in the extracted receipt, then crop these regions.
- Use VietOCR to recognize the text in each region, then perform word correction.
- Retrieve information.
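The word correction step can be illustrated with a small sketch. The method the toolbox actually uses is not described here, so this example swaps in fuzzy matching from Python's standard library (`difflib`) as a stand-in. The vocabulary `VOCAB`, the helper names, and the `cutoff` value are all hypothetical.

```python
import difflib
import unicodedata

# Hypothetical vocabulary; a real one would list words expected on the documents.
VOCAB = ["tổng", "tiền", "ngày", "thanh", "toán", "cộng"]


def _nfc(text):
    """Normalize to NFC so composed and decomposed Vietnamese diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


def correct_word(word, vocab, cutoff=0.6):
    """Return the closest vocabulary word, or the word itself if none is close enough."""
    word = _nfc(word)
    candidates = [_nfc(v) for v in vocab]
    if word in candidates:
        return word
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, vocab, cutoff=0.6):
    """Correct each whitespace-separated token of an OCR output line."""
    return " ".join(correct_word(token, vocab, cutoff) for token in text.split())


if __name__ == "__main__":
    # "tiên" is a plausible OCR misread of "tiền" (wrong tone mark).
    print(correct_text("tổng tiên", VOCAB))
```

A higher `cutoff` makes the correction more conservative: fewer words are changed, at the cost of leaving more OCR errors in place.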
Notebooks
- Notebook for training PAN:
- Notebook for training Transformer OCR:
- Notebook for training PhoBERT:
- Notebook for inference:
Pipeline
Main Pipeline
Process Flow Block
There are two stages (the pipeline can also run the second stage only):
- The first stage detects and rectifies the document in the image, then passes it through the "process flow" to find the best orientation of the document.
- The second stage passes the rotated image through the entire "process flow" to retrieve information.
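The orientation search in the first stage can be sketched as follows. This is a minimal illustration, not the toolbox's implementation: it assumes the candidate orientations are the four right-angle rotations, and it takes a hypothetical `score_fn` callable (for example, a mean OCR confidence returned by the process flow) to rank them. Images are represented as plain nested lists to keep the example self-contained.

```python
def rotate90(grid):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def best_orientation(image, score_fn):
    """Try 0, 90, 180 and 270 degree rotations and keep the highest-scoring one.

    score_fn is a hypothetical callable that maps an image to a number;
    higher means the orientation looks more plausible.
    Returns (angle_in_degrees, rotated_image).
    """
    best_angle, best_image, best_score = 0, image, score_fn(image)
    current = image
    for angle in (90, 180, 270):
        current = rotate90(current)
        score = score_fn(current)
        if score > best_score:
            best_angle, best_image, best_score = angle, current, score
    return best_angle, best_image


if __name__ == "__main__":
    # Toy example: the "correct" orientation has the marker pixel at the top-left.
    image = [[0, 0],
             [1, 0]]
    angle, rotated = best_orientation(image, score_fn=lambda g: g[0][0])
    print(angle, rotated)
```

The rotated image returned here plays the role of the input to the second stage.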
Datasets
Pretrained weights
- Pretrained PAN weights on SROIE19:
| Model | Image Size | Weights | mAP@0.5 | Pixel accuracy | IoU |
|---|---|---|---|---|---|
| PAN (baseline) | 640 x 640 | link | 0.71 | 0.95 | 0.91 |
| PAN (rotation) | 640 x 640 | link | 0.66 | 0.93 | 0.88 |
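
For readers unfamiliar with the segmentation metrics above, here is a minimal sketch of pixel accuracy and IoU computed on binary text masks. These are the standard definitions; the exact evaluation protocol behind the reported numbers is not stated here, so treat the function names and the mask format (nested lists of 0/1) as assumptions.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels where the predicted mask equals the ground-truth mask."""
    total = 0
    correct = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            if bool(p) == bool(t):
                correct += 1
    return correct / total


def iou(pred, target):
    """Intersection over union of the foreground (text) pixels of two binary masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    pred = [[1, 1],
            [0, 0]]
    target = [[1, 0],
              [0, 0]]
    print(pixel_accuracy(pred, target), iou(pred, target))
```

Pixel accuracy counts background pixels as well, which is why it tends to be higher than IoU when text covers only a small part of the image.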
- Pretrained OCR weights on MCOCR2021:
| Model | Weights | Accuracy (full seq) | Accuracy (per char) |
|---|---|---|---|
| Transformer OCR | link | 0.890 | 0.981 |
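
The two OCR accuracy columns measure different things. The sketch below shows one common way to define them: full-sequence accuracy counts a prediction as correct only if the whole string matches, while per-character accuracy compares characters position by position. The position-wise comparison and the function names are assumptions; the exact definitions used for the reported numbers are not given here.

```python
def full_sequence_accuracy(predictions, labels):
    """Fraction of samples whose predicted string matches the label exactly."""
    matches = sum(pred == label for pred, label in zip(predictions, labels))
    return matches / len(labels)


def per_char_accuracy(predictions, labels):
    """Mean over samples of position-wise character matches divided by label length."""
    scores = []
    for pred, label in zip(predictions, labels):
        if not label:
            # An empty label is only matched by an empty prediction.
            scores.append(1.0 if not pred else 0.0)
            continue
        correct = sum(a == b for a, b in zip(pred, label))
        scores.append(correct / len(label))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predictions = ["abc", "abd"]
    labels = ["abc", "abc"]
    print(full_sequence_accuracy(predictions, labels))
    print(per_char_accuracy(predictions, labels))
```

A single wrong character makes the whole sequence count as a miss, so full-sequence accuracy is typically the lower of the two.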
- Pretrained PhoBERT weights on MCOCR2021:
| Model | Weights | Accuracy (train) | Accuracy (val) |
|---|---|---|---|
| PhoBERT | link | 0.978 | 0.924 |
Inference
References