n0ct4li closed this issue 4 years ago
Apply any OCR tool that helps you detect and recognize the words in the scanned document image. For example, see what @4kssoft did with a document image: the generated .json file holds the position and text found in the image. https://github.com/4kssoft/CUTIE/blob/master/invoice_data/Faktura1.pdf_0.json
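As a rough sketch of that step: assuming your OCR tool returns one (text, left, top, width, height) tuple per word, you could dump the position and text to JSON like this. The record layout (`text`, `bbox`) and the helper name `words_to_records` are made up for illustration and are not necessarily the schema used in the linked file.

```python
import json

def words_to_records(words):
    """Turn (text, left, top, width, height) tuples into dict records.

    The output layout is hypothetical: 'bbox' is [x_min, y_min, x_max, y_max].
    """
    records = []
    for text, left, top, width, height in words:
        text = text.strip()
        if not text:
            continue  # drop empty OCR detections
        records.append({
            "text": text,
            "bbox": [left, top, left + width, top + height],
        })
    return records

# Toy OCR output, invented for this example.
ocr_words = [
    ("Invoice", 40, 30, 120, 28),
    ("  ", 0, 0, 5, 5),
    ("Total:", 40, 400, 80, 24),
    ("12.50", 300, 400, 70, 24),
]
records = words_to_records(ocr_words)
print(json.dumps(records, indent=2))
```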
To generate the input data, you have to know which bounding box belongs to each field. For his own dataset the author says: "Each text and their bounding box is manually labelled as one of the 9 different classes". But how can we do this for SROIE? We don't have bounding-box ground truth for each field.
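One possible workaround, not something the author describes: if you have text boxes (from OCR or from line-level annotations) plus the field values for each receipt, you can weakly label each box by string matching against those values. The sketch below assumes boxes are dicts with `text` and `bbox`, and that field values come as a plain dict; the field names, the `weak_label` helper, and the toy receipt are all invented for illustration. The labels it produces are noisy and would still need a manual check.

```python
import re

def normalize(s):
    """Uppercase and keep only letters and digits, so punctuation and spacing do not block a match."""
    return re.sub(r"[^A-Z0-9]", "", s.upper())

def matches(box_text, value, min_len=3):
    """True if one normalized string contains the other (very short strings must match exactly)."""
    if box_text == value:
        return True
    if len(box_text) >= min_len and box_text in value:
        return True
    return len(value) >= min_len and value in box_text

def weak_label(boxes, fields, other="O"):
    """Assign each box the first field whose value matches its text, else `other`."""
    norm_fields = {name: normalize(v) for name, v in fields.items()}
    labelled = []
    for box in boxes:
        text = normalize(box["text"])
        label = other
        if text:
            for name, value in norm_fields.items():
                if value and matches(text, value):
                    label = name
                    break
        labelled.append({**box, "label": label})
    return labelled

# Toy receipt, invented for this example.
boxes = [
    {"text": "ACME STORE SDN BHD", "bbox": [10, 10, 300, 40]},
    {"text": "12, JALAN EXAMPLE,", "bbox": [10, 50, 300, 80]},
    {"text": "81100 JOHOR BAHRU", "bbox": [10, 90, 300, 120]},
    {"text": "DATE: 25/12/2018", "bbox": [10, 130, 300, 160]},
    {"text": "TOTAL 9.00", "bbox": [10, 170, 300, 200]},
    {"text": "THANK YOU", "bbox": [10, 210, 300, 240]},
]
fields = {
    "company": "ACME STORE SDN BHD",
    "date": "25/12/2018",
    "address": "12, JALAN EXAMPLE, 81100 JOHOR BAHRU",
    "total": "9.00",
}
labels = [b["label"] for b in weak_label(boxes, fields)]
print(labels)
```

Substring matching can mislabel boxes, for example when the total amount also appears as a line-item price, so fields are checked in dict order and the first match wins. A stricter variant could use positions (e.g. prefer the lowest matching box for the total) or fuzzy matching to tolerate OCR errors.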