wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License
560 stars 193 forks

Tool used in preparing training data? #86

Open ziodos opened 3 years ago

ziodos commented 3 years ago

Can you provide some details about the method you used to prepare the training data? I think you didn't use a classic OCR tool. Thanks.

knitemblazor commented 3 years ago

You have to use an OCR tool. The only real issue is mapping each piece of text to its corresponding label. I would suggest using labelImg to mark the regions, and then using the overlap between those regions and the OCR text regions to assign the corresponding labels.
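The overlap idea could be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function names, the `(x_min, y_min, x_max, y_max)` box layout, the `0.5` overlap threshold, and the `"other"` fallback label are all assumptions, and the OCR boxes and labelImg regions are assumed to be already loaded into plain Python tuples.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box (0 if the box is degenerate)."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def assign_labels(
    ocr_boxes: List[Tuple[str, Box]],
    regions: List[Tuple[str, Box]],
    min_overlap: float = 0.5,   # hypothetical threshold, tune for your data
    default: str = "other",     # hypothetical fallback label
) -> List[Tuple[str, str]]:
    """Give each OCR text box the label of the region that covers it most.

    The overlap is measured as the fraction of the OCR box's own area
    that falls inside the annotated region.
    """
    labeled = []
    for text, box in ocr_boxes:
        area = box_area(box)
        best_label, best_ratio = default, 0.0
        for label, region in regions:
            ratio = intersection_area(box, region) / area if area > 0 else 0.0
            if ratio >= min_overlap and ratio > best_ratio:
                best_label, best_ratio = label, ratio
        labeled.append((text, best_label))
    return labeled


# Made-up example: three OCR boxes and two hand-drawn label regions.
ocr_boxes = [
    ("ACME Corp", (10, 10, 110, 30)),
    ("Total: 12.50", (10, 200, 120, 220)),
    ("Thank you", (10, 300, 100, 320)),
]
regions = [
    ("company", (0, 0, 200, 40)),
    ("total", (0, 190, 200, 230)),
]
print(assign_labels(ocr_boxes, regions))
```

Measuring overlap relative to the OCR box, rather than using intersection-over-union, keeps a small text box inside a large hand-drawn region from being rejected; whether that is the right choice depends on how tightly the regions are drawn.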

ziodos commented 3 years ago

I am using Tesseract for both text detection and text recognition. The author said that it wasn't good for result accuracy, but I still don't know why.

NeerajAI commented 3 years ago

There are issues with Tesseract: it does not work well on documents with a complex structure, and its OCR also fails sometimes.