get text position with OCR engine

williamfzc commented 5 years ago

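In fact, tesseract provides an API for getting bounding boxes, which makes it possible to find where text is located in an image.

tesserocr exposes the corresponding API as well: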

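from PIL import Image
import cv2
from tesserocr import PyTessBaseAPI, RIL

IMAGE_PATH = r'tests/pics/screen.png'

# Load the image twice: PIL for tesserocr, OpenCV for drawing the boxes.
image = Image.open(IMAGE_PATH)
cv2_image = cv2.imread(IMAGE_PATH)

with PyTessBaseAPI(lang='eng+chi_sim') as api:
    api.SetImage(image)
    print(api.AllWordConfidences())
    # Segment at paragraph level; RIL.TEXTLINE or RIL.WORD give finer-grained boxes.
    # The message below still says "textline" regardless of the level chosen here.
    boxes = api.GetComponentImages(RIL.PARA, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for im, box, *_ in boxes:
        # Restrict recognition to this component, then read its text.
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocr_result = api.GetUTF8Text()
        print(f'ocr result: {ocr_result}')
        print(f'box: {box}')
        # Draw the box on the OpenCV copy; (255, 0, 0) is blue in BGR order.
        top_left = (box['x'], box['y'])
        bottom_right = (box['x'] + box['w'], box['y'] + box['h'])
        cv2.rectangle(cv2_image, top_left, bottom_right, (255, 0, 0), 5)

# Show the annotated image until a key is pressed.
cv2.imshow("Image", cv2_image)
cv2.waitKey(0)
cv2.destroyAllWindows()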
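
Since the goal is a text position rather than a drawing, each box dict can be reduced to a single point. A minimal sketch, assuming the `{'x', 'y', 'w', 'h'}` layout printed by the snippet above; `box_center` and `box_corners` are hypothetical helper names, not part of tesserocr:

```python
def box_corners(box):
    """Return ((left, top), (right, bottom)) for a box dict with x/y/w/h keys."""
    return (box['x'], box['y']), (box['x'] + box['w'], box['y'] + box['h'])


def box_center(box):
    """Return the integer (x, y) center of a box dict with x/y/w/h keys."""
    return box['x'] + box['w'] // 2, box['y'] + box['h'] // 2
```

The corner pair is the same one passed to `cv2.rectangle` above, while the center is probably the more useful value when the position is meant to be tapped or clicked.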

williamfzc commented 5 years ago

TextLine: haha

Word: haha1

In practice, however, tesseract can detect that text is present without necessarily recognizing what the text says:

Found 14 textline image components.
ocr result: 
box: {'x': 33, 'y': 208, 'w': 91, 'h': 31}
ocr result: 
box: {'x': 129, 'y': 210, 'w': 29, 'h': 28}
ocr result: oa

box: {'x': 477, 'y': 208, 'w': 60, 'h': 31}
ocr result:  

box: {'x': 683, 'y': 210, 'w': 29, 'h': 28}
ocr result: 
box: {'x': 715, 'y': 208, 'w': 61, 'h': 31}
ocr result: 
box: {'x': 857, 'y': 208, 'w': 61, 'h': 31}
ocr result: 中 ，

box: {'x': 922, 'y': 208, 'w': 36, 'h': 31}
ocr result: 
box: {'x': 960, 'y': 208, 'w': 22, 'h': 30}
ocr result: 
box: {'x': 33, 'y': 489, 'w': 61, 'h': 31}
ocr result: 
box: {'x': 97, 'y': 489, 'w': 61, 'h': 31}
ocr result: 
box: {'x': 271, 'y': 492, 'w': 60, 'h': 28}
ocr result: 
box: {'x': 461, 'y': 494, 'w': 18, 'h': 18}
ocr result: 
box: {'x': 495, 'y': 489, 'w': 40, 'h': 31}
ocr result:  

box: {'x': 535, 'y': 489, 'w': 21, 'h': 31}
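
Given output like the above, one way to work with it is to drop the components whose text came back empty and search the rest for a target string. A rough sketch, assuming the results have been collected as `(text, box)` pairs instead of being printed; `recognized_only`, `locate_text` and `SAMPLE_RESULTS` are hypothetical names, and the sample entries are copied from the log above:

```python
# A few (text, box) pairs copied from the log above.
SAMPLE_RESULTS = [
    ('', {'x': 33, 'y': 208, 'w': 91, 'h': 31}),
    ('oa\n', {'x': 477, 'y': 208, 'w': 60, 'h': 31}),
    (' \n', {'x': 683, 'y': 210, 'w': 29, 'h': 28}),
    ('中 ，\n', {'x': 922, 'y': 208, 'w': 36, 'h': 31}),
]


def recognized_only(results):
    """Keep only components whose text is non-empty after stripping whitespace."""
    return [(text.strip(), box) for text, box in results if text.strip()]


def locate_text(results, target):
    """Return the boxes of all components whose recognized text contains target."""
    return [box for text, box in recognized_only(results) if target in text]
```

For example, `locate_text(SAMPLE_RESULTS, '中')` returns the single box at `x=922`. This only filters what tesseract already produced; it does not make the empty components any more readable.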

williamfzc commented 5 years ago

Some tips for improving OCR accuracy:

https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy
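
Binarization is one commonly suggested preprocessing step of this kind. The sketch below picks a global threshold with Otsu's method in plain Python, assuming the image has already been flattened to a list of 8-bit grayscale values; `otsu_threshold` and `binarize` are hypothetical helper names, and in practice OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag would likely be used instead. Whether it helps on this particular screenshot is untested.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0
    weight_bg = 0
    best_var = -1.0
    best_t = 0
    for t in range(256):
        weight_bg += hist[t]          # pixels with value <= t
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue                  # no pixels on the dark side yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                     # every pixel is already on the dark side
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to 255 and the rest to 0."""
    return [255 if p > threshold else 0 for p in pixels]
```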

williamfzc commented 5 years ago

Handing this over to the findtext project.

williamfzc / findit

get text position with OCR engine #12