mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

`linegen.distort_line` shows weird behaviour in about 1% of cases #538

Open sven-nm opened 10 months ago

sven-nm commented 10 months ago

I'm not sure whether `linegen.distort_line` is meant to be public or maintained, but every once in a while (in about 1% of cases, I reckon) it generates a weirdly truncated image or a blank "whiteboard".

Here's roughly what I'm doing:

import random

from kraken.linegen import distort_line
from my_package import my_line_drawing_func  # my own helper, not part of kraken

line_text = '180. γπαραλλαγή λέξεων — From Dindorf we read Hello world.'
image = my_line_drawing_func(line_text)  # This is a PIL image

for i in range(300):
    # Setting my custom distortion parameters
    distort = random.gauss(2.8, 0.5)
    distort = max(min(distort, 4), 1.5)

    sigma = random.choices([9, 10, 11], weights=[0.1, 0.8, 0.1])[0]

    eps = random.choices([random.uniform(0.01, 0.1), 0.03], weights=[0.2, 0.8])[0]

    distorted = distort_line(image, distort=distort, sigma=sigma, eps=eps)

    # To avoid weird truncation and whiteboards
    if not 0.85 * image.size[1] < distorted.size[1] < 1.7 * image.size[1]:
        print('Weird behaviour')
        distorted.show()
        continue

It's a bit hard to reproduce, as many steps in `distort_line` itself use randomness, and as you can see, I found a lazy workaround.
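
One way to make the failures reproducible might be to seed the random number generators before each trial and record the seeds whose output fails the height check. The sketch below is only an illustration under assumptions: it assumes `distort_line` draws from NumPy's global RNG (not verified here), and `seed_everything`, `height_ok` and `find_failing_seeds` are hypothetical helper names, not part of kraken.

```python
import random

try:
    import numpy as np  # optional; assumption: distort_line uses NumPy's global RNG
except ImportError:
    np = None


def seed_everything(seed):
    """Seed the global RNG of `random` and, if it is installed, of NumPy."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)


def height_ok(original_height, distorted_height, low=0.85, high=1.7):
    """Same sanity check as in the loop above, expressed on plain heights."""
    return low * original_height < distorted_height < high * original_height


def find_failing_seeds(distort, original_height, n_trials=300):
    """Call `distort` once per seed and return the seeds that fail the check.

    `distort` is a hypothetical zero-argument callable that returns the
    height of the distorted image.
    """
    failing = []
    for seed in range(n_trials):
        seed_everything(seed)
        if not height_ok(original_height, distort()):
            failing.append(seed)
    return failing
```

With the snippet above, `distort` could be a small function that samples `distort`, `sigma` and `eps`, calls `distort_line(image, ...)` and returns `distorted.size[1]`. If the assumption about the RNG holds, re-seeding with any seed in the returned list and calling it again should reproduce the same bad output.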