openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

Builders : bad use of class attributes #48

Closed: bnguyenvanyen closed this issue 7 years ago

bnguyenvanyen commented 7 years ago

OK, I'm pretty much done with the digit builders, but I stumbled on what I think is a bug. The builders have lists as class attributes -- file_extensions, tesseract_configs, cuneiform_args -- and these lists are appended to at initialization, so that:

from pyocr.builders import TextBuilder

a = TextBuilder()
b = TextBuilder()
c = TextBuilder()
print(TextBuilder.tesseract_configs)

prints ['-psm', '3', '-psm', '3', '-psm', '3']

It gets worse. Since DigitBuilder inherits from TextBuilder and appends "digits" to tesseract_configs, every TextBuilder used afterwards also interprets its input as digits -- this was caught in tests, so they're useful :)
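Both symptoms come from the same Python behavior: a list declared in the class body is a single object shared by the class, its subclasses, and all their instances, so `self.tesseract_configs.append(...)` mutates that one shared list. The sketch below reproduces this with hypothetical stand-ins for the builders (not pyocr's actual code), assuming the pattern described above: a class-level list that `__init__` appends to.

```python
# Hypothetical stand-ins for the pyocr builders, assuming the buggy pattern
# described in the issue: a mutable list at class level, appended to in __init__.

class TextBuilder:
    # Class attribute: ONE list object shared by this class, every subclass,
    # and every instance.
    tesseract_configs = []

    def __init__(self):
        # self.tesseract_configs resolves to the shared class-level list,
        # so append() mutates it for everyone.
        self.tesseract_configs.append("-psm")
        self.tesseract_configs.append("3")


class DigitBuilder(TextBuilder):
    def __init__(self):
        super().__init__()
        # Still the same shared list: this leaks "digits" into TextBuilder too.
        self.tesseract_configs.append("digits")


a = TextBuilder()
b = TextBuilder()
print(TextBuilder.tesseract_configs)  # ['-psm', '3', '-psm', '3']

d = DigitBuilder()
t = TextBuilder()
print("digits" in t.tesseract_configs)  # True: a plain TextBuilder now asks for digits
```

DigitBuilder never defines its own `tesseract_configs`, so attribute lookup falls through to the parent's list, which is why the "digits" option bleeds into plain TextBuilder instances.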

Proposed fixes:

Ideally, these attributes should also be documented.

jflesch commented 7 years ago

Good catch :)

jflesch commented 7 years ago

Actually, I don't think there is a real need for a dict here. Just switching to instance attributes (which is what they should have been from the start) is enough.
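A minimal sketch of the instance-attribute approach, again using hypothetical stand-ins rather than pyocr's actual classes: each list is created inside `__init__`, so every instance gets its own fresh copy and a subclass can extend it without touching the parent. Only `tesseract_configs` is shown; `file_extensions` and `cuneiform_args` would follow the same pattern.

```python
# Hypothetical stand-ins showing the fix: build the lists in __init__ so
# each instance owns its own list instead of sharing a class-level one.

class TextBuilder:
    def __init__(self):
        # Instance attribute: a new list object is created on every call.
        self.tesseract_configs = ["-psm", "3"]


class DigitBuilder(TextBuilder):
    def __init__(self):
        super().__init__()
        # Extends only this instance's list; TextBuilder is unaffected.
        self.tesseract_configs.append("digits")


a = TextBuilder()
b = TextBuilder()
d = DigitBuilder()
t = TextBuilder()
print(t.tesseract_configs)  # ['-psm', '3']
print(d.tesseract_configs)  # ['-psm', '3', 'digits']
```

A common variant keeps the defaults at class level as an immutable tuple and copies them into a list in `__init__` (e.g. `list(self.DEFAULT_CONFIGS)`, a hypothetical name), which keeps the defaults visible on the class without reintroducing shared mutable state.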

jflesch commented 7 years ago

Hm, if I remember correctly, this bug has been fixed by your latest pull request?

bnguyenvanyen commented 7 years ago

Yep, it's fixed! Thanks