openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

Confidence score #58

Open MathieuCliche opened 7 years ago

MathieuCliche commented 7 years ago

Is it possible to get a confidence score for the predictions (not the orientation)?

jflesch commented 7 years ago

You mean one confidence score for the OCR on the whole image? I'm not even sure whether Tesseract provides such a score.

MathieuCliche commented 7 years ago

Yeah, for the whole image, or per word. From what I read, it's possible to get it from the hOCR or TSV output. You can check it out here: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github

For example, the TSV output has a column "conf", which gives the confidence for each word.
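A minimal sketch of reading that "conf" column with Python's standard library. The sample rows are made up for illustration and only mimic the column layout of Tesseract's TSV output; the helper name `word_confidences` is hypothetical. Real input would presumably come from something like `tesseract image.png stdout tsv`.

```python
import csv
import io

# Made-up sample that mimics the column layout of Tesseract's TSV output.
SAMPLE_TSV = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t30\t96\tHello",
    "5\t1\t1\t1\t1\t2\t109\t92\t80\t30\t71\tworld",
])


def word_confidences(tsv_text):
    """Return (word, confidence) pairs for rows that carry recognized text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # recognized text may itself contain quote characters
    )
    pairs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Skip rows without text or with a negative confidence
        # (in this sample, the page-level row has conf -1 and no text).
        if text and conf >= 0:
            pairs.append((text, conf))
    return pairs


print(word_confidences(SAMPLE_TSV))
```

Filtering on non-empty text and a non-negative confidence keeps only the word rows, so the structural rows do not distort any later statistics.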

jflesch commented 7 years ago

OK, good to know. For the words, I guess it can be added as an attribute to pyocr.builders.Box objects. Regarding the whole page, with the current API, it's going to be a little more complicated ...

jflesch commented 6 years ago

For the per-word score, you can say thanks to @a-pagano: https://github.com/openpaperwork/pyocr/pull/86 :-)

jflesch commented 6 years ago

Sorry, I meant to keep this ticket open regarding the confidence score for the whole page.
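Until a whole-page score exists, one possible workaround is to aggregate the per-word scores yourself. The sketch below is not part of PyOCR: `page_confidence` and `FakeBox` are hypothetical names, averaging is just one arbitrary choice of aggregate, and it assumes each word box exposes `content` and `confidence` attributes, as the boxes returned with `pyocr.builders.WordBoxBuilder()` are expected to after the pull request above.

```python
from collections import namedtuple
from statistics import mean

# Hypothetical stand-in for pyocr.builders.Box, used only for illustration.
FakeBox = namedtuple("FakeBox", ["content", "confidence"])


def page_confidence(boxes):
    """Average the confidence of non-empty word boxes; None if there are none."""
    scores = [box.confidence for box in boxes if box.content.strip()]
    return mean(scores) if scores else None


# Whitespace-only boxes are ignored so they do not drag the average down.
boxes = [FakeBox("Hello", 96), FakeBox("world", 71), FakeBox(" ", 0)]
print(page_confidence(boxes))
```

A plain mean treats every word equally; weighting by word length or taking the minimum may be better suited if a single badly recognized word matters.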

jflesch commented 6 years ago

@a-pagano's changes have been released in PyOCR 0.5.