mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

long Segmenting process #231

Closed josef821 closed 3 years ago

josef821 commented 3 years ago

I compiled kraken with Anaconda and ran this command:

kraken -i image.png image.txt binarize segment ocr -m enbest.mlmodel

Output:

[0.0025] Baseline model (/home/mypc/anaconda3/envs/kraken/lib/python3.8/site-packages/kraken/blla.mlmodel) given but legacy segmenter selected. Forcing to -bl.
Loading ANN /home/mypc/anaconda3/envs/kraken/lib/python3.8/site-packages/kraken/blla.mlmodel ✓
Loading ANN default ✓
Binarizing ✓
Segmenting _

It hangs for almost 30 to 40 seconds on segmenting, and the OCR is very slow. What should I do?

kba commented 3 years ago

If you know that the work is single-column, you can set maxcolseps to 0, that speeds up segmentation in my experience.

dstoekl commented 3 years ago

Why don't you want to train the new segmenter?



josef821 commented 3 years ago

Error: no such option: maxcolseps

I want to OCR very simple images with two or more lines, no columns and no pictures. In version 2.x segmentation was easy.

Update: I replaced it with kraken-3.0b12 and it works like the old version now.

kba commented 3 years ago

Error: no such option: maxcolseps

It's an option of the segment CLI, i.e.

kraken -i image.png image.txt binarize segment --maxcolseps 0 ocr -m en_best.mlmodel
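The placement of the option matters here: `--maxcolseps` belongs to the `segment` subcommand, so it has to come after `segment` and not directly after `kraken`. A minimal Python sketch that assembles the same call as an argument list is shown below. The helper name `build_kraken_command` is hypothetical; the flags and file names are the ones from the command above.

```python
import shlex
from typing import List


def build_kraken_command(image: str, output: str, model: str) -> List[str]:
    """Assemble the kraken call from the comment above as an argument list."""
    return [
        "kraken", "-i", image, output,
        "binarize",
        "segment", "--maxcolseps", "0",  # option of the segment subcommand
        "ocr", "-m", model,
    ]


cmd = build_kraken_command("image.png", "image.txt", "en_best.mlmodel")
print(shlex.join(cmd))
# To actually run it (requires kraken on PATH): subprocess.run(cmd, check=True)
```

Keeping the command as a list makes it straightforward to loop over many small images without re-typing the subcommand order.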

kba commented 3 years ago

i replace with kraken-3.0b12 it works like old version now.

Most recent version is 3.0b18 btw.

josef821 commented 3 years ago

Error: no such option: maxcolseps

It's an option of the segment CLI, i.e.

kraken -i image.png image.txt binarize segment --maxcolseps 0 ocr -m en_best.mlmodel

It still waits 30 to 40 seconds on segmenting. It shows this warning after segmentation:

WARNING:kraken.rpred:Recognizers with segmentation types {'bbox'} will be applied to segmentation of type baselines. This will likely result in severely degraded performace

mittagessen commented 3 years ago

it still wait 30 to 40 second on segmenting.

I did something stupid in November, that's why. kraken was defaulting to the new trainable segmenter and the legacy one could no longer be selected. There's a hotfix in 3.0b19. I'm not sure why the new segmenter is slower for you than the old one, though; generally it is a bit faster if there's a 'normal' number of lines on the page and you've got sufficient free memory (~4 GB).
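To check whether a given version string is at or past the 3.0b19 hotfix, a small comparison helper can be sketched as follows. It is an illustration only: it assumes version strings of the form `X.Y`, `X.Y.Z` or `X.YbN` as seen in this thread, and the function names are hypothetical, not part of kraken.

```python
import re
from typing import Tuple

# Matches e.g. "3.0", "3.0.1", "3.0b19" (assumed format, see lead-in).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:b(\d+))?$")


def version_key(version: str) -> Tuple[int, int, int, int, int]:
    """Turn a version string into a sortable tuple; betas sort before finals."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, beta = m.groups()
    is_final = 1 if beta is None else 0
    return (int(major), int(minor), int(patch or 0), is_final, int(beta or 0))


def is_at_least(version: str, minimum: str) -> bool:
    """True if `version` is the same as or newer than `minimum`."""
    return version_key(version) >= version_key(minimum)


for v in ("3.0b12", "3.0b18", "3.0b19"):
    print(v, is_at_least(v, "3.0b19"))
```

On Python 3.8+, the installed version string can usually be read with `importlib.metadata.version("kraken")`, assuming the package is installed under that distribution name.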

josef821 commented 3 years ago

Some images have 1 line and some have 10 very simple lines. My PC specs: CPU: i7-6700, RAM: 32 GB, graphics: 1070 (8 GB, up to 16 GB), disk: 500 GB SSD.

dirkroorda commented 3 years ago

I also encountered long segmenting times, and that's why I do the segmentation myself, in OpenCV.
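The comment does not say which OpenCV approach is used. For very simple single-column images like the ones described earlier in the thread, one common do-it-yourself technique is a horizontal projection profile: count the ink pixels in each row and treat runs of non-empty rows as text lines. The sketch below uses plain Python lists instead of OpenCV so it stays self-contained; it is neither kraken's algorithm nor necessarily the commenter's, and `find_line_bands` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def find_line_bands(
    image: Sequence[Sequence[int]],
    min_ink: int = 1,
    min_height: int = 1,
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink.

    `image` is a binarized page given as rows of 0 (background) and 1 (ink).
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    bands = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # a text line begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))  # a text line ends
            start = None
    # Close a band that runs to the bottom edge of the image.
    if start is not None and len(profile) - start >= min_height:
        bands.append((start, len(profile)))
    return bands


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(find_line_bands(page))  # [(1, 3), (4, 5)]
```

Raising `min_height` discards thin bands such as specks or ruling lines. This approach breaks down on skewed or multi-column pages, which is where a trained segmenter is the better fit.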