mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

kraken segment command hanging #590

Closed: fattynoparents closed this issue 2 months ago

fattynoparents commented 2 months ago

I have installed kraken in my WSL environment with $ pip install kraken. I am mostly interested in segmentation, and mainly in whether kraken can produce mask files with baselines in them. While looking into how segmentation works, I found this in the tutorial:

There is a default model that works reasonably well on printed and handwritten material on undegraded, 
even writing surfaces such as paper or parchment. The output of this model consists of a single line type 
and a generic text region class that denotes coherent blocks of text. This model is employed automatically 
when the baseline segmenter is activated with the -bl option:

kraken -i input.jpg segmentation.json segment -bl

But when I run this command on one of my images, nothing happens at all: I get no error and no output, and it just hangs until I abort the command. What am I doing wrong? Thanks a lot in advance!
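On the mask question above: the tutorial command writes the segmentation to JSON, and the baselines can be rasterized from that file outside kraken. The following is a minimal sketch, not a kraken feature. It assumes the JSON has a top-level "lines" list whose entries carry a "baseline" list of [x, y] points, and the helpers load_baselines, baseline_mask, and write_pgm are hypothetical names. It uses only the standard library: each baseline polyline is drawn one pixel wide with Bresenham's line algorithm, and the result is saved as a plain-text PGM image.

```python
import json
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[int, int]


def load_baselines(path: str) -> List[List[Point]]:
    """Return one baseline polyline per text line from a segmentation JSON file.

    ASSUMPTION: the file has a top-level "lines" list whose entries carry a
    "baseline" list of [x, y] pairs; lines without a baseline are skipped.
    """
    with open(path, encoding="utf-8") as fp:
        seg = json.load(fp)
    return [[(round(x), round(y)) for x, y in line["baseline"]]
            for line in seg.get("lines", []) if line.get("baseline")]


def draw_segment(mask: List[List[int]], p0: Point, p1: Point) -> None:
    """Set every pixel on the segment p0-p1 to 1 (Bresenham's line algorithm)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    height, width = len(mask), len(mask[0])
    while True:
        if 0 <= x0 < width and 0 <= y0 < height:  # clip to the image
            mask[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def baseline_mask(baselines: Iterable[Sequence[Point]], width: int, height: int) -> List[List[int]]:
    """Rasterize baseline polylines into a height x width binary mask (row-major)."""
    mask = [[0] * width for _ in range(height)]
    for polyline in baselines:
        for p0, p1 in zip(polyline, polyline[1:]):
            draw_segment(mask, p0, p1)
        if len(polyline) == 1:  # a single point still marks one pixel
            draw_segment(mask, polyline[0], polyline[0])
    return mask


def write_pgm(mask: List[List[int]], path: str) -> None:
    """Write the mask as a plain-text PGM: baseline pixels 255, background 0."""
    with open(path, "w", encoding="ascii") as fp:
        fp.write(f"P2\n{len(mask[0])} {len(mask)}\n255\n")
        for row in mask:
            fp.write(" ".join("255" if v else "0" for v in row) + "\n")
```

Usage, with hypothetical file names: write_pgm(baseline_mask(load_baselines("segmentation.json"), width, height), "mask.pgm"), where width and height are the pixel dimensions of the input image (the sketch does not read them from the JSON). One-pixel lines are the simplest choice; a thicker mask would need a dilation step afterwards.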

clovis commented 2 months ago

I get the same issue when importing kraken in a Python script: it just hangs. Importing kraken from IPython hangs as well.

fattynoparents commented 2 months ago


Did you find a solution?

mittagessen commented 2 months ago

I know some people have used WSL successfully in the past, but we don't officially support Windows, and I don't have access to a machine on which I could verify whether kraken works with Windows/WSL.

You could increase the verbosity with -v to get a bit more output and see where exactly in the process it hangs, but the segmenter is a complex beast, and blind remote debugging is unlikely to lead to much.
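Besides -v, when the hang happens at import time (as in the Python case mentioned above), the standard-library faulthandler module can print every thread's stack while the interpreter is stuck, which shows the exact call that blocks. A minimal sketch; import_with_watchdog is a hypothetical helper, not part of kraken:

```python
import faulthandler
import importlib
import sys
from types import ModuleType


def import_with_watchdog(module_name: str, timeout: float = 60.0) -> ModuleType:
    """Import module_name; while the import is still running, dump the stack of
    every thread every `timeout` seconds so a hang shows where it blocks."""
    # sys.__stderr__ is the interpreter's original stderr, which has a real file
    # descriptor even when sys.stderr has been replaced (faulthandler needs one).
    faulthandler.dump_traceback_later(timeout, repeat=True, file=sys.__stderr__)
    try:
        return importlib.import_module(module_name)
    finally:
        # Stop the watchdog once the import returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

Calling import_with_watchdog("kraken", timeout=30) from a plain python session prints a traceback every 30 seconds for as long as the import stays blocked; the innermost frames point at the module and line that never return.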

clovis commented 2 months ago

Solved by upgrading to the newest release. I am running Ubuntu 22.03, so this was not a Windows issue on my end.

fattynoparents commented 2 months ago

Thanks, everyone! Upgrading to the latest version solved my issue as well, so it wasn't a WSL problem.

mittagessen commented 2 months ago

Interesting. There was a segmentation regression in 5.0, but it caused crashes, not hangs. Anyway, good to see it has been resolved.

fattynoparents commented 2 months ago


Yes, it seems that in WSL it didn't produce any visible crash.

One more, unrelated question: when running only segmentation in kraken, is a JSON file the only output? Is there a way to get a PageXML file instead, or to convert the resulting JSON to PageXML? Thanks!

mittagessen commented 2 months ago

On 24/04/24 06:33AM, fattynoparents wrote:

>> Interesting. There was a segmentation regression in 5.0, but it caused crashes, not hangs. Anyway, good to see it has been resolved.

> Yes, it seems that in WSL it didn't produce any visible crash. One more, unrelated question: when running only segmentation in kraken, is a JSON file the only output? Is there a way to get a PageXML file instead, or to convert the resulting JSON to PageXML? Thanks!

You can run any serializer on segmentation-only output as well:

$ kraken -a -i ... segment -bl
$ kraken -x -i ... segment -bl

produce ALTO and PageXML, respectively.
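If the built-in serializers are not an option, the segmentation JSON can also be converted by hand. The following is a rough sketch using the standard library's ElementTree, under assumptions that are not confirmed by this thread: the JSON has "lines" entries with "boundary" and "baseline" point lists, the PAGE 2019-07-15 namespace is the target, and all lines go into one page-sized TextRegion (kraken's own regions and reading order are dropped). segmentation_to_pagexml is a hypothetical helper, its output has not been validated against the PAGE schema, and the -x serializer remains the supported route.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Iterable, Sequence

# ASSUMPTION: PAGE 2019-07-15 schema namespace; adjust if your tools expect another version.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ET.register_namespace("", PAGE_NS)  # serialize PAGE elements without a prefix


def _tag(name: str) -> str:
    """Qualify an element name with the PAGE namespace."""
    return f"{{{PAGE_NS}}}{name}"


def _points(coords: Iterable[Sequence[float]]) -> str:
    """Format [[x, y], ...] as PAGE's "x1,y1 x2,y2 ..." points string."""
    return " ".join(f"{int(x)},{int(y)}" for x, y in coords)


def segmentation_to_pagexml(seg: dict, image_name: str, width: int, height: int) -> str:
    """Build a minimal PageXML document from parsed segmentation JSON.

    ASSUMPTION: seg["lines"] entries carry "boundary" and "baseline" point lists.
    Every line goes into a single TextRegion covering the whole page.
    """
    root = ET.Element(_tag("PcGts"))
    meta = ET.SubElement(root, _tag("Metadata"))
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    ET.SubElement(meta, _tag("Creator")).text = "segmentation_to_pagexml sketch"
    ET.SubElement(meta, _tag("Created")).text = now
    ET.SubElement(meta, _tag("LastChange")).text = now
    page = ET.SubElement(root, _tag("Page"), imageFilename=image_name,
                         imageWidth=str(width), imageHeight=str(height))
    region = ET.SubElement(page, _tag("TextRegion"), id="r0")
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    ET.SubElement(region, _tag("Coords"), points=_points(corners))
    for i, line in enumerate(seg.get("lines", [])):
        text_line = ET.SubElement(region, _tag("TextLine"), id=f"r0_l{i}")
        ET.SubElement(text_line, _tag("Coords"), points=_points(line["boundary"]))
        ET.SubElement(text_line, _tag("Baseline"), points=_points(line["baseline"]))
    return ET.tostring(root, encoding="unicode")
```

Usage, with hypothetical file names: parse the segmentation file with json.load, pass the result together with the image name and its pixel width and height, and write the returned string to a .xml file.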

fattynoparents commented 2 months ago


Thanks for your reply! However, both of these commands give me the following error:

Segmenting
[04/25/24 11:06:55] ERROR    Failed processing /home/user/images/2024.03.07/1.jpg: 'BaselineLine' object has no attribute 'cuts'    kraken.py:429