mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

Segment command fails when trying to output as PageXML/ALTO #597

Status: Closed (closed by fattynoparents 1 month ago)

fattynoparents commented 2 months ago

When trying to use the segment command with output as PageXML/ALTO:

$ kraken -a -i ... segment -bl
$ kraken -x -i ... segment -bl

I get the error:

Segmenting
[04/25/24 11:06:55] ERROR    Failed processing /home/user/images/2024.03.07/1.jpg: 'BaselineLine' object has no attribute 'cuts'    kraken.py:429

fattynoparents commented 1 month ago

Version 5.2.4 gives me the same error on Ubuntu 22.04 (WSL):

~$ kraken -x -i ~/images/test/79.jpg ~/images/test/79.xml segment -bl
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.1.2+cu121 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading ANN /home/user/.local/lib/python3.10/site-packages/kraken/blla.mlmodel  ✓
Segmenting
[05/14/24 15:28:58] ERROR    Failed processing /home/user/images/test/79.jpg: 'BaselineLine' object has no attribute 'cuts'    kraken.py:431

mittagessen commented 1 month ago

On 24/05/14 07:14AM, fattynoparents wrote:

Version 5.2.4 gives me the same error on Ubuntu 22.04 (WSL):

Yes, I haven't tagged a new release with the fix yet.

fattynoparents commented 5 days ago

I saw that you tagged the 5.2.5 release with the fix for my issue: https://github.com/mittagessen/kraken/releases/tag/5.2.5. I'm now trying to run the code with dev version 5.2.6.dev8, but I get the following error:

(kraken) user@server:~/Documents/test$ kraken --version
kraken, version 5.2.6.dev8
(kraken) user@server:~/Documents/test$ kraken -x -i 7001.jpg 7001.xml segment -bl
Loading ANN /tmp/yes/envs/kraken/lib/python3.11/site-packages/kraken/blla.mlmodel       ✓
[07/03/24 14:21:12] ERROR  Failed processing 7001.jpg: /tmp/yes/envs/kraken/lib/python3.11/site-packages/pyarrow/../../.././libbrotlidec.so.1: undefined symbol: BrotliSharedDictionaryDestroyInstance    kraken.py:431

dstoekl commented 5 days ago

Here is a workaround: https://stackoverflow.com/questions/55051431/linux-pyarrow-undefined-symbol

mittagessen commented 5 days ago

That's not a kraken issue. You managed to install the pyarrow dependency with an incompatible libbrotlidec version. I'd purge the environment and try again.
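
One way to confirm that the failure comes from the environment rather than from kraken is to import pyarrow on its own, inside the same environment. The snippet below is only a sketch: `check_import` is a hypothetical helper, not part of kraken or pyarrow, and it assumes that the undefined-symbol problem surfaces as an `ImportError` when the module is loaded, which is the usual behaviour for a broken shared library.

```python
import importlib


def check_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module and report whether it loaded.

    Returns (True, "") on success, or (False, "<ErrorType>: <message>")
    on failure. A shared-library mismatch such as an undefined symbol
    normally shows up as an ImportError while the extension module loads.
    """
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

Calling `check_import("pyarrow")` in the affected environment should report the same libbrotlidec message if the installation is broken. If it does, kraken is not involved, and recreating the environment is the likely fix.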