mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0
Kraken 4 does not ocr htr #571

Closed Hoekman57 closed 3 months ago

Hoekman57 commented 5 months ago

I trained with kraken 4 both in the terminal and in eScriptorium. I have 40 handwritten Dutch documents. When I train on them with kraken 3.0.13, I get a test accuracy of 93%. When I train on the same documents with any version of kraken 4, in the terminal or in eScriptorium, the accuracy is almost zero (5 to 8%). When I test the model in eScriptorium on a page of 24 lines, I get the word "een" 24 times; another time I got "EEE" 23 times. So there is no recognition at all, and I get the same result in my terminal. I transcribed these pages (ALTO and TIFF) in eScriptorium, exported them, and trained with:

ketos train --lag 25 --device cuda:0 -f alto *.xml

I have an NVIDIA 3090 card, Ubuntu 20.04, and a 1 TB HDD. I installed everything with GPU/CUDA support.

I hope someone can help me.

Hoekman57 commented 5 months ago

Because I could not edit my text, I am adding this as a comment. When I run

ketos train --lag 25 --device cuda:0 -f alto *.xml

in the terminal, I see the following:

scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

And NO VERSION of scikit-learn I try is compatible. I have tried them all.

mittagessen commented 5 months ago

> And NO VERSION of scikit-learn I try is compatible. I have tried them all.

That warning can safely be ignored. It originates from coreml whose conversion functionality we don't use.

> When I train on the same documents with any version of kraken 4, in the terminal or in eScriptorium, the accuracy is almost zero (5 to 8%).

There was a change of default architecture between 3.x and 4.x. The new architecture usually needs a bit longer to converge but it shouldn't fail completely like in your case. We had an issue with certain pytorch versions causing silent training failures (for reasons we were never able to track down). The development branch of kraken (main) pins all requirements so it shouldn't happen again. Could you do a clean install (new virtual environment) of it and rerun the training?

If that doesn't fix it could you test the old architecture with -s '[1,48,0,1 Cr4,2,32,4,2 Gn32 Cr4,2,64,1,1 Gn32 Mp4,2,4,2 Cr3,3,128,1,1 Gn32 Mp1,2,1,2 S1(1x0)1,3 Lbx256 Do0.5 Lbx256 Do0.5 Lbx256 Do0.5]' in ketos train? If that converges and the new one doesn't that would indicate that maybe something is awry with your dataset.
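
For reference, combining this suggestion with the training command from the first comment might look like the sketch below. The spec string and the options are taken from this thread; SPEC and train_legacy are only illustrative names, not part of kraken.

```shell
# Old (kraken 3.x) recognition architecture, copied verbatim from the comment above.
SPEC='[1,48,0,1 Cr4,2,32,4,2 Gn32 Cr4,2,64,1,1 Gn32 Mp4,2,4,2 Cr3,3,128,1,1 Gn32 Mp1,2,1,2 S1(1x0)1,3 Lbx256 Do0.5 Lbx256 Do0.5 Lbx256 Do0.5]'

# train_legacy is a hypothetical wrapper name: it passes the same options as
# the original command, plus -s to override the default architecture.
train_legacy() {
    ketos train --lag 25 --device cuda:0 -f alto -s "$SPEC" "$@"
}

# Usage, from the directory holding the exported ALTO files and images:
#   train_legacy *.xml
```

Keeping the spec in single quotes matters, because the brackets, spaces and parentheses would otherwise be interpreted by the shell.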

Hoekman57 commented 5 months ago

The old architecture worked. At stage 24 I already have 90%. That's great, thank you. BUT that still doesn't solve the real problem: eScriptorium does not use the old architecture, so the 'train' button in eScriptorium does not do the job properly. What can be done so that the new architecture also works, without needing the old architecture spec?

Hoekman57 commented 5 months ago

This does not solve the problem that eScriptorium also uses kraken 4 with the new architecture, which gives zero results. Can this be solved? Or can I change something in eScriptorium so that it also uses the old architecture that I now use in the terminal?

mittagessen commented 5 months ago

Could you please still test the current main version of kraken? There are workarounds for eScriptorium to use arbitrary architectures but it would be better to figure out why it doesn't converge than circumventing the issue.

Hoekman57 commented 5 months ago

Do you mean with the old architecture you sent me? I have installed it in a new environment about 30 times, with all kinds of Python versions (3.8 to 3.12) and all kraken 4 versions. I tried your installation manual with conda (Anaconda and Miniconda) and with pip. No kraken 4 version gave a result. With the manual installation from your GitHub and from kraken.re, the environment stays in a loop and does not install easily. So if you have any advice or help, or can tell me the right steps or versions, that would be very welcome. I will test it all.

Hoekman57 commented 5 months ago

> Could you please still test the current main version of kraken? There are workarounds for eScriptorium to use arbitrary architectures but it would be better to figure out why it doesn't converge than circumventing the issue.

I got this error message:

ModuleNotFoundError: No module named 'torch._custom_ops'

The command was:

ketos train --lag 5 --device cuda:0 -f alto *.xml

Hoekman57 commented 5 months ago

> Could you please still test the current main version of kraken? There are workarounds for eScriptorium to use arbitrary architectures but it would be better to figure out why it doesn't converge than circumventing the issue.

I get this error after installing main:

OSError: /home/biqe/.local/lib/python3.8/site-packages/torchaudio/torchaudio/lib/libtorchaudio.so: undefined symbol
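
The path in this error lies under ~/.local, the per-user site-packages directory, which suggests that a torchaudio installed outside any virtual environment is being picked up. A minimal sketch for checking this, assuming python refers to the interpreter of the environment under test (show_torch_stack is an illustrative name, not a kraken command):

```shell
show_torch_stack() {
    # Directory Python searches for per-user packages
    # (in the error above this was /home/biqe/.local/...).
    python -m site --user-site
    # Versions of the packages that have to be compatible with each other.
    python -m pip list 2>/dev/null | grep -i -E '^(torch|torchaudio|torchvision|kraken)[[:space:]]'
}

# Usage:
#   show_torch_stack
# Setting PYTHONNOUSERSITE=1 in the environment makes Python ignore ~/.local.
```

If the listed torch and torchaudio versions do not belong together, or one of them comes from ~/.local, a fresh environment is probably easier than repairing the existing one.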

mittagessen commented 5 months ago

Can you show me what exactly you were doing to install main? A simple pip install -U from an existing environment is liable to cause a broken install like this. You should create a new virtual environment for this, ideally anaconda from the environment files in the repository:

conda env create -f environment_cuda.yml

Hoekman57 commented 5 months ago

pip install git+https://github.com/mittagessen/kraken.git@main

Hoekman57 commented 5 months ago

[Image: Screenshot from 2024-02-09 13-40-09]

mittagessen commented 5 months ago

You need to run the conda env instantiation from the git repository as it installs kraken from the local code base and not a remote source.
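
In practice that means cloning the repository first and creating the environment from inside the checkout. A hedged sketch: the repository URL, branch and environment file name come from this thread; the environment name kraken is an assumption (check the name: field in environment_cuda.yml), and setup_kraken_main is an illustrative function name.

```shell
setup_kraken_main() {
    # Clone the development branch and enter the checkout.
    git clone --branch main https://github.com/mittagessen/kraken.git &&
    cd kraken &&
    # Create the environment from inside the checkout so that kraken is
    # installed from the local code base with its pinned requirements.
    conda env create -f environment_cuda.yml
}

# Usage:
#   setup_kraken_main
#   conda activate kraken   # assumed environment name
```

Running pip install inside this environment afterwards should not be necessary, since the environment file already installs kraken from the local checkout.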