Closed Hoekman57 closed 3 months ago
Because I could not edit my text, I am placing it in this comment.

When I run `ketos train --lag 25 --device cuda:0 -f alto *.xml` in the terminal, I see the following:
```
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch.
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
```
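As an aside, the Tensor Core warning in the log above can be acted on with a one-line setting before training starts. A hedged sketch (guarded so it is a no-op when torch is not importable; whether the speed/precision trade-off is worthwhile depends on your model):

```python
def enable_tf32(precision: str = "high") -> bool:
    """Set float32 matmul precision as suggested by the warning.

    Returns True if the setting was applied, False if torch is absent.
    """
    try:
        import torch
    except ImportError:
        return False
    # 'high' or 'medium' trades float32 matmul precision for Tensor Core speed.
    torch.set_float32_matmul_precision(precision)
    return True

enable_tf32("high")
```

This only silences the warning and enables TF32 matmuls; it does not affect the convergence problem discussed below.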
And NO version of scikit-learn I try is compatible; I tried them all.
That warning can safely be ignored. It originates from coreml, whose conversion functionality we don't use.
When I train the same documents in any version of kraken 4, in the terminal or in eScriptorium, the result is almost zero: 5 to 8%.
There was a change of default architecture between 3.x and 4.x. The new architecture usually needs a bit longer to converge, but it shouldn't fail completely like in your case. We had an issue with certain pytorch versions causing silent training failures (for reasons we were never able to track down). The development branch of kraken (`main`) pins all requirements, so it shouldn't happen again. Could you do a clean install (new virtual environment) of it and rerun the training?

If that doesn't fix it, could you test the old architecture with `-s '[1,48,0,1 Cr4,2,32,4,2 Gn32 Cr4,2,64,1,1 Gn32 Mp4,2,4,2 Cr3,3,128,1,1 Gn32 Mp1,2,1,2 S1(1x0)1,3 Lbx256 Do0.5 Lbx256 Do0.5 Lbx256 Do0.5]'` in `ketos train`? If that converges and the new one doesn't, that would indicate that maybe something is awry with your dataset.
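For readers unfamiliar with the `-s` string: it is a VGSL-style layer specification, where each space-separated token describes one layer. A rough, illustrative tokenizer (not kraken's actual parser; the prefix-to-layer mapping is my assumption based on common VGSL usage) that labels each layer in the spec:

```python
# Assumed mapping of VGSL token prefixes to layer kinds (illustrative only).
LAYER_KINDS = {
    "C": "convolution",
    "Gn": "group norm",
    "Mp": "max pooling",
    "S": "reshape",
    "L": "LSTM",
    "Do": "dropout",
}

def describe_vgsl(spec: str):
    """Return (token, kind) pairs for each layer token in a VGSL spec string."""
    tokens = spec.strip("[]").split()
    out = []
    for tok in tokens[1:]:  # the first token is the input shape, not a layer
        # Try longer prefixes first so "Do" wins over a hypothetical "D".
        for prefix, kind in sorted(LAYER_KINDS.items(), key=lambda kv: -len(kv[0])):
            if tok.startswith(prefix):
                out.append((tok, kind))
                break
        else:
            out.append((tok, "unknown"))
    return out

spec = ("[1,48,0,1 Cr4,2,32,4,2 Gn32 Cr4,2,64,1,1 Gn32 Mp4,2,4,2 "
        "Cr3,3,128,1,1 Gn32 Mp1,2,1,2 S1(1x0)1,3 Lbx256 Do0.5 "
        "Lbx256 Do0.5 Lbx256 Do0.5]")
for tok, kind in describe_vgsl(spec):
    print(f"{tok:16s} {kind}")
```

In other words, the "old architecture" above is a small convolutional stack followed by three bidirectional LSTM layers with dropout.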
The old architecture worked! At stage 24 I already have 90%. That's great, thank you. BUT that still doesn't solve the real problem, because eScriptorium does not use the old architecture, so the 'train' button in eScriptorium does not do the job properly. What can be done about that, so the new architecture will also work without the old architecture code?
This does not solve the problem that eScriptorium also uses kraken 4 with the new architecture, which has zero result. Can this be solved? Or can I change something in eScriptorium so that it will also use the old architecture, which I now use in the terminal?
Could you please still test the current `main` version of kraken? There are workarounds for eScriptorium to use arbitrary architectures, but it would be better to figure out why it doesn't converge than to circumvent the issue.
You mean with the old architecture you sent me? Because installing it in a new environment is something I did about 30 times, with all kinds of Python versions (3.8 to 3.12) and with all kraken 4 versions. I tried your installation manual with conda (Anaconda and Miniconda) and with pip. No result with any kraken 4 version. With the manual installation from your GitHub and from kraken, the environment stays in a loop and does not install easily. So please, if you have any advice or help, or can tell me the right steps or versions, that's very welcome. I will test it all.
I got this error message when running `ketos train --lag 5 --device cuda:0 -f alto *.xml`:

```
ModuleNotFoundError: No module named 'torch._custom_ops'
```
I get this error after installing main:

```
OSError: /home/biqe/.local/lib/python3.8/site-packages/torchaudio/torchaudio/lib/libtorchaudio.so: undefined symbol
```
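Undefined-symbol errors like the one above usually mean mismatched package versions are mixed in one environment. A quick, stdlib-only way to see which versions are actually installed (the package names in the list are just the ones involved here; adjust as needed):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(names):
    """Map each package name to its installed version, or 'not installed'."""
    out = {}
    for name in names:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = "not installed"
    return out

for pkg, ver in installed_versions(["torch", "torchaudio", "kraken"]).items():
    print(f"{pkg}: {ver}")
```

Comparing these against the versions pinned by kraken's `main` branch can reveal whether an old user-site install (note the `.local` path in the traceback) is shadowing the new environment.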
Can you show me what exactly you were doing to install main? A simple `pip install -U` from an existing environment is liable to cause a broken install like this. You should create a new virtual environment for this, ideally with anaconda from the environment files in the repository:

```
conda env create -f environment_cuda.yml
```
```
pip install git+https://github.com/mittagessen/kraken.git@main
```
You need to run the conda env instantiation from within the git repository, as it installs kraken from the local code base and not from a remote source.
I trained kraken 4 in the terminal and in eScriptorium. I have 40 handwritten Dutch documents. When I train these in kraken 3.0.13, I get a tested result of 93%. When I train the same documents in any version of kraken 4, in the terminal or in eScriptorium, the result is almost zero: 5 to 8%. When I test a page of 24 lines in eScriptorium, I see the word 'een' 24 times; another time I saw 'EEE' 23 times. So no recognition. When I try this in my terminal, I get the same result. I transcribed these ALTO and TIF files in eScriptorium, exported them, and used the command: `ketos train --lag 25 --device cuda:0 -f alto *.xml`
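For context on what these percentages mean: the accuracy `ketos` reports during validation is essentially 1 minus the character error rate (CER). A minimal Levenshtein-based sketch (my own illustration, not kraken's code) showing why a model that emits the same word for every line scores near zero:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def char_accuracy(truth: str, pred: str) -> float:
    """Roughly 1 - CER, floored at the length of the ground truth."""
    return 1.0 - levenshtein(truth, pred) / max(len(truth), 1)

# A degenerate model repeating 'een' against a real Dutch line:
print(char_accuracy("een goede transcriptie van de regel", "een een een"))
```

A healthy training run on 40 documents should push this well above 0.9, as it did with kraken 3.0.13.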
I have an NVIDIA 3090 card and Ubuntu 20.04. I installed everything with GPU/CUDA, and I have a 1 TB HDD.

I hope someone can really help me.