mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

Error initiating recognition training, kraken 4.13.3 #552

Closed hlapin closed 1 year ago

hlapin commented 1 year ago

[In case it matters, I am running kraken in a colab environment.]

I am trying to train a recognition model on top of an existing model. The ground truth (GT) consists of paired PAGE XML and image files.

After initiating ketos train with the following parameters: -f xml (same result with -f page)

I get the error:

FileNotFoundError: [Errno 2] No such file or directory: 
'../kraken/recognition_models/sinai_no_voc_61.gt.txt'
# where sinai_no_voc_61 is the model name to retrieve

But if I understand what is supposed to happen, kraken is not supposed to look for training data in *.gt.txt files but in the documents specified with the -e and -t flags, which in this case are in entirely different directories. Am I missing something very basic here?

mittagessen commented 1 year ago

On 23/11/05 02:56PM, Hayim Lapin wrote:

> [In case it matters, I am running kraken in a colab environment.]
>
> I am trying to train a recognition model on top of an existing model. GT is in paired pageXML and image files
>
> After initiating ketos train with the following parameters: -f xml (same result with -f page)
>
> I get the error:
>
> FileNotFoundError: [Errno 2] No such file or directory:
> '../kraken/recognition_models/sinai_no_voc_61.gt.txt'
> # where sinai_no_voc_61 is the model name to retrieve
>
> But if I understand what is supposed to happen, kraken is not supposed to look for training data in *.gt.txt but in the documents specified with the -e and -t flags, which in this case are in entirely different directories. Am I missing something very basic here?

Can you show me the whole command you're running? I guess you didn't put the -f argument in the right place, otherwise it wouldn't search for a *.gt.txt file.

hlapin commented 1 year ago

!ketos train \
-o trained_recognition_models/{project_name}_{trial} \
-t recognition_training_xml/{project_name}_xml_train.txt \
-e recognition_training_xml/{project_name}_xml_test.txt \
-q early \
-- verbose \
--normalize-whitespace \
--reorder \
-f xml \
-d cuda:0 \
--resize add \
-i {path_to_model}/{recog_model} \
-r 0.0001 \
-B 1 # batchsize

Moving up -f generates

Error: No training data was provided to the train command. Use `-t` or the `ground_truth` argument.

hlapin commented 1 year ago

Sorry, stupid mistake. Moving up -f results in:

[11/06/23 15:00:18] WARNING  Parsing recognition_training_xml/maim_autogr_xml_train.txt 

And then quits

mittagessen commented 1 year ago

On 23/11/06 07:01AM, Hayim Lapin wrote:

> !ketos train \
> -o trained_recognition_models/{project_name}_{trial} \
> -t recognition_training_xml/{project_name}_xml_train.txt \
> -e recognition_training_xml/{project_name}_xml_test.txt \
> -q early \
> -- verbose \
> --normalize-whitespace \
> --reorder \
> -f xml \
> -d cuda:0 \
> --resize add \
> -i {path_to_model}/{recog_model} \
> -r 0.0001 \
> -B 1 # batchsize

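The issue is the space in -- verbose. A standalone -- is the conventional end-of-options marker: it stops option parsing, so everything after it is no longer read as options but treated as positional arguments, i.e. as input files. With -f xml never parsed, the model path ends up being handled as a training image, which would explain the search for a *.gt.txt file next to it.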
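To make the effect of a stray -- concrete, here is a minimal sketch in Python. It uses the standard library's getopt module as a stand-in, not kraken's actual command line parser, and the option set (-t, -f, --verbose) is reduced to a hypothetical minimum for illustration.

```python
import getopt

# Hypothetical, reduced option set: -t and -f take a value, --verbose does not.
SHORT_OPTS = "t:f:"
LONG_OPTS = ["verbose"]


def split_args(argv):
    """Return (options, positionals) the way a GNU-style parser splits argv."""
    opts, positionals = getopt.gnu_getopt(argv, SHORT_OPTS, LONG_OPTS)
    return opts, positionals


# "-- verbose": the bare "--" ends option parsing, so "verbose", "-f" and
# "xml" all become positional arguments (input files, from the tool's view).
with_space = split_args(["-t", "train.txt", "--", "verbose", "-f", "xml"])

# "--verbose": parsed as an option, and "-f xml" is parsed after it as usual.
without_space = split_args(["-t", "train.txt", "--verbose", "-f", "xml"])

print(with_space)     # ([('-t', 'train.txt')], ['verbose', '-f', 'xml'])
print(without_space)  # ([('-t', 'train.txt'), ('--verbose', ''), ('-f', 'xml')], [])
```

In the first call nothing after -- is seen as an option, which mirrors how -f xml and -i were silently dropped from the ketos invocation.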

hlapin commented 1 year ago

Error: No such option: -v

or

Error: No such option: --verbose

without -v, --verbose: [11/06/23 15:53:53] WARNING Could not open file

mittagessen commented 1 year ago

On 23/11/06 08:43AM, Hayim Lapin wrote:

> Error: No such option: -v
>
> or
>
> Error: No such option: --verbose

Yes, sorry, the verbose option belongs to the base command ketos, since it applies to all subcommands. So:

ketos -v train ....

would be correct.
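Putting the two fixes together (no stray --, and -v placed before the subcommand), the invocation could be assembled roughly as follows. This is only a sketch: the flags are the ones quoted in this thread, while the function name and all path values are hypothetical placeholders.

```python
import shlex


def build_ketos_train(output, train_manifest, eval_manifest, model):
    """Assemble the ketos call from this thread as an argv list.

    -v goes on the base command (before "train"); there is no bare "--".
    """
    return [
        "ketos", "-v", "train",
        "-o", output,
        "-t", train_manifest,
        "-e", eval_manifest,
        "-q", "early",
        "--normalize-whitespace",
        "--reorder",
        "-f", "xml",
        "-d", "cuda:0",
        "--resize", "add",
        "-i", model,
        "-r", "0.0001",
        "-B", "1",  # batch size
    ]


# Hypothetical placeholder paths, not taken from the original report.
cmd = build_ketos_train(
    output="trained_recognition_models/myproject_1",
    train_manifest="recognition_training_xml/myproject_xml_train.txt",
    eval_manifest="recognition_training_xml/myproject_xml_test.txt",
    model="models/base_model.mlmodel",
)
print(shlex.join(cmd))
```

Building the command as a list and passing it to subprocess.run(cmd) avoids shell line continuations altogether, which makes a stray space much harder to introduce.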

> without -v, --verbose: [11/06/23 15:53:53] WARNING Could not open file

That is just a warning, probably caused by an empty line in your manifest file. kraken should just skip anything that isn't loadable.

hlapin commented 1 year ago

You can close this issue (file paths and *.gt.txt), but kraken still does not run. It prints

[11/06/23 19:51:08] INFO Loading existing model from

and then quits after 21 seconds or so.

mittagessen commented 1 year ago

On 23/11/06 11:59AM, Hayim Lapin wrote:

> You can close this issue (file paths and *.gt.txt), but kraken still does not load. [11/06/23 19:51:08] INFO Loading existing model from and then quits after 21 seconds or so

Hm weird. You can add multiple -v switches to increase the verbosity.

My most immediate suspicion is that the manifest files still don't point to the right files. The paths in there need to be either absolute or relative to the current working directory, not to the location of the manifest file itself.
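A quick way to test that suspicion before starting a training run is to resolve every manifest entry against the current working directory and list what cannot be found. The helper below is a hypothetical sketch, not part of kraken; it assumes a manifest with one path per line.

```python
import tempfile
from pathlib import Path


def check_manifest(manifest, base_dir=None):
    """Return (line_number, entry) pairs for manifest lines that do not resolve.

    Relative entries are resolved against base_dir (default: the current
    working directory), not against the manifest's own directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    problems = []
    lines = Path(manifest).read_text(encoding="utf-8").splitlines()
    for lineno, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            # Empty lines are a plausible source of "Could not open file".
            problems.append((lineno, "<empty line>"))
            continue
        path = Path(entry)
        if not path.is_absolute():
            path = base / path
        if not path.is_file():
            problems.append((lineno, entry))
    return problems


if __name__ == "__main__":
    # Self-contained demo with throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "page_1.xml").write_text("<PcGts/>", encoding="utf-8")
        manifest = Path(tmp) / "train.txt"
        manifest.write_text("page_1.xml\n\npage_2.xml\n", encoding="utf-8")
        print(check_manifest(manifest, base_dir=tmp))
```

An empty result means every entry resolves from the directory in which ketos would be started; anything else points at the lines worth fixing.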

hlapin commented 1 year ago

In case anyone finds this issue: the problem was entirely on my side, in how I set up the training data directory.