When running the following command for a set of manuscripts:

```
ketos train --device cuda:0 --output p11-scratch --normalization NFD --normalize-whitespace --format-type alto sbzb_glaser_33.pdf_page_11.xml
```

the log is as follows:
David pulled the latest version of Kraken from the main branch but does not get the training loss. Can you suggest the correct branch to use?
When running the equivalent training for print:

```
ketos train --device cuda:0 --load ~j.murel/ArabicTestOutput/print_transcription_NEW.mlmodel --output p11 --normalization NFD --normalize-whitespace --resize add --format-type alto sbzb_glaser_33.pdf_page_11.xml
```

this is the log:
@mittagessen @dasmiq Would it make sense to figure this out within this issue?
This was the page I was trying to train on: https://github.com/OpenITI/arabic_ms_data/blob/main/firuzabadi_al_qamus_al_muhit/sbzb_glaser_33/sbzb_glaser_33.pdf_page_11.xml (I've seen the same issue of zero accuracy with larger training runs, but hopefully this is sufficient to test.)
@mabarber92 To correct what you said, my second training run was on the same manuscript page, but I initialized the model from a print model instead of starting from scratch.
> David pulled the latest version of Kraken from the main branch but does not get the training loss. Can you suggest the correct branch to use?
You can monitor the training using TensorBoard. You just need to add the --logger tensorboard and --log-dir ./ arguments to your command.
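Something like the following should work; this is just the training command from above with the two logging arguments added (paths are illustrative):

```
# same training run as before, but writing TensorBoard event files to the current directory
ketos train --device cuda:0 --output p11-scratch --normalization NFD --normalize-whitespace \
    --logger tensorboard --log-dir ./ --format-type alto sbzb_glaser_33.pdf_page_11.xml

# in another shell, point TensorBoard at the log directory and open http://localhost:6006
tensorboard --logdir ./
```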
> @mabarber92 To correct what you said, my second training run was on the same manuscript page, but I initialized the model from a print model instead of starting from scratch.
Have you tried using a smaller LR, e.g. 0.0001?
Yes, I've used smaller learning rates in many experiments. But I sent the results using the default parameters for simplicity.
I usually set -B 1 -r 0.0001 -w 0; for a larger batch size, increase -r by the square root of -B. A single page is not enough GT to converge, however.
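If I read that recipe correctly, it amounts to something like this (the batch size of 16 in the second command is only an illustrative value, not a recommendation from the comment above):

```
# baseline suggestion: batch size 1, learning rate 1e-4, no weight decay
ketos train -B 1 -r 0.0001 -w 0 --device cuda:0 --output p11-scratch \
    --normalization NFD --normalize-whitespace --format-type alto sbzb_glaser_33.pdf_page_11.xml

# with a larger batch size, scale the learning rate by sqrt(B),
# e.g. B=16 -> r = 0.0001 * sqrt(16) = 0.0004
ketos train -B 16 -r 0.0004 -w 0 --device cuda:0 --output p11-scratch \
    --normalization NFD --normalize-whitespace --format-type alto sbzb_glaser_33.pdf_page_11.xml
```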
Here is the log from training at --lrate 0.0001: p11-scratch-le4.log
Thank you @dstoekl. I know a single page is small, but the issue with training from scratch persists with larger amounts of training data. Could you suggest an amount of data and a set of training parameters you would like us to run?
50 pp.? Increase --lag to 20?
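Putting the two suggestions together, the run would look roughly like this (the output name and the page_*.xml file list are placeholders for roughly 50 pages of GT):

```
# ~50 pages of ALTO ground truth, early-stopping patience (--lag) raised to 20 evaluations
ketos train --device cuda:0 --output ms-50pp -B 1 -r 0.0001 -w 0 --lag 20 \
    --normalization NFD --normalize-whitespace --format-type alto page_*.xml
```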
You should first use Tensorboard to check if your training loss is decreasing.
@dasmiq The training loss probably gets overwritten by the pytorch-lightning progress bar when the validation loss is available. Tensorboard logging is probably the quickest way to check it though.
As noted in today's meeting, the model trained from scratch on Arabic manuscripts fails to recognise any text. I have tried running it on both binarized and un-binarized images. The output in eScriptorium looks like this:
The model being used is this: ms_scratch.zip
I believe something may be wrong with the model, as it doesn't report an accuracy rate. However, eScriptorium does not raise an error (and I'm assuming Kraken isn't erroring in the backend either, since it produces output).
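One way to rule out eScriptorium would be to run the model directly with the kraken CLI on one of the page images; the image and output filenames below are placeholders, and I'm assuming the zip contains a ms_scratch.mlmodel file:

```
# baseline segmentation followed by recognition with the scratch model;
# if this also produces empty or garbage text, the problem is in the model, not eScriptorium
kraken -i sbzb_glaser_33.pdf_page_11.png page_11.txt segment -bl ocr -m ms_scratch.mlmodel
```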