Hi @mittagessen,
This might be related to #212
I applied the new baseline segmentation and ATR models I trained to a batch of PNG files to quickly get the transcription in plain text, but the line order in the transcription came out jumbled. The scans are poetry, so each page has only a few lines, and the order mostly breaks after line 7 or 8, i.e. in the lower half of the page.
I can always get the transcription in eScriptorium, which is what I am doing at the moment, and everything works fine there. But I wanted to get the transcription quickly from the CLI with kraken, so I tried that.
I never ran into this line-order issue before with kraken's default segmentation, not even on multi-column layouts with illustrations, where the default segmentation itself kept failing because of the complex layout.
Is there a fix in place for this, or am I doing something wrong?
My command is this:
for i in *.png; do kraken -i "$i" "${i%.png}.txt" segment -bl -i 7bnverse_21.mlmodel ocr -m 24bnATR_best.mlmodel; done
Thank you for your hard work.