mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

Fine-tuned segmentation model fails to determine regions #621

Closed fattynoparents closed 1 month ago

fattynoparents commented 1 month ago

I have fine-tuned a segmentation model so that it can detect custom regions. I used the Segment command in eScriptorium with the new model, and it found the regions quite well. I now need to use the kraken command line to find baselines and regions in many files, so I tried the following as a test:

 kraken -x -i 1.jpg 1.xml segment -bl -i ~/kraken-test/output_49.mlmodel 

I am getting the following messages and then a PageXML file is created:

Loading ANN /home/user/kraken-test/output_49.mlmodel ✓ 
Segmenting      ✓   

However, the resulting PageXML file contains no region types; every text region looks like this:

<TextRegion id="region-id" custom="structure {type:;}"> 

How can I run the segmentation command including the detection of the regions?
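
For the "many files" case, here is a minimal sketch of how the test command above could be repeated per image. It only reuses the flags from that command; the function names, the `*.jpg` pattern, and the choice to write each `.xml` next to its image are assumptions made for illustration, not something kraken prescribes.

```python
import subprocess
from pathlib import Path


def build_segment_commands(image_dir, model, pattern="*.jpg"):
    """Build one kraken invocation per image, mirroring the test command above.

    The helper name, the glob pattern and the output naming are hypothetical.
    """
    # subprocess does not expand "~", so resolve it here.
    model_path = str(Path(model).expanduser())
    commands = []
    for image in sorted(Path(image_dir).glob(pattern)):
        output = image.with_suffix(".xml")  # 1.jpg -> 1.xml, as in the test run
        commands.append(
            ["kraken", "-x", "-i", str(image), str(output),
             "segment", "-bl", "-i", model_path]
        )
    return commands


def run_all(image_dir, model):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_segment_commands(image_dir, model):
        subprocess.run(command, check=True)
```

Calling `run_all("pages", "~/kraken-test/output_49.mlmodel")` would then segment every JPEG in a (hypothetical) `pages` directory, provided `kraken` is on the `PATH`.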

mittagessen commented 1 month ago

I screwed up porting the PageXML template to kraken 5, and the tests didn't catch it because the region typology is stored in a free-text field. I've pushed a fix.
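
To check whether a given output file is affected, something like the following could be used. It is only a sketch: it assumes the region type always appears in the `custom` attribute in the `structure {type:...;}` form shown in the report, and the function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Captures the value between "type:" and ";" in custom="structure {type:...;}".
TYPE_RE = re.compile(r"structure\s*\{[^}]*?type:\s*([^;}]*);")


def region_types(page_xml):
    """Map each TextRegion id to its type ('' when the type is empty or missing)."""
    root = ET.fromstring(page_xml)
    types = {}
    for elem in root.iter():
        # Drop the XML namespace so the check does not depend on the schema version.
        if elem.tag.rsplit("}", 1)[-1] != "TextRegion":
            continue
        match = TYPE_RE.search(elem.get("custom", ""))
        types[elem.get("id")] = match.group(1).strip() if match else ""
    return types


def untyped_regions(page_xml):
    """Return the ids of all TextRegion elements without a region type."""
    return [rid for rid, rtype in region_types(page_xml).items() if not rtype]
```

For example, `untyped_regions(Path("1.xml").read_bytes())` (with `from pathlib import Path`) should return an empty list once the regions carry their types, and a list of region ids for output like the one in the report.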