Closed mirkh closed 2 months ago
I've pushed a small change to print it by default now. In general I'd suggest updating to the latest version. It should be quite a bit faster for segmentation.
In addition, batch processing like this is much faster when run with a tool like parallel. Segmenting a million pages serially is going to take ages, while with parallelization you can just throw whatever resources you've got at it (and failures on single pages don't take down everything else). You incur the additional overhead of having to load the model for each process, but this is usually acceptable.
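For anyone landing here later, a rough sketch of what one-process-per-page could look like. It uses `xargs -P` as a stand-in for parallel only because it is usually already installed; parallel itself should work just as well. The function name `segment_all` and the variables `KRAKEN`, `MODEL`, `JOBS` and `FAILLOG` are made up for this illustration and are not kraken options. The kraken flags are the ones from the command in this thread, with `-i input output` swapped in for `-I`/`-o`; the `-print0`/`-0`/`-P` options assume GNU or BSD find and xargs.

```shell
# Sketch: run one kraken process per page, several at a time.
# segment_all, KRAKEN, MODEL, JOBS and FAILLOG are illustrative names only.
segment_all() {
    dir=${1:-.}
    # Export the settings so the per-page shells started by xargs can see them.
    export KRAKEN="${KRAKEN:-kraken}"
    export MODEL="${MODEL:-model.mlmodel}"
    export FAILLOG="${FAILLOG:-failed.txt}"

    find "$dir" -name '*.png' -print0 |
        xargs -0 -n 1 -P "${JOBS:-4}" sh -c '
            in=$1
            # Some xargs versions run the command once even with no input.
            [ -n "$in" ] || exit 0
            out=${in%.png}.xml
            if "$KRAKEN" -d cuda:0 -i "$in" "$out" --alto segment -bl -i "$MODEL"; then
                echo "done: $in"
            else
                # Record the failing page and carry on with the rest.
                echo "$in" >> "$FAILLOG"
            fi
        ' sh
}
```

Calling it as `JOBS=4 segment_all /path/to/pages` prints one `done:` line per finished page and collects the pages that failed in `failed.txt`, so a single bad page does not stop the batch. Since every process loads its own copy of the model, the number of jobs that fit on one GPU is probably limited by its memory.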
Thank you very much! Both for working on kraken, for the tips, and for the change you made!
Hello,
I'm segmenting almost a million images (in batches) with a model we trained. It takes a long time, so it is important to find out if and where it fails or warns.
At the moment I'm running kraken version 4.2.0.
I use this command to run segmentation of the PNG files in a folder, creating ALTO XML files:
kraken -d cuda:0 -I '*.png' -o .xml --alto segment -bl -i model.mlmodel
Is there any option to add to make the output print not just
Segmenting ✓
but also what file it is currently segmenting?
Thanks! / Maria