bertsky opened this issue 1 year ago

The only documentation for this kwarg is in the standalone CLI:

I find that second sentence very confusing (esp. around `otherwise`). So this means that binarization is attempted internally (when activated)? What steps of the pipeline are affected?

(Also, implementation-wise, it looks like binarization is repeated multiple times, without re-using the previous result...)

Can anything be said about how pretrained models would fare when passed (externally) binarized images?
As far as I understand (and please @vahidrezanezhad correct me), Eynollah will almost always produce a better result from a grayscale or color image than from a binarized image.
However, if the input image is "strongly dark or bright" (and this needs a bit more explanation), the user may try to get a better result by setting `input_binary` to true. In this case, Eynollah binarizes the image itself, so the user does not have to binarize it with another tool first. (Note: I would like to fully integrate sbb_binarization for this.)
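For illustration, a minimal sketch of enabling this through the Python API; the import path and keyword names here follow the CLI wrapper and may differ between versions:

```python
# A minimal sketch, assuming the Eynollah class and its input_binary
# keyword as exposed through the CLI wrapper; the import path and
# parameter names may differ between versions.
from qurator.eynollah.eynollah import Eynollah

eynollah = Eynollah(
    image_filename="page.tif",    # e.g. a strongly dark or bright scan
    dir_out="output",
    dir_models="models_eynollah",
    input_binary=True,            # let Eynollah binarize internally first
)
eynollah.run()
```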
> I find that second sentence very confusing (esp. around `otherwise`).
Agreed, we will try and reformulate this for better clarity.
> What steps of the pipeline are affected?
@vahidrezanezhad should be able to answer this.
> it looks like binarization is repeated multiple times, without re-using the previous result
We will also check this with regard to performance.
> Can anything be said about how pretrained models would fare when passed (externally) binarized images?
The only thing I can say is that it would be an interesting experiment to evaluate this :) But I am afraid doing this properly (per step, with different binarization methods/models and good metrics for OCR and layout) would require a lot of effort, and it would only be relevant for the few images of bad quality.
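If someone does want to try, a rough sketch of preparing differently binarized variants of the same page; the file names and the choice of thresholding methods are purely illustrative, and each variant would then still have to be run through the pipeline and scored:

```python
# Illustrative only: produce differently binarized variants of one page.
# File names and thresholding choices are assumptions for the experiment
# idea, not an existing evaluation harness.
import numpy as np
from PIL import Image
from skimage.filters import threshold_otsu, threshold_sauvola

gray = np.array(Image.open("page.tif").convert("L"))

variants = {
    "gray": gray,  # baseline: no binarization at all
    "otsu": ((gray > threshold_otsu(gray)) * 255).astype(np.uint8),
    "sauvola": ((gray > threshold_sauvola(gray, window_size=25)) * 255).astype(np.uint8),
}

for name, img in variants.items():
    Image.fromarray(img).save(f"page.{name}.png")
```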
Ok, then (besides reformulating the description) I highly recommend renaming that option, e.g. `apply_binarization`: after all, it is not the input that must/can be binary, but an internal step that is performed.
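A hypothetical sketch of what such a rename could look like in click (which the standalone CLI uses), keeping the old flag as a deprecated alias; the option names are suggestions, not the actual CLI:

```python
# Hypothetical sketch of the proposed rename; names are suggestions,
# not the actual Eynollah CLI.
import click

@click.command()
@click.option(
    "--apply_binarization", "--input_binary", "-ib",
    "apply_binarization",
    is_flag=True,
    help="binarize internally before analysis (formerly --input_binary)",
)
def main(apply_binarization):
    click.echo(f"internal binarization: {apply_binarization}")

if __name__ == "__main__":
    main()
```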
Integrating sbb_binarization / experimenting with external tools: the OCR-D way would be to just use whatever derived images with `binarized` in `@comments` can be found, i.e. whatever binarization has already been applied in the workflow. So whether it is sbb_binarization or any other tool – it would be up to the user to decide and experiment. (But if the internal binarizer here is different from sbb_binarize and perhaps better, then it gets more complicated...)
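For illustration, a sketch of that pattern inside an OCR-D processor's process() method; the processor shell and names are hypothetical, but image_from_page with feature_selector is the actual OCR-D workspace API:

```python
# Sketch of the OCR-D way: pick up whatever derived image already carries
# "binarized" in @comments, regardless of which tool produced it earlier
# in the workflow. The processor shell here is hypothetical.
from ocrd import Processor
from ocrd_modelfactory import page_from_file

class EynollahProcessor(Processor):
    def process(self):
        for input_file in self.input_files:
            pcgts = page_from_file(self.workspace.download_file(input_file))
            page = pcgts.get_Page()
            page_image, page_coords, _ = self.workspace.image_from_page(
                page, input_file.pageId,
                feature_selector="binarized",  # require a binarized derivative
            )
            # ... run layout analysis on page_image
```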
Let me first confirm the above, and then we can rename the option, ideally also consistently with scaling, enhancing and resizing.
> As far as I understand (and please @vahidrezanezhad correct me), Eynollah will almost always produce a better result from a grayscale or color image than from a binarized image.
This is exactly the case. Our best performance is achieved with a grayscale or color image.
> (Also, implementation-wise, it looks like binarization is repeated multiple times, without re-using the previous result...)
I will check it. By the way, it should not be executed multiple times.
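For reference, a minimal sketch of the "binarize once, reuse everywhere" idea; the class and method names are illustrative, not Eynollah's actual internals:

```python
# Minimal sketch of caching the binarization result; class, attribute and
# method names are illustrative, not Eynollah's actual internals.
class LayoutAnalyzer:
    def __init__(self, image):
        self.image = image
        self._binarized = None  # cache for the binarization result

    def _run_binarizer(self, image):
        ...  # placeholder for the model-based binarization

    def binarized(self):
        if self._binarized is None:        # compute only on first access
            self._binarized = self._run_binarizer(self.image)
        return self._binarized             # later steps reuse the cache
```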
> Integrating sbb_binarization / experimenting with external tools: the OCR-D way would be to just use whatever derived images with `binarized` in `@comments` can be found, i.e. whatever binarization has already been applied in the workflow. So whether it is sbb_binarization or any other tool – it would be up to the user to decide and experiment. (But if the internal binarizer here is different from sbb_binarize and perhaps better, then it gets more complicated...)
The internal binarizer uses the same models as sbb_binarization.
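An externally produced result should then be equivalent in principle; a sketch of invoking it from Python, where the import path and SbbBinarizer signatures are assumptions that may differ between releases of sbb_binarization:

```python
# Sketch of producing the equivalent result externally; the import path
# and SbbBinarizer signatures are assumptions, not the confirmed API.
import numpy as np
from PIL import Image
from sbb_binarize.sbb_binarize import SbbBinarizer

binarizer = SbbBinarizer("models_binarization")  # the shared model weights
img = np.array(Image.open("page.tif"))
bin_img = binarizer.run(image=img, use_patches=True)
Image.fromarray(bin_img.astype(np.uint8)).save("page.bin.png")
```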