Closed bertsky closed 3 months ago
Yes, `one_channel_mode` is used to distinguish between grayscale and b/w inputs because this information can't be inferred from the VGSL input spec alone. For an RGB model its value is undefined.
What I am saying is: if you did have RGB samples, then the model should at least be advertised as `L`, not `1`.
What I mean is one should look at the VGSL input spec of the model to see whether it is a model with one-channel inputs, and then check what kind of one-channel input it expects. Looking at a field called `one_channel_mode` for a three-channel input doesn't really make sense, no?
Let me put it this way: as a user, I have no way of knowing that; there's only `one_channel_mode` in the metadata. And for your default `--spec` … https://github.com/mittagessen/kraken/blob/b48145dad2de326af0c1d3c837d0f823d2331412/kraken/lib/default_specs.py#L20 … which uses `1` for channels, i.e. binary or grayscale, IIUC any RGB images will be converted to grayscale. Thus, keeping `one_channel_mode` at `1` is wrong if there were in fact RGB samples.
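The conversion implied here can be sketched with Pillow's standard mode conversion (this is illustrative, not kraken's actual preprocessing code path):

```python
# Illustrative sketch: with a 1-channel input spec, an RGB page image is
# reduced to 8-bit grayscale ('L') before it reaches the network.
# Uses Pillow's standard mode conversion; not kraken's actual code.
from PIL import Image

rgb_page = Image.new('RGB', (8, 8), (120, 60, 200))
gray_page = rgb_page.convert('L')   # ITU-R 601-2 luma transform
print(gray_page.mode)               # 'L'
```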
> … which uses `1` for channels, i.e. binary or grayscale, IIUC any RGB images will be converted to grayscale. Thus, keeping `one_channel_mode` at `1` is wrong if there were in fact RGB samples.
It will be set to `L` because the `im_mode` on the dataset object is set to the highest image mode after applying the image transforms (which would produce grayscale images in this case). Anything else is a bug.
The canonical way to determine what kind of input a model expects is to use the VGSL input spec. The additional `one_channel_mode` attribute is a necessary kludge to be able to disambiguate inputs that are functionally different but map to the same VGSL input block, namely grayscale and binarized images. It doesn't make sense to disambiguate inputs that are already unambiguous.
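That decision procedure could be sketched like this (a hypothetical consumer-side helper, not kraken's API; `channels` would come from the VGSL input spec and `one_channel_mode` from the model metadata):

```python
# Hypothetical logic combining the VGSL input spec with the
# one_channel_mode metadata field; names are illustrative, not kraken's API.

def expected_input(channels, one_channel_mode=None):
    """Map the input spec (+ disambiguation flag) to the expected image kind."""
    if channels == 3:
        return 'color'                 # RGB: one_channel_mode is undefined
    if channels == 1:                  # ambiguous in VGSL alone:
        if one_channel_mode == '1':
            return 'binary'            # bitonal/binarized input
        if one_channel_mode == 'L':
            return 'grayscale'         # 8-bit grayscale input
    raise ValueError('cannot interpret input spec')
```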
> It will be set to `L` because the `im_mode` on the dataset object is set to the highest image mode after applying the image transforms (which would produce grayscale images in this case). Anything else is a bug.
Ah, now I can see it, thanks for clarifying.
But then why do @stweil's Kraken models mostly have `mode==1`, despite being trained on grayscale?
> The canonical way to determine what kind of input a model expects is to use the VGSL input spec. The additional `one_channel_mode` attribute is a kludge necessary because VGSL doesn't have a way to distinguish between grayscale and binary inputs (which makes a difference in recognition quality, unfortunately).
I understand, but then the VGSL spec itself (or just the channels parameter) should also be added to the model metadata IMHO.

> But then why do @stweil's Kraken models mostly have `mode==1` (https://github.com/OCR-D/ocrd_kraken/pull/38#issuecomment-1607471614), despite being trained on grayscale?

I don't know when he trained those models but in the past there was a bug where the mode wouldn't be upgraded because of the dataset running in separate processes. I think I had fixed that a while ago but I can check again.
It is available in the `input` attribute of the model as a tuple `(batch, channels, height, width)`.
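Consuming that attribute might look as follows, assuming the NCHW tuple layout stated above; the model object is mocked here rather than loaded via kraken:

```python
# Sketch of reading the model's input shape and disambiguation flag.
# FakeModel stands in for a loaded model; attribute names follow the
# discussion above and are assumptions, not verified kraken API.

class FakeModel:
    input = (1, 1, 48, 0)       # (batch, channels, height, width); 0 = variable
    one_channel_mode = 'L'      # only meaningful when channels == 1

_, channels, height, _ = FakeModel.input
wants_binarized = channels == 1 and FakeModel.one_channel_mode == '1'
print(channels, wants_binarized)
```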
> I don't know when he trained those models but in the past there was a bug where the mode wouldn't be upgraded because of the dataset running in separate processes.
I wouldn't know. He did use eScriptorium AFAIK.
> It is available in the `input` attribute of the model as a tuple `(batch, channels, height, width)`.
Oh, that does help! Should be documented IMO.
> I wouldn't know. He did use eScriptorium AFAIK.
~~Most~~ All model trainings were done on the command line with the latest kraken code (at the time of the training start) from GitHub. It is not possible to train models with a given network specification from several GT sources in eScriptorium.
~~Only the typewriter model was trained in eScriptorium.~~
> Oh, that does help! Should be documented IMO.
I can do that. Most of that stuff is stable but hasn't been documented as the correct image transforms get run automatically inside the respective methods.
> I can do that. Most of that stuff is stable but hasn't been documented as the correct image transforms get run automatically inside the respective methods.
@mittagessen It's your decision. For me it took some time digging through your `rpred` and `blla` before I understood how to use the API for OCR-D (esp. around the multi-model/tag mechanism and segmentation formats). So the documentation would need more than just that detail improved. But as long as it does not change, I am fine.
@stweil Do you still have some logs? Perhaps we could pinpoint the version, or at least see the `Upgrading` messages... I would like to fix the model metadata, and from my testing, I am confident they benefit from grayscale, but want to be absolutely sure.
> So the documentation would need more than just that detail improved.
There's a large refactoring happening right now that replaces all the opaque dicts with proper container classes which should clear up a lot of the implicitness in the arguments (and gives me an opportunity to have to rewrite all of the docs).
Apparently, I hadn't corrected the image mode promotion after all. There's a fix in `fix/8bit_mode_dataloader` which I need to verify and merge.
> Apparently, I hadn't corrected the image mode promotion after all. There's a fix in `fix/8bit_mode_dataloader` which I need to verify and merge.
I don't understand. So the documentation issue (which IIUC it turned out to be) is fixed, hence you closed, but then there's also https://github.com/mittagessen/kraken/compare/main...fix/8bit_mode_dataloader, which is what exactly – a fix to the original problem, or to some new problem (perhaps a race when using multiprocessing)?
Also, more broadly, regarding your refactoring, since you have not released anything yet, could you please give me an early warning as to any API changes I need ocrd_kraken to adapt towards?
> which is what exactly – a fix to the original problem, or to some new problem (perhaps a race when using multiprocessing)?
The `im_mode` variable wasn't shared between processes, so it only got promoted inside the `DataLoader` worker processes and not the kraken main process that would actually set the flag in the model metadata.
> Also, more broadly, regarding your refactoring, since you have not released anything yet
5.0 has been released on pypi but I haven't done the github release yet as the conda build broke for some unfathomable reason. The github release will include an upgrade guide to the new container classes.
In OCR-D, we rely on the model metadata correctly specifying whether images should be binarized or not – via the `one_channel_mode` property. However, there seems to be a gap in the chain of feature promotion:

1. `im_mode` is `1` by default and gets promoted for the whole dataset if it encounters an input sample with "more" channels

However, in (1), the (maximal) case of `RGB` does not get promoted.
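The shape of such a gap can be illustrated like this (a purely hypothetical reconstruction, not kraken's actual promotion code):

```python
# Hypothetical illustration of the promotion gap: if promotion only handles
# the '1' -> 'L' step, an 'RGB' sample leaves im_mode unpromoted.

def promote_buggy(dataset_mode, sample_mode):
    if dataset_mode == '1' and sample_mode == 'L':
        return 'L'
    return dataset_mode            # 'RGB' falls through unpromoted

def promote_fixed(dataset_mode, sample_mode, order=('1', 'L', 'RGB')):
    # promote to the "highest" of the two PIL image modes
    return max(dataset_mode, sample_mode, key=order.index)
```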