mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

Exception when forcing binarization in segtrain command #634

Closed: fattynoparents closed this issue 2 months ago

fattynoparents commented 3 months ago

When I try to use --force-binarization with the segtrain command, I get the following exception:

Traceback (most recent call last):

/kraken/bin/ketos:8 in <module>

     5 from kraken.ketos import cli
     6 if __name__ == '__main__':
     7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
❱    8     sys.exit(cli())
     9

/kraken/lib/python3.11/site-packages/click/core.py:1157 in __call__
/kraken/lib/python3.11/site-packages/click/core.py:1078 in main
/kraken/lib/python3.11/site-packages/click/core.py:1688 in invoke
/kraken/lib/python3.11/site-packages/click/core.py:1434 in invoke
/kraken/lib/python3.11/site-packages/click/core.py:783 in invoke
/kraken/lib/python3.11/site-packages/click/decorators.py:33 in new_func

/kraken/lib/python3.11/site-packages/kraken/ketos/segmentation.py:323 in segtrain

   320     else:
   321         val_check_interval = {'val_check_interval': hyper_params['freq']}
   322
❱  323     model = SegmentationModel(hyper_params,
   324                               output=output,
   325                               spec=spec,
   326                               model=load,

/kraken/lib/python3.11/site-packages/kraken/lib/train.py:813 in __init__

   810         if not training_data:
   811             raise ValueError('No training data provided. Please add some.')
   812
❱  813         transforms = ImageInputTransforms(batch,
   814                                           height,
   815                                           width,
   816                                           channels,

/kraken/lib/python3.11/site-packages/kraken/lib/dataset/utils.py:73 in __init__

    70         self._force_binarization = force_binarization
    71         self._batch = batch
    72         self._channels = channels
❱   73         self.pad = pad
    74
    75         self._create_transforms()
    76

/kraken/lib/python3.11/site-packages/kraken/lib/dataset/utils.py:222 in pad

   219         if not isinstance(pad, (numbers.Number, tuple, list)):
   220             raise TypeError('Got inappropriate padding arg')
   221         self._pad = pad
❱  222         self._create_transforms()
   223
   224     @property
   225     def valid_norm(self) -> bool:

/kraken/lib/python3.11/site-packages/kraken/lib/dataset/utils.py:106 in _create_transforms

   103             raise KrakenInputException(f'Invalid input spec {self._batch}, {height}, {wi
   104
   105         if self._mode != 'L' and self._force_binarization:
❱  106             raise KrakenInputException(f'Invalid input spec {self._batch}, {height}, {wi
   107                                        'combination with forced binarization.')
   108
   109         self.transforms = []
KrakenInputException: Invalid input spec 1, 1800, 0, 3, (0, 0) in combination with forced binarization.

The command is:

ketos segtrain -d cuda:0 -f xml -t output.txt --resize both --schedule cosine -i /path/to/model -o output/model -q early --min-epochs 50 -N 70 --suppress-regions --line-width 10 --workers 48 --augment --force-binarization
mittagessen commented 3 months ago

This is intended. You're using the default model architecture with the input spec [1,1800,0,3 ..., which uses 3 input channels (RGB), so forced binarization doesn't really make sense. The switch is intended for single-channel inputs, e.g. [1,1800,0,1 ..., where the training data might be a mixture of grayscale and B/W images.
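The input block at the start of a VGSL spec is [batch, height, width, channels], where a width of 0 means variable width. The sketch below is a simplified, hypothetical helper (parse_input_block and binarization_allowed are not kraken functions) that mirrors the check visible in the traceback: forced binarization is rejected unless the input has a single channel.

```python
import re
from typing import NamedTuple


class InputBlock(NamedTuple):
    batch: int
    height: int
    width: int  # 0 means variable width
    channels: int


def parse_input_block(spec: str) -> InputBlock:
    """Hypothetical helper: extract the leading [batch,height,width,channels] block of a VGSL spec."""
    match = re.match(r'\s*\[\s*(\d+),(\d+),(\d+),(\d+)', spec)
    if match is None:
        raise ValueError(f'no input block found in {spec!r}')
    return InputBlock(*(int(g) for g in match.groups()))


def binarization_allowed(spec: str) -> bool:
    """Simplified mirror of the traceback's check: forced binarization needs a 1-channel input."""
    return parse_input_block(spec).channels == 1


default_like = '[1,1800,0,3 Cr7,7,64,2,2 Gn32]'
single_channel = '[1,1800,0,1 Cr7,7,64,2,2 Gn32]'
print(parse_input_block(default_like))       # InputBlock(batch=1, height=1800, width=0, channels=3)
print(binarization_allowed(default_like))    # False: 3 channels (RGB)
print(binarization_allowed(single_channel))  # True: 1 channel (grayscale or B/W)
```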

fattynoparents commented 3 months ago

Ok thanks for the info!

fattynoparents commented 3 months ago

The switch is intended to be used with single channel inputs, e.g. [1,1800,0,1... where the training data might be a mixture of grayscale and B/W images

Do you think it would be worth converting the scans to B/W and using the binarization parameter? Could it substantially improve training? My images are not color scans, so I can convert them to B/W without loss of quality.

fattynoparents commented 3 months ago

I have now converted my images to grayscale, but with this command I still get the same error:

ketos segtrain -d cuda:0 -f xml -t output_bw.txt --resize both --schedule cosine -i /path/to/model -o output/model -q early --min-epochs 50 -N 70 --suppress-regions --line-width 10 --workers 48 --augment --force-binarization -s "[1,1800,0,1 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32 Cr3,3,256 Gn32 Lbx32 Lby32 Cr1,1,32 Gn32 Lby32 Lbx32]"
mittagessen commented 3 months ago

On 24/08/13 07:26AM, fattynoparents wrote:

Actually I have now converted my images to grayscale and trying this command I still get the same error:

ketos segtrain -d cuda:0 -f xml -t output_bw.txt --resize both --schedule cosine -i /path/to/model -o output/model -q early --min-epochs 50 -N 70 --suppress-regions --line-width 10 --workers 48 --augment --force-binarization -s "[1,1800,0,1 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32 Cr3,3,256 Gn32 Lbx32 Lby32 Cr1,1,32 Gn32 Lby32 Lbx32]"

This is odd, because a) the exception is only raised when the input spec is multi-channel, and b) it works for me with this simple example:

ketos segtrain -f xml -s "[1,600,0,1 Cr7,7,64,2,2]" --force-binarization *.xml

By the way, there is no need to convert the images to grayscale manually; adjusting the input spec is sufficient.
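Adjusting the spec is enough presumably because the loader converts each page to the color mode implied by the channel count, producing grayscale input on the fly. The sketch below approximates that step in plain Python; mode_for_channels and rgb_to_luma are hypothetical names, and the ITU-R 601-2 luma weights are the ones Pillow documents for convert('L'). It is not kraken's actual loading code.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def mode_for_channels(channels: int) -> str:
    """Hypothetical: color mode a loader would target for a given input-spec channel count."""
    if channels == 3:
        return 'RGB'
    if channels == 1:
        return 'L'
    raise ValueError(f'unsupported channel count: {channels}')


def rgb_to_luma(pixels: List[Pixel]) -> List[int]:
    """Approximate grayscale conversion with ITU-R 601-2 luma weights (integer arithmetic)."""
    return [(r * 299 + g * 587 + b * 114) // 1000 for r, g, b in pixels]


scan_row = [(255, 255, 255), (0, 0, 0), (200, 180, 150)]
print(mode_for_channels(1))   # L
print(rgb_to_luma(scan_row))  # [255, 0, 182]
```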

fattynoparents commented 3 months ago

No need to convert the images manually to grayscale by the way, it is sufficient to adjust the input spec.

Thanks for the tips!

I have now simplified my command to find out which part causes this, and it turns out to be the -i parameter. As soon as I remove the path to the existing model I use as a base for training, the command runs. Otherwise, I get the "Invalid input spec 1, 1800, 0, 3, (0, 0) in combination with forced binarization." exception.

fattynoparents commented 3 months ago

Will there be a fix for this? I'm trying every possible way to improve my training score, so I thought forcing binarization might also help.

mittagessen commented 3 months ago

Ah yes, sorry. You can't change the input spec of an existing model, as that changes the layer shapes, and there are only very limited circumstances where you can get away with it. So if you want to fine-tune with B/W data, you need to start from a model that has single-channel inputs (alternatively, binarize your input data manually with kraken binarize, which should be roughly equivalent even without touching the input spec).
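To see why the layer shapes change, take the first layer of the spec used in this thread, Cr7,7,64,2,2: a 7x7 convolution with 64 filters and stride 2. Each filter spans every input channel, so the weight tensor depends on the channel count. The sketch below computes the shapes in PyTorch's (out_channels, in_channels, kernel_h, kernel_w) layout; the helper name is hypothetical and not part of kraken.

```python
from math import prod
from typing import Tuple


def conv_weight_shape(in_channels: int, out_channels: int,
                      kernel: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Weight shape of a 2D convolution in PyTorch's (out, in, kh, kw) layout."""
    return (out_channels, in_channels, kernel[0], kernel[1])


def num_weights(shape: Tuple[int, ...]) -> int:
    """Number of weights (excluding bias) in a tensor of the given shape."""
    return prod(shape)


# First layer of the spec in this thread: Cr7,7,64,2,2 (7x7 kernel, 64 filters).
rgb_shape = conv_weight_shape(3, 64, (7, 7))
gray_shape = conv_weight_shape(1, 64, (7, 7))
print(rgb_shape, num_weights(rgb_shape))    # (64, 3, 7, 7) 9408
print(gray_shape, num_weights(gray_shape))  # (64, 1, 7, 7) 3136
# The shapes differ, so weights trained on 3-channel input cannot be
# loaded as-is into a 1-channel first layer.
```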

In general, though, binarization should be considered harmful: it slightly boosts accuracy on simple, clean scans but degrades results catastrophically in most cases.
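One way to see the risk: binarization keeps only which side of a threshold each pixel falls on. The sketch below uses a naive fixed global threshold purely as a stand-in (kraken binarize uses its own adaptive nlbin algorithm, not this) on a hypothetical row of gray values: a faint stroke on stained paper.

```python
from typing import List


def threshold(row: List[int], t: int = 128) -> List[int]:
    """Naive global binarization: 0 (ink) below t, 255 (paper) otherwise."""
    return [0 if v < t else 255 for v in row]


# Hypothetical gray values: stained paper (~170-180) with a faint stroke (~140).
row = [178, 175, 141, 138, 140, 172, 176]
print(threshold(row))                           # [255, 255, 255, 255, 255, 255, 255]
print(len(set(row)), len(set(threshold(row))))  # 7 1
```

The faint stroke disappears entirely, whereas the grayscale row still carries it; an adaptive method does better than this, but the information discarded by any binarization cannot be recovered by the network.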

fattynoparents commented 3 months ago

Oh I see, thanks a lot for the explanation.

Do you know of any other ways to improve the fine-tuning of a segmentation model?

So far I have tried various line widths, two types of schedule (cosine and reduceonplateau) and manually splitting the train and validation images.

I still can't get val_mean_iu above 0.47 (I have noticed that a higher val_mean_iu corresponds well to my model's accuracy on unseen data in eScriptorium).

mittagessen commented 3 months ago

On 24/08/19 10:49AM, fattynoparents wrote:

Do you possibly know other ways to improve the fine-tuning of a segmentation model?

Getting the best results can be a bit finicky, and the choice of hyperparameters doesn't seem to matter much. It is possible that your ontology is too complex, so you could try merging some line classes, for example, to see whether that improves results.
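Merging classes amounts to collapsing several labels into one before training, so the model has fewer output maps to tell apart. A minimal sketch with a hypothetical mapping; the class names are invented for illustration, and in practice the merge might equally be done while annotating or exporting the training data.

```python
from collections import Counter
from typing import Dict, List


def merge_classes(labels: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each label by its merge target; labels not in the mapping are kept."""
    return [mapping.get(label, label) for label in labels]


# Hypothetical line classes from an over-detailed ontology.
line_labels = ['default', 'heading', 'running_title', 'default', 'signature']
mapping = {'running_title': 'heading', 'signature': 'default'}
merged = merge_classes(line_labels, mapping)
print(Counter(line_labels))
print(Counter(merged))  # Counter({'default': 3, 'heading': 2})
```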

I still don't get more than 0.47 val_mean_iu (I have noticed that the higher val_mean_iu corresponds well to the accuracy of my model in eScriptorium on unseen data).

In general, the metrics aren't particularly meaningful. Better values do not always correspond to better segmentation results: the line pixel maps go through post-processing, so the link between pixel-map IoU and the area-less polybaselines is rather tenuous. Obviously this doesn't apply to region-only segmentation models.
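A toy illustration of how pixel IoU and baseline quality can diverge: a prediction that bridges a short gap between two line segments still scores a high IoU, yet after post-processing the two segments would become a single baseline. The masks and helper names are hypothetical, and kraken's actual metric (val_mean_iu presumably averages per-class scores like this one) and vectorization are more involved.

```python
from typing import List


def iou(pred: List[int], truth: List[int]) -> float:
    """Intersection over union of two binary masks given as 0/1 lists."""
    inter = sum(p and t for p, t in zip(pred, truth))
    union = sum(p or t for p, t in zip(pred, truth))
    return inter / union if union else 1.0


def count_runs(mask: List[int]) -> int:
    """Number of connected runs of 1s, i.e. separate line segments in a 1D mask."""
    return sum(1 for i, v in enumerate(mask) if v and (i == 0 or not mask[i - 1]))


# Hypothetical pixel row: two line segments separated by a 2-pixel margin gap.
truth = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
bridged = [1] * 12  # prediction fills the gap
print(round(iou(bridged, truth), 3), count_runs(truth), count_runs(bridged))  # 0.833 2 1
```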

fattynoparents commented 3 months ago

It can be a bit finicky to get the best results and hyperparameter choice doesn't seem to affect it much. It is possible that your ontology is too complex so you can try merging for example some line classes to see if that will improve results.

I don't have line classes at all, so there's nothing to merge unfortunately.

My main problem is that the model often doesn't split the baselines at the margins, even when there's a clear border where it should. For example, in this case the model would most probably draw a continuous line, as in the picture below, whereas I need a break at the margin:
[image]

mittagessen commented 2 months ago

Ah, in that case you might try either increasing the resolution of the network input (bumping the 1800 in the spec up to something higher) or creating separate line classes for the marginal text. These are in different output maps, so they will never be merged.
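Both suggestions can be sketched with back-of-the-envelope arithmetic. The 4000 px page height and 20 px margin gap are hypothetical, and the sketch assumes the page is scaled so its height matches the spec (1800) and that the two stride-2 convolutions in the spec above (Cr7,7,64,2,2 and Cr3,3,128,2,2) leave feature maps at a quarter of the input resolution. The second half shows why separate classes help: each class has its own output map, so segments in different maps can't fuse even when they touch.

```python
from typing import List


def gap_in_feature_map(gap_px: float, page_height: int, input_height: int,
                       total_stride: int = 4) -> float:
    """Width of a page gap after rescaling to the input height and downsampling by the total stride."""
    return gap_px * input_height / page_height / total_stride


def count_runs(mask: List[int]) -> int:
    """Number of connected runs of 1s in a 1D mask."""
    return sum(1 for i, v in enumerate(mask) if v and (i == 0 or not mask[i - 1]))


# Hypothetical 4000 px tall scan with a 20 px gap between main text and margin.
for height in (1800, 2400, 3600):
    print(height, gap_in_feature_map(20, 4000, height))  # 2.25, 3.0, 4.5 feature-map pixels

# Separate classes: main text and marginal text live in different output maps.
main_text = [1, 1, 1, 1, 1, 1, 0, 0]
marginalia = [0, 0, 0, 0, 0, 0, 1, 1]  # touches the main text with no gap at all
combined = [a or b for a, b in zip(main_text, marginalia)]
print(count_runs(combined))                            # 1: a single map would merge them
print(count_runs(main_text) + count_runs(marginalia))  # 2: per-class maps keep them apart
```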

fattynoparents commented 2 months ago

Thank you for the suggestion to use separate classes for the marginal text; it seems to have improved my model sufficiently.