mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

kraken 5.2.9 segmentation training no lines #651

Open johnlockejrr opened 1 month ago

johnlockejrr commented 1 month ago

I am trying to train a segmentation model with PAGE-XML data. At startup the trainer lists the region and line types, but when I use the resulting model no lines are detected at all!

(kraken-5.2.9) incognito@DESKTOP-NHKR7QL:~/kraken-train/102_Petrov_isbach$ ketos segtrain -d cuda:0 -f page -t output.txt -q early -cl --min-epochs 40 -o /home/incognito/kraken-train/102_Petrov_isbach/seg_v2/isbach_seg_v2
Training line types:
  textline      2       1034
Training region types:
  textzone      3       19
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 4070') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name              ┃ Type                     ┃ Params ┃                      In sizes ┃                Out sizes ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0  │ net               │ MultiParamSequential     │  1.3 M │             [1, 3, 1800, 300] │   [[1, 4, 450, 75], '?'] │
│ 1  │ net.C_0           │ ActConv2D                │  9.5 K │      [[1, 3, 1800, 300], '?'] │ [[1, 64, 900, 150], '?'] │
│ 2  │ net.Gn_1          │ GroupNorm                │    128 │ [[1, 64, 900, 150], '?', '?'] │ [[1, 64, 900, 150], '?'] │
│ 3  │ net.C_2           │ ActConv2D                │ 73.9 K │ [[1, 64, 900, 150], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 4  │ net.Gn_3          │ GroupNorm                │    256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 5  │ net.C_4           │ ActConv2D                │  147 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 6  │ net.Gn_5          │ GroupNorm                │    256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 7  │ net.C_6           │ ActConv2D                │  295 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 8  │ net.Gn_7          │ GroupNorm                │    512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 9  │ net.C_8           │ ActConv2D                │  590 K │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 10 │ net.Gn_9          │ GroupNorm                │    512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 11 │ net.L_10          │ TransposedSummarizingRNN │ 74.2 K │ [[1, 256, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 12 │ net.L_11          │ TransposedSummarizingRNN │ 25.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 13 │ net.C_12          │ ActConv2D                │  2.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 32, 450, 75], '?'] │
│ 14 │ net.Gn_13         │ GroupNorm                │     64 │  [[1, 32, 450, 75], '?', '?'] │  [[1, 32, 450, 75], '?'] │
│ 15 │ net.L_14          │ TransposedSummarizingRNN │ 16.9 K │  [[1, 32, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 16 │ net.L_15          │ TransposedSummarizingRNN │ 25.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 17 │ net.l_16          │ ActConv2D                │    260 │  [[1, 64, 450, 75], '?', '?'] │   [[1, 4, 450, 75], '?'] │
│ 18 │ val_px_accuracy   │ MultilabelAccuracy       │      0 │                             ? │                        ? │
│ 19 │ val_mean_accuracy │ MultilabelAccuracy       │      0 │                             ? │                        ? │
│ 20 │ val_mean_iu       │ MultilabelJaccardIndex   │      0 │                             ? │                        ? │
│ 21 │ val_freq_iu       │ MultilabelJaccardIndex   │      0 │                             ? │                        ? │
└────┴───────────────────┴──────────────────────────┴────────┴───────────────────────────────┴──────────────────────────┘
Trainable params: 1.3 M
Non-trainable params: 0
Total params: 1.3 M
Total estimated model params size (MB): 5
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:09 • 0:00:00 2.00it/s val_accuracy: 0.878 val_mean_acc: 0.878 val_mean_iu: 0.124 val_freq_iu: 0.415 early_stopping: 0/10 0.12395
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.927 val_mean_acc: 0.927 val_mean_iu: 0.144 val_freq_iu: 0.482 early_stopping: 0/10 0.14409
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.91it/s val_accuracy: 0.945 val_mean_acc: 0.945 val_mean_iu: 0.161 val_freq_iu: 0.539 early_stopping: 0/10 0.16118
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.95it/s val_accuracy: 0.975 val_mean_acc: 0.975 val_mean_iu: 0.218 val_freq_iu: 0.728 early_stopping: 0/10 0.21762
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.221 val_freq_iu: 0.739 early_stopping: 0/10 0.22080
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:09 • 0:00:00 1.90it/s val_accuracy: 0.975 val_mean_acc: 0.975 val_mean_iu: 0.217 val_freq_iu: 0.725 early_stopping: 1/10 0.22080
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.222 val_freq_iu: 0.743 early_stopping: 0/10 0.22191
stage 7/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.220 val_freq_iu: 0.737 early_stopping: 1/10 0.22191
stage 8/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.222 val_freq_iu: 0.744 early_stopping: 0/10 0.22241
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:09 • 0:00:00 1.90it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.221 val_freq_iu: 0.740 early_stopping: 1/10 0.22241
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:09 • 0:00:00 1.93it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.220 val_freq_iu: 0.735 early_stopping: 2/10 0.22241
stage 11/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.752 early_stopping: 0/10 0.22481
stage 12/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.92it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.752 early_stopping: 1/10 0.22481
stage 13/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.88it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.754 early_stopping: 0/10 0.22528
stage 14/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.95it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.753 early_stopping: 1/10 0.22528
stage 15/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.89it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.222 val_freq_iu: 0.742 early_stopping: 2/10 0.22528
stage 16/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:09 • 0:00:00 1.91it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.752 early_stopping: 3/10 0.22528
stage 17/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.90it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.753 early_stopping: 4/10 0.22528
stage 18/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.753 early_stopping: 5/10 0.22528
stage 19/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.220 val_freq_iu: 0.737 early_stopping: 6/10 0.22528
stage 20/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.974 val_mean_acc: 0.974 val_mean_iu: 0.214 val_freq_iu: 0.716 early_stopping: 7/10 0.22528
stage 21/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.95it/s val_accuracy: 0.975 val_mean_acc: 0.975 val_mean_iu: 0.218 val_freq_iu: 0.731 early_stopping: 8/10 0.22528
stage 22/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.978 val_mean_acc: 0.978 val_mean_iu: 0.227 val_freq_iu: 0.760 early_stopping: 0/10 0.22707
stage 23/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.95it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.752 early_stopping: 1/10 0.22707
stage 24/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.978 val_mean_acc: 0.978 val_mean_iu: 0.227 val_freq_iu: 0.761 early_stopping: 0/10 0.22729
stage 25/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.979 val_mean_acc: 0.979 val_mean_iu: 0.228 val_freq_iu: 0.765 early_stopping: 0/10 0.22842
stage 26/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:09 • 0:00:00 1.90it/s val_accuracy: 0.978 val_mean_acc: 0.978 val_mean_iu: 0.226 val_freq_iu: 0.757 early_stopping: 1/10 0.22842
stage 27/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.92it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.753 early_stopping: 2/10 0.22842
stage 28/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.975 val_mean_acc: 0.975 val_mean_iu: 0.220 val_freq_iu: 0.735 early_stopping: 3/10 0.22842
stage 29/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.755 early_stopping: 4/10 0.22842
stage 30/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.752 early_stopping: 5/10 0.22842
stage 31/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.222 val_freq_iu: 0.744 early_stopping: 6/10 0.22842
stage 32/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.975 val_mean_acc: 0.975 val_mean_iu: 0.220 val_freq_iu: 0.736 early_stopping: 7/10 0.22842
stage 33/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.221 val_freq_iu: 0.738 early_stopping: 8/10 0.22842
stage 34/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.224 val_freq_iu: 0.750 early_stopping: 9/10 0.22842
stage 35/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.94it/s val_accuracy: 0.950 val_mean_acc: 0.950 val_mean_iu: 0.169 val_freq_iu: 0.565 early_stopping: 10/10 0.22842
stage 36/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/17 0:00:00 • -:--:-- 0.00it/s  early_stopping: 10/10 0.22842
Trainer was signaled to stop but the required `min_epochs=40` or `min_steps=None` has not been met. Training will continue...
stage 36/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.93it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.223 val_freq_iu: 0.745 early_stopping: 11/10 0.22842
stage 37/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.92it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.225 val_freq_iu: 0.753 early_stopping: 12/10 0.22842
stage 38/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:09 • 0:00:00 1.89it/s val_accuracy: 0.977 val_mean_acc: 0.977 val_mean_iu: 0.223 val_freq_iu: 0.747 early_stopping: 13/10 0.22842
stage 39/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:08 • 0:00:00 1.95it/s val_accuracy: 0.976 val_mean_acc: 0.976 val_mean_iu: 0.223 val_freq_iu: 0.746 early_stopping: 14/10 0.22842
Moving best model /home/incognito/kraken-train/102_Petrov_isbach/seg_v2/isbach_seg_v2_25.mlmodel (0.22842147946357727) to /home/incognito/kraken-train/102_Petrov_isbach/seg_v2/isbach_seg_v2_best.mlmodel

Result:

image

johnlockejrr commented 1 month ago

UPDATE:

I tried training on lines only, but it seems they are not trained at all!

(kraken-5.2.9) incognito@DESKTOP-NHKR7QL:~/kraken-train/102_Petrov_isbach$ ketos segtrain -d cuda:0 -f page -t output.txt -q early -cl --min-epochs 40 --suppress-regions -o /home/incognito/kraken-train/102_Petrov_isbach/seg_v2/isbach_seg_v2
Training line types:
  textline      2       1034
Training region types:
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 4070') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name              ┃ Type                     ┃ Params ┃                      In sizes ┃                Out sizes ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0  │ net               │ MultiParamSequential     │  1.3 M │             [1, 3, 1800, 300] │   [[1, 3, 450, 75], '?'] │
│ 1  │ net.C_0           │ ActConv2D                │  9.5 K │      [[1, 3, 1800, 300], '?'] │ [[1, 64, 900, 150], '?'] │
│ 2  │ net.Gn_1          │ GroupNorm                │    128 │ [[1, 64, 900, 150], '?', '?'] │ [[1, 64, 900, 150], '?'] │
│ 3  │ net.C_2           │ ActConv2D                │ 73.9 K │ [[1, 64, 900, 150], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 4  │ net.Gn_3          │ GroupNorm                │    256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 5  │ net.C_4           │ ActConv2D                │  147 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 6  │ net.Gn_5          │ GroupNorm                │    256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 7  │ net.C_6           │ ActConv2D                │  295 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 8  │ net.Gn_7          │ GroupNorm                │    512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 9  │ net.C_8           │ ActConv2D                │  590 K │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 10 │ net.Gn_9          │ GroupNorm                │    512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 11 │ net.L_10          │ TransposedSummarizingRNN │ 74.2 K │ [[1, 256, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 12 │ net.L_11          │ TransposedSummarizingRNN │ 25.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 13 │ net.C_12          │ ActConv2D                │  2.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 32, 450, 75], '?'] │
│ 14 │ net.Gn_13         │ GroupNorm                │     64 │  [[1, 32, 450, 75], '?', '?'] │  [[1, 32, 450, 75], '?'] │
│ 15 │ net.L_14          │ TransposedSummarizingRNN │ 16.9 K │  [[1, 32, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 16 │ net.L_15          │ TransposedSummarizingRNN │ 25.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 17 │ net.l_16          │ ActConv2D                │    195 │  [[1, 64, 450, 75], '?', '?'] │   [[1, 3, 450, 75], '?'] │
│ 18 │ val_px_accuracy   │ MultilabelAccuracy       │      0 │                             ? │                        ? │
│ 19 │ val_mean_accuracy │ MultilabelAccuracy       │      0 │                             ? │                        ? │
│ 20 │ val_mean_iu       │ MultilabelJaccardIndex   │      0 │                             ? │                        ? │
│ 21 │ val_freq_iu       │ MultilabelJaccardIndex   │      0 │                             ? │                        ? │
└────┴───────────────────┴──────────────────────────┴────────┴───────────────────────────────┴──────────────────────────┘
Trainable params: 1.3 M
Non-trainable params: 0
Total params: 1.3 M
Total estimated model params size (MB): 5
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.27it/s val_accuracy: 0.978 val_mean_acc: 0.978 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 0/10 0.00000
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.03it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 1/10 0.00000
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.07it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 2/10 0.00000
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.11it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 3/10 0.00000
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.01it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 4/10 0.00000
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.03it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 5/10 0.00000
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/17 0:00:00 • -:--:-- 0.00it/s  early_stopping: 5/10 0.00000
[10/15/24 16:07:50] WARNING  Model did not improve during training.
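
The log above reports val_accuracy: 0.986 next to val_mean_iu: 0.000. One possible reading, not confirmed anywhere in this thread, is that the network predicts background for every pixel: because baseline pixels are a tiny share of a page, pixel accuracy stays high while intersection over union drops to zero. A minimal sketch in plain Python, with made-up pixel counts and hypothetical helper names (this is not kraken's or torchmetrics' code):

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels where prediction and ground truth agree."""
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / len(target)


def iou(pred, target):
    """Intersection over union of the foreground (value 1) pixels."""
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    return intersection / union if union else 0.0


# Hypothetical flattened mask: 1000 pixels, 14 of which belong to baselines.
target = [1] * 14 + [0] * 986
# A model that predicts "background" everywhere.
pred = [0] * 1000

print(f"pixel accuracy: {pixel_accuracy(pred, target):.3f}")
print(f"IoU:            {iou(pred, target):.3f}")
```

With these invented numbers the all-background prediction scores 0.986 on accuracy and 0.000 on IoU, so the IoU columns are the ones worth watching for line classes.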

johnlockejrr commented 1 month ago

Could this happen because I named a line type textline and PAGE-XML already has an element named TextLine? I renamed the type to text_line in all my data and the model now seems to train.
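
To check which type names a PAGE-XML file actually carries before training, something like the following sketch might help. It assumes the types are stored in the `custom` attribute in the form `structure {type:...;}`, which is a common convention but may not match every export; the sample document and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Matches the type inside a PAGE "custom" attribute such as
# custom="structure {type:textline;}" (assumed convention).
TYPE_RE = re.compile(r"structure\s*\{[^}]*?type\s*:\s*([^;}\s]+)")


def local_name(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_types(xml_text):
    """Count (element, type) pairs for TextRegion and TextLine elements."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name not in ("TextRegion", "TextLine"):
            continue
        match = TYPE_RE.search(elem.get("custom", ""))
        counts[(name, match.group(1) if match else "(none)")] += 1
    return counts


# Hypothetical minimal PAGE document, for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1" custom="structure {type:textzone;}">
      <TextLine id="l1" custom="structure {type:textline;}"/>
      <TextLine id="l2" custom="structure {type:textline;}"/>
    </TextRegion>
  </Page>
</PcGts>"""

for (element, type_name), n in sorted(count_types(SAMPLE).items()):
    print(element, type_name, n)
```

Running `count_types` over the text of each training file and comparing the totals with the "Training line types" table printed by ketos would show whether the names and counts agree.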

mittagessen commented 1 month ago

17 pages is insufficient to train from scratch and already borderline on the low end when fine-tuning from a base model. Try fine-tuning from the default model [0] with the -i option.

[0] https://github.com/mittagessen/kraken/raw/refs/heads/main/kraken/blla.mlmodel

johnlockejrr commented 1 month ago

UPDATE:

As I thought:

image

So there seems to be a bug with PAGE-XML files if you use textline as a line type.

johnlockejrr commented 1 month ago

17 pages is insufficient to train from scratch and already borderline on the low end when fine-tuning from a base model. Try fine-tuning from the default model [0] with the -i option. [0] https://github.com/mittagessen/kraken/raw/refs/heads/main/kraken/blla.mlmodel

Yes, I'm aware of that! Was just a test, the bug exists anyway, renaming the line type solved the problem.
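
For anyone wanting to try the same workaround, here is a rough sketch of the rename. It assumes the line type lives in the `custom` attribute of TextLine elements as `structure {type:...;}`; the function name is hypothetical, and it works on the raw text so the rest of the file is left byte-for-byte unchanged.

```python
import re


def rename_line_type(xml_text, old, new):
    """Rename a structure type inside the custom attribute of TextLine tags only."""
    pattern = re.compile(
        r'(<(?:\w+:)?TextLine\b[^>]*?\bcustom="[^"]*?\btype\s*:\s*)'
        + re.escape(old)
        + r'(?=\s*[;}])'
    )
    # A function replacement avoids backslash escapes in `new` being interpreted.
    return pattern.sub(lambda m: m.group(1) + new, xml_text)


# Hypothetical fragment: the region keeps its type, only the line is renamed.
fragment = (
    '<TextRegion id="r1" custom="structure {type:textzone;}">'
    '<TextLine id="l1" custom="structure {type:textline;}"/>'
    '</TextRegion>'
)
print(rename_line_type(fragment, "textline", "text_line"))
```

A regular expression is used instead of re-serialising with an XML library because ElementTree would rewrite namespace prefixes on output. Work on copies of the ground truth, since this is only a sketch and has not been run against the data from this issue.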

johnlockejrr commented 1 month ago

UPDATE:

I fine-tuned from blla, but the line results are very bad, with many "Polygonizer failed on line 0" messages...

image

Any idea why I get "Polygonizer failed on line 0" when fine-tuning a segmentation model from blla? With the other pretrained model I don't get this error.

johnlockejrr commented 1 month ago

This is my latest segmentation training with kraken on very good ground truth, and I have no idea what is going wrong. Result from the fine-tuned blla model:

image

Example of GT I trained on:

image

johnlockejrr commented 2 weeks ago

Any thoughts?

mittagessen commented 23 hours ago

Are you running a 'custom' install where you installed some dependencies manually? The polygons look like they were produced with a kraken that uses an incompatible shapely version.

Could you just run the contrib/segmentation_overlay.py with your model/XML file and see if you get the same result? And install the latest 5.3.0 release in a clean environment and then do the overlay again to see if that produces the expected bounding polygons?
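
Before rebuilding the environment, it may be worth recording which versions are currently installed so the two setups can be compared. A small sketch using only the standard library; the package list is an assumption about what is relevant here, and the function name is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("kraken", "shapely", "torch")):
    """Return {package: version or None} for the active environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running it once in the old environment and once in the clean one gives a quick diff of the dependency versions involved.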

johnlockejrr commented 22 hours ago

I did a fresh install of the latest kraken and all is well now. There are only some minor problems with the last line at the bottom of the page, but that's fine.

image

Pretty well (another script)