johnlockejrr opened 1 month ago
UPDATE:
I tried training only for lines, but it seems they are not trained at all!
(kraken-5.2.9) incognito@DESKTOP-NHKR7QL:~/kraken-train/102_Petrov_isbach$ ketos segtrain -d cuda:0 -f page -t output.txt -q early -cl --min-epochs 40 --suppress-regions -o /home/incognito/kraken-train/102_Petrov_isbach/seg_v2/isbach_seg_v2
Training line types:
textline 2 1034
Training region types:
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 4070') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ Name ┃ Type ┃ Params ┃ In sizes ┃ Out sizes ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0 │ net │ MultiParamSequential │ 1.3 M │ [1, 3, 1800, 300] │ [[1, 3, 450, 75], '?'] │
│ 1 │ net.C_0 │ ActConv2D │ 9.5 K │ [[1, 3, 1800, 300], '?'] │ [[1, 64, 900, 150], '?'] │
│ 2 │ net.Gn_1 │ GroupNorm │ 128 │ [[1, 64, 900, 150], '?', '?'] │ [[1, 64, 900, 150], '?'] │
│ 3 │ net.C_2 │ ActConv2D │ 73.9 K │ [[1, 64, 900, 150], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 4 │ net.Gn_3 │ GroupNorm │ 256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 5 │ net.C_4 │ ActConv2D │ 147 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 6 │ net.Gn_5 │ GroupNorm │ 256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 7 │ net.C_6 │ ActConv2D │ 295 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 8 │ net.Gn_7 │ GroupNorm │ 512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 9 │ net.C_8 │ ActConv2D │ 590 K │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 10 │ net.Gn_9 │ GroupNorm │ 512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 11 │ net.L_10 │ TransposedSummarizingRNN │ 74.2 K │ [[1, 256, 450, 75], '?', '?'] │ [[1, 64, 450, 75], '?'] │
│ 12 │ net.L_11 │ TransposedSummarizingRNN │ 25.1 K │ [[1, 64, 450, 75], '?', '?'] │ [[1, 64, 450, 75], '?'] │
│ 13 │ net.C_12 │ ActConv2D │ 2.1 K │ [[1, 64, 450, 75], '?', '?'] │ [[1, 32, 450, 75], '?'] │
│ 14 │ net.Gn_13 │ GroupNorm │ 64 │ [[1, 32, 450, 75], '?', '?'] │ [[1, 32, 450, 75], '?'] │
│ 15 │ net.L_14 │ TransposedSummarizingRNN │ 16.9 K │ [[1, 32, 450, 75], '?', '?'] │ [[1, 64, 450, 75], '?'] │
│ 16 │ net.L_15 │ TransposedSummarizingRNN │ 25.1 K │ [[1, 64, 450, 75], '?', '?'] │ [[1, 64, 450, 75], '?'] │
│ 17 │ net.l_16 │ ActConv2D │ 195 │ [[1, 64, 450, 75], '?', '?'] │ [[1, 3, 450, 75], '?'] │
│ 18 │ val_px_accuracy │ MultilabelAccuracy │ 0 │ ? │ ? │
│ 19 │ val_mean_accuracy │ MultilabelAccuracy │ 0 │ ? │ ? │
│ 20 │ val_mean_iu │ MultilabelJaccardIndex │ 0 │ ? │ ? │
│ 21 │ val_freq_iu │ MultilabelJaccardIndex │ 0 │ ? │ ? │
└────┴───────────────────┴──────────────────────────┴────────┴───────────────────────────────┴──────────────────────────┘
Trainable params: 1.3 M
Non-trainable params: 0
Total params: 1.3 M
Total estimated model params size (MB): 5
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.27it/s val_accuracy: 0.978 val_mean_acc: 0.978 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 0/10 0.00000
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.03it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 1/10 0.00000
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.07it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 2/10 0.00000
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.11it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 3/10 0.00000
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.01it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 4/10 0.00000
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 0:00:05 • 0:00:00 3.03it/s val_accuracy: 0.986 val_mean_acc: 0.986 val_mean_iu: 0.000 val_freq_iu: 0.000 early_stopping: 5/10 0.00000
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/17 0:00:00 • -:--:-- 0.00it/s early_stopping: 5/10 0.00000
[10/15/24 16:07:50] WARNING Model did not improve during training.
Could this happen because I named a line type `textline` and PAGE-XML already has a tag named `TextLine`? I renamed all my line types to `text_line` and the model seems to start training.
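For context, kraken reads line types from the `custom` attribute on PAGE-XML `TextLine` elements, so the rename amounts to something like the following (a minimal sketch; ids and coordinates are placeholders):

```xml
<!-- before: type name colliding with the TextLine element name -->
<TextLine id="l1" custom="structure {type:textline;}">
  <Coords points="0,0 100,0 100,50 0,50"/>
</TextLine>

<!-- after: renamed type that trains as expected -->
<TextLine id="l1" custom="structure {type:text_line;}">
  <Coords points="0,0 100,0 100,50 0,50"/>
</TextLine>
```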
17 pages is insufficient to train from scratch and already borderline on the low end when fine-tuning from a base model. Try fine-tuning from the default model [0] with the `-i` option.

[0] https://github.com/mittagessen/kraken/raw/refs/heads/main/kraken/blla.mlmodel
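Such a fine-tuning run might look roughly like this (a sketch; the output prefix is a placeholder, and `blla.mlmodel` is the default model from [0]):

```bash
wget https://github.com/mittagessen/kraken/raw/refs/heads/main/kraken/blla.mlmodel
ketos segtrain -d cuda:0 -f page -t output.txt -i blla.mlmodel --suppress-regions -o seg_finetuned/isbach_seg_ft
```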
UPDATE:
As I thought: it seems to be a bug with PAGE-XML files if you use `textzone` as the line type.
> 17 pages is insufficient to train from scratch and already borderline on the low end when fine-tuning from a base model. Try fine-tuning from the default model [0] with the `-i` option.
> [0] https://github.com/mittagessen/kraken/raw/refs/heads/main/kraken/blla.mlmodel
Yes, I'm aware of that! It was just a test; the bug exists anyway, and renaming the line type solved the problem.
UPDATE:
I fine-tuned from `blla` but the result on lines is very bad, with many `Polygonizer failed on line 0` errors.
...
Any idea why I get `Polygonizer failed on line 0` when training a seg model from `blla`? With the other pretrained model I don't get this error.
This was my last seg train with kraken, on very good ground truth; I have no idea what happened. Fine-tuned `blla`:
Example of the GT I trained on:
Any thoughts?
Are you running a 'custom' install where you installed some dependencies manually? The polygons look like they were produced with a kraken that uses an incompatible shapely version.
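One quick way to see which shapely version an environment actually resolves (a generic Python/pip check, not a kraken-specific command):

```bash
python -c "import shapely; print(shapely.__version__)"
pip show shapely kraken
```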
Could you just run `contrib/segmentation_overlay.py` with your model/XML file and see if you get the same result? And install the latest 5.3.0 release in a clean environment, then do the overlay again to see if that produces the expected bounding polygons?
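A clean-environment check could look roughly like this (a sketch: the venv path and file names are placeholders, and the overlay script's exact options may differ between versions, so check `--help` first):

```bash
python -m venv ~/kraken-clean
source ~/kraken-clean/bin/activate
pip install kraken==5.3.0

# hypothetical invocation; verify the real options with --help
python contrib/segmentation_overlay.py --help
python contrib/segmentation_overlay.py -i isbach_seg_v2_best.mlmodel page_0001.xml
```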
I made a fresh install with the latest kraken and all is well now; there are only some minor problems with the last line at the bottom of the page, but that's fine.
Works pretty well (overlay from another script):
I am trying to train a segmentation model with PAGE-XML data. At the start the segmenter shows me the regions and line types, but when using the model no lines are detected at all!
Result:
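For reference, running segmentation inference with a freshly trained model generally looks like this (a sketch; image and model file names are placeholders):

```bash
kraken -i page_0001.png page_0001.json segment -bl -i isbach_seg_v2_best.mlmodel
```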