KeyError: 'completed_epochs' error during segtrain

alexislitvine commented 2 months ago

@mittagessen - I am trying to train a segmentation model, but I get the following error each time:

Trainable params: 1.3 M
Non-trainable params: 0
Total params: 1.3 M
Total estimated model params size (MB): 5
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 179/179 0:06:03 • 0:00:00 0.46it/s val_accuracy: 0.917 early_stopping: 0/10 -inf
    val_mean_acc: 0.917
    val_mean_iu: 0.121
    val_freq_iu: 0.37

Error:

│ /local/filespace/kraken/venv/lib/python3.10/site-packages/kraken/lib/train.py:192 in             │
│ on_validation_end                                                                                │
│                                                                                                  │
│    189 │   """                                                                                   │
│    190 │   def on_validation_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule")   │
│    191 │   │   if not trainer.sanity_checking:                                                   │
│ ❱  192 │   │   │   trainer.model.nn.hyper_params['completed_epochs'] += 1                        │
│    193 │   │   │   metric = float(trainer.logged_metrics['val_metric']) if 'val_metric' in trai  │
│    194 │   │   │   trainer.model.nn.user_metadata['accuracy'].append((trainer.global_step, metr  │
│    195 │   │   │   trainer.model.nn.user_metadata['metrics'].append((trainer.global_step, {k: f  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'completed_epochs'

I used: $ ketos segtrain -f page -N 100 -q early --min-epochs 50 -d cuda:0 -o BL_27042024 -t output.txt --suppress-baselines

mittagessen commented 2 months ago

I've tagged a new 5.2.3 release with a hotfix. There's another small bug with resizing the output layer when fine-tuning that I'll get to tomorrow.
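
For anyone reading the traceback later: the hotfix itself is not shown in this thread, so the following is only a rough sketch of the failure mode, not the actual change in kraken. It assumes `hyper_params` behaves like a plain dict that never had a `completed_epochs` entry; the dict contents and the helper names `bump_unguarded` and `bump_guarded` are hypothetical and exist only for illustration.

```python
# Hypothetical stand-in for the model's hyper_params mapping; the real kraken
# objects are not reproduced here. Note there is no 'completed_epochs' key.
hyper_params = {"epochs": 100, "min_epochs": 50}


def bump_unguarded(params):
    # Mirrors the failing line in the traceback: '+=' has to read the key
    # first, so a missing key raises KeyError before anything is written.
    params["completed_epochs"] += 1


def bump_guarded(params):
    # One possible guard (an assumption, not the actual hotfix): treat a
    # missing counter as zero before incrementing.
    params["completed_epochs"] = params.get("completed_epochs", 0) + 1
    return params["completed_epochs"]


# The unguarded increment fails on a dict that lacks the key.
try:
    bump_unguarded(dict(hyper_params))
    raised = False
    missing_key = None
except KeyError as exc:
    raised = True
    missing_key = exc.args[0]

# The guarded increment starts the counter at 1 and keeps counting.
guarded = dict(hyper_params)
first = bump_guarded(guarded)
second = bump_guarded(guarded)

print(raised, missing_key, first, second)
```

Whether the counter should be initialised when the model is created or defaulted at increment time is a design choice; the sketch only shows the second option because it is the smaller change.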

alexislitvine commented 2 months ago

Works fine - amazing!

mittagessen / kraken, issue #600