Closed Svetlana-Yatsyk closed 11 months ago
On 23/10/19 01:33AM, Svetlana Yatsyk wrote:
Hello,
I am discovering the reading order models and I have a couple of questions about their training.
First, I trained an RO model on data exported from eScriptorium. "default" lines are not present in my ontology. I double-checked the PAGE files: the tag "default" is nowhere to be found.
However, after trying to add the RO model to the segmentation model, I get this error:
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Adding /content/gdrive/MyDrive/reading_order_models/RO_242.mlmodel reading order model to /content/gdrive/MyDrive/yaltai/segm_baselines_medieval_Thibault.mlmodel.
Line classes known to RO model: DefaultLine 1 > default 2
Line classes known to segmentation model: DefaultLine 2
Usage: ketos roadd [OPTIONS]
Try 'ketos roadd --help' for help.
Error: Model /content/gdrive/MyDrive/yaltai/segm_baselines_medieval_Thibault.mlmodel and /content/gdrive/MyDrive/reading_order_models/RO_242.mlmodel class mappings mismatch.
Where does the "default" line class in my RO model come from?
Any line that doesn't have a class is assigned to default. There's probably one or more stray lines that aren't more specifically annotated.
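One way to look for such stray lines is to scan the PAGE files for TextLine elements whose custom attribute carries no type. The following is a minimal sketch, assuming the types are stored as in the files mentioned in this thread (custom="structure {type:DefaultLine;}"); the function name untyped_lines and the sample document are made up for illustration and are not part of kraken.

```python
import re
import xml.etree.ElementTree as ET

# Matches a line type inside a PAGE `custom` attribute,
# e.g. custom="structure {type:DefaultLine;}".
TYPE_RE = re.compile(r"structure\s*\{[^}]*?\btype\s*:\s*([^;}]+)")


def untyped_lines(page_xml: str) -> list[str]:
    """Return the ids of TextLine elements that carry no explicit type."""
    root = ET.fromstring(page_xml)
    missing = []
    # "{*}" matches any namespace, since the PAGE namespace URI
    # differs between schema versions.
    for line in root.findall(".//{*}TextLine"):
        if not TYPE_RE.search(line.get("custom", "")):
            missing.append(line.get("id", "<no id>"))
    return missing


# Hypothetical two-line sample: the second line has no type annotation.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="r1">
      <TextLine id="l1" custom="structure {type:DefaultLine;}"/>
      <TextLine id="l2"/>
    </TextRegion>
  </Page>
</PcGts>"""

print(untyped_lines(SAMPLE))
```

If this reports nothing for every file in the training set, the extra class is unlikely to come from the annotations themselves.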
And the second question: why does the training process, once stopped by early stopping, give me this error: TypeError: '>=' not supported between instances of 'int' and 'str'?
That's probably a bug. Could you give me the whole traceback? It should say more specifically where the error occurred.
Here it is:
stage 534/∞ ━━━━━━━━━━━━━━━━━ 23/23 0:00:17 • 0:00:00 1.35it/s val_spearman: 0.059 val_loss: 0.183 early_stopping: 300/300 0.05816
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/local/bin/ketos:8 in `Trainer.fit` stopped: `max_epochs={self.max_epochs!r}` rea │
│ 178 │ │ │ return True │
│ 179 │ │ │
│ ❱ 180 │ │ if self.trainer.should_stop and self._can_stop_early: │
│ 181 │ │ │ rank_zero_debug("`Trainer.fit` stopped: `trainer.should_stop` was set.") │
│ 182 │ │ │ return True │
│ 183 │
│ │
│ /usr/local/lib/python3.9/dist-packages/pytorch_lightning/loops/fit_loop.py:147 in │
│ _can_stop_early │
│ │
│ 144 │ │
│ 145 │ @property │
│ 146 │ def _can_stop_early(self) -> bool: │
│ ❱ 147 │ │ met_min_epochs = self.epoch_progress.current.processed >= self.min_epochs if sel │
│ 148 │ │ met_min_steps = self.epoch_loop.global_step >= self.min_steps if self.min_steps │
│ 149 │ │ return met_min_epochs and met_min_steps │
│ 150 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: '>=' not supported between instances of 'int' and 'str'
Thanks. I'll see if I can reproduce it but it looks like a bug in pytorch-lightning.
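The failing comparison on line 147 of the traceback can be reproduced in isolation. The sketch below assumes that the minimum-epochs value reached the loop as a string (the traceback only shows that one side of '>=' was an int and the other a str); can_stop_early and coerce_epochs are illustrative stand-ins, not pytorch-lightning or kraken functions.

```python
def can_stop_early(processed_epochs: int, min_epochs) -> bool:
    """Stand-in mirroring the shape of the comparison shown in the traceback."""
    # A non-empty string such as "0" is truthy, so the comparison is
    # attempted and fails because int and str cannot be ordered.
    return processed_epochs >= min_epochs if min_epochs else True


def coerce_epochs(value):
    """Hypothetical guard: convert a CLI-style value to int before use."""
    return int(value) if value is not None else None


try:
    can_stop_early(534, "0")
except TypeError as exc:
    message = str(exc)
    print(message)

# With the value coerced to int the check succeeds.
print(can_stop_early(534, coerce_epochs("0")))
```

If that assumption holds, the error only shows up once stopping is actually requested, which would match it appearing at the end of training.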
I am sorry for bothering you, but I still have the class mapping issue. I checked again, all the lines in my dataset are annotated (custom="structure {type:DefaultLine;}"). Here are the files I use: plutei_28_sin2.zip. However, the RO model trained on this data still knows two classes.
I ran a test with only 3 pages, and still got the same error.
OK, then there's probably an issue with a default class which I didn't test during development. I'm teaching today but will have some time to check it during the weekend.
Dear Ben, do you by chance have any news on this subject?
This might be naive, but I'll still ask: wouldn't it be reasonable to map only those classes that are common to both ro_net and seg_net and ignore the rest in ro_net? Sometimes a user might exclude a few less frequent classes when training a segmentation model with -vb, but rotrain doesn't provide any such option, and I think I can also see why. In that case, roadd will always fail.
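The proposal above can be sketched as a plain intersection of the two class mappings. This is only an illustration of the idea, not how roadd works; common_class_mapping is a hypothetical helper, and the dictionaries loosely reuse the class names and numbers from the log earlier in the thread.

```python
def common_class_mapping(ro_classes: dict[str, int],
                         seg_classes: dict[str, int]) -> tuple[dict[str, int], list[str]]:
    """Keep only RO classes the segmentation model also knows.

    Returns the shared mapping (with the RO model's values) and the
    sorted list of RO-only classes that would be ignored.
    """
    shared = {name: idx for name, idx in ro_classes.items() if name in seg_classes}
    ignored = sorted(set(ro_classes) - set(seg_classes))
    return shared, ignored


# Illustrative values only, modelled on the log output above.
ro = {"DefaultLine": 1, "default": 2}
seg = {"DefaultLine": 2}

shared, ignored = common_class_mapping(ro, seg)
print(shared)   # classes both models know
print(ignored)  # classes only the RO model knows
```

Whether dropping classes this way is sound depends on how the RO model encodes line classes internally, which the thread does not settle.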
On 23/10/24 02:23AM, Svetlana Yatsyk wrote:
Dear Ben, do you by chance have any news on this subject?
Yes, sorry for the delay. I've found the bug, the fix is in the process of being merged.
Thank you for looking at it!
I tried to train a model on 2 images to check whether the bug was gone. After reaching the set number of epochs, the training stopped with an error (the one you attributed to a bug in pytorch-lightning):
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/local/bin/ketos:8 in `Trainer.fit` stopped: `max_epochs={self.max_epochs!r}` rea │
│ 178 │ │ │ return True │
│ 179 │ │ │
│ ❱ 180 │ │ if self.trainer.should_stop and self._can_stop_early: │
│ 181 │ │ │ rank_zero_debug("`Trainer.fit` stopped: `trainer.should_stop` was set.") │
│ 182 │ │ │ return True │
│ 183 │
│ │
│ /usr/local/lib/python3.9/dist-packages/pytorch_lightning/loops/fit_loop.py:147 in │
│ _can_stop_early │
│ │
│ 144 │ │
│ 145 │ @property │
│ 146 │ def _can_stop_early(self) -> bool: │
│ ❱ 147 │ │ met_min_epochs = self.epoch_progress.current.processed >= self.min_epochs if sel │
│ 148 │ │ met_min_steps = self.epoch_loop.global_step >= self.min_steps if self.min_steps │
│ 149 │ │ return met_min_epochs and met_min_steps │
│ 150 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: '>=' not supported between instances of 'int' and 'str'
However, I did get several models, which I tried to add to a segmentation model. This failed again because of the class mapping mismatch.
I am training a model on these 2 XML files: https://drive.google.com/drive/folders/1-hj_dO9EOLX20nSSp7DgD4MfoX6ZUtY6?usp=sharing All the lines have {type:DefaultLine;}; there is not a single "default" line. However, when I launch the training, I see that "default" lines are present in the training data.
Please help me understand the reasoning behind this.
Sorry, I had screwed up the merge into the main branch and for some reason the earlier fix didn't get in there. You should now be able to train reading order models with kraken from the main branch without spurious line types.