hplt-project / OpusTrainer

Curriculum training
https://pypi.org/project/opustrainer/
MIT License

monitoring training progress #6

Open jelmervdl opened 1 year ago

jelmervdl commented 1 year ago

There's already a marian-tensorboard connector. We can either plug into that or write our own version of it. We have the added benefit of direct access to marian's stdout and stderr, so we can read the log lines straight from there.

Regular expressions it uses to parse the marian log: https://github.com/marian-nmt/marian-tensorboard/blob/b9867c43472a27783611accba93adebda60ba462/src/marian_tensorboard/marian_tensorboard.py#L107-L125

Added benefit of doing the integration ourselves: we can also push dataset events to tensorboard, like epoch events and training stages.
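A rough sketch of what "reading directly from marian's output" could look like. The log line shape in the comment and the regex are my assumption of what marian prints, not a copy of the linked marian-tensorboard expressions, and `Scalar` / `parse_marian_log` are made-up names for illustration:

```python
import re
from typing import Iterable, Iterator, NamedTuple

# ASSUMPTION: a marian training log line looks roughly like
#   [2023-01-01 12:00:00] Ep. 1 : Up. 1000 : Sen. 123,456 : Cost 3.45678 : Time 12.34s : ...
# Check against real marian output before relying on this.
TRAIN_RE = re.compile(
    r"Ep\.\s+(?P<epoch>\d+)\s+:\s+Up\.\s+(?P<update>\d+)\s+:.*?Cost\s+(?P<cost>[\d.]+)"
)


class Scalar(NamedTuple):
    """One data point to plot: a tag, the update number, and the value."""
    tag: str
    step: int
    value: float


def parse_marian_log(lines: Iterable[str]) -> Iterator[Scalar]:
    """Yield scalars for every line that matches the assumed training format."""
    for line in lines:
        if "[valid]" in line:
            continue  # validation lines have a different layout; skipped here
        match = TRAIN_RE.search(line)
        if match is None:
            continue  # anything else marian prints is ignored
        step = int(match["update"])
        yield Scalar("train/cost", step, float(match["cost"]))
        yield Scalar("train/epoch", step, float(match["epoch"]))
```

`lines` can be the stderr pipe of the marian child process (e.g. a `subprocess.Popen` started with `stderr=subprocess.PIPE, text=True`). Each `Scalar` could then be forwarded to any tensorboard writer, for instance `SummaryWriter.add_scalar(tag, value, step)` from `torch.utils.tensorboard`. Dataset events from the trainer side (epoch counters, current stage) could be emitted as `Scalar`s through the same path.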

Slightly related to #3.

XapaJIaMnu commented 1 year ago

Add to this issue:

Advance to a new stage when marian reports a stall on a validation set.

This can be used to automatically find the optimal point to transition between stages. It should be combined with resetting the optimizer inside marian, so that the new dataset mixture doesn't get its gradients penalised too hard by the change of data.
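A minimal sketch of the stall-driven transition, to make the idea concrete. The validation line shape is my assumption of marian's format, and `StageAdvancer`, `patience` and the stage names are hypothetical, not existing OpusTrainer API. Resetting the optimizer inside marian is not covered here.

```python
import re
from typing import List, Optional

# ASSUMPTION: a stalled validation line looks roughly like
#   [valid] Ep. 2 : Up. 20000 : cross-entropy : 45.12 : stalled 3 times (last best: 44.80)
STALL_RE = re.compile(
    r"\[valid\].*?:\s+(?P<metric>[\w-]+)\s+:\s+[\d.eE+-]+\s+:\s+stalled\s+(?P<count>\d+)\s+times"
)


class StageAdvancer:
    """Move to the next stage once validation has stalled `patience` more times."""

    def __init__(self, stages: List[str], patience: int = 3, metric: Optional[str] = None):
        self.stages = stages
        self.patience = patience
        self.metric = metric   # None means: react to any validation metric
        self.index = 0
        self._baseline = 0     # stall count seen at the last transition

    @property
    def current(self) -> str:
        return self.stages[self.index]

    def observe(self, line: str) -> bool:
        """Feed one log line; return True if it triggered a stage change."""
        match = STALL_RE.search(line)
        if match is None:
            return False
        if self.metric is not None and match["metric"] != self.metric:
            return False
        count = int(match["count"])
        if count < self._baseline:
            self._baseline = 0  # the counter restarted, e.g. after a new best
        if count - self._baseline < self.patience:
            return False
        if self.index + 1 >= len(self.stages):
            return False        # already in the last stage
        self.index += 1
        self._baseline = count  # the counter keeps running, so measure from here
        return True
```

The baseline is there because the stall counter presumably keeps increasing after we switch data, so without it every following validation would trigger another transition.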