Hi @abscodeice thank you for raising a detailed issue, and I'm sorry that message isn't clearer.
I think you're wondering if it's an error message?
If so, it's not; it's just telling you that vak stopped training the model because the accuracy as measured on the validation set had not improved after six "validation steps". This is expected behavior.
Just so we're on the same page: a validation step is when vak pauses running training batches through the model (one global "step" is one training batch), computes metrics on the validation set with the current model, and then resumes training. If the accuracy computed on the validation set has increased, vak saves a checkpoint (with `max-val-acc` in the filename).
The frequency with which vak does this is controlled by the `val_step` option in the config file.
E.g., for the experiments with Bengalese finch song in the paper we set `val_step = 400`, meaning "every 400 steps / batches, stop and compute validation metrics":
https://github.com/yardencsGitHub/tweetynet/blob/eab406f0590f4e90a36530cada52a78b0c676a80/article/data/configs/Bengalese_Finches/learncurve/revision/config_BFSongRepository_bl26lb16_learncurve.toml#L23
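In case it helps to see it in context, here is a minimal sketch of the relevant part of a config file, in the style of the paper-era configs linked above. The exact section name and surrounding options depend on your vak version (check the config docs), and the value is just the one we used, not a recommendation:

```toml
[TRAIN]
# ...other training options...
# pause every 400 training steps (batches) and compute metrics on the validation set
val_step = 400
```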
We have not yet tested extensively on mouse USVs (work in progress 🙂) but based on what we know from other animal vocalizations, I would guess that if the model has stopped improving after 6 validation steps, then that's probably as good as it's going to get.
The number of validation steps without any improvement that vak will run before stopping training early is controlled by the `patience` option in the config file, as defined here: https://vak.readthedocs.io/en/latest/reference/config.html#vak.config.train.TrainConfig.patience
For the Bengalese finch experiments in the paper we used `patience = 4`, and you're using 6, so I would guess that your model is probably pretty well trained:
https://github.com/yardencsGitHub/tweetynet/blob/eab406f0590f4e90a36530cada52a78b0c676a80/article/data/configs/Bengalese_Finches/learncurve/revision/config_BFSongRepository_bl26lb16_learncurve.toml#L25
The only exception might be if you have a very small `val_step` -- sometimes, if you check too frequently, the validation accuracy still goes up and down a lot at the start of training, and so a small `val_step` combined with a relatively low `patience` could cause training to stop early when the model would still have been able to improve.
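To make that concrete, here's a rough sketch of how those two options interact in the config file; the option names are the ones from the docs linked above, and the numbers are only for illustration:

```toml
[TRAIN]
# risky combination: checking very often while validation accuracy is still
# noisy early in training, with few checks allowed before stopping, can trigger
# early stopping before the model has really converged
# val_step = 50
# patience = 4

# more conservative: check less often, and/or allow more validation steps
# without improvement before stopping
val_step = 400
patience = 6
```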
If you want to be extra careful and troubleshoot this, you can do the following (in increasing order of complexity):
Please let me know if that helps! Happy to answer more questions, and even jump on a Zoom call to give you some tech support if you need it. We're glad to see your lab is still using TweetyNet + vak. You can also feel free to join our forum and ask questions there (to benefit from the hive mind 🙂): https://forum.vocalpy.org/
Going to close this -- just let us know if we can help you @abscodeice! You can also feel free to email me at nicholdav at gmail if you prefer
Hi guys, I'm having trouble interpreting this termination message. Can anyone tell me the most likely cause of it?
thanx, Amanda
Some info:
(0) Data: ultrasonic vocalizations (srate: 250 kHz; nfft: 1024; step: 480)
(1) Training set: 1200 s; validation set: 200 s; number of classes: 13 (including background)
(2) Most frequent class: 2228 calls; least frequent class: 58