Open stweil opened 3 months ago
I just found my previous issue #302 for that. Is this an eScriptorium issue, i.e. does eScriptorium not use the kraken API correctly?
On 24/03/17 02:16PM, Stefan Weil wrote:
> I just found my previous issue #302 for that. Is this an eScriptorium issue, i.e. does eScriptorium not use the kraken API correctly?
We never really tried to make eScriptorium reproducible, and it is currently not possible to make training 100% reproducible because of CUDA/cuDNN limitations. You can try the deterministic training switch in ketos, but you'll still see differences between machines, library versions, and the phase of the moon.
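For reference, a minimal sketch of such an invocation (the exact flag name is an assumption based on the comment above; check `ketos train --help` for your kraken version):

```shell
# Hypothetical invocation: request (mostly) deterministic training in ketos.
# The --deterministic flag name is assumed; verify it against your installed
# kraken version before relying on it.
ketos train --deterministic -f binary dataset.arrow
```

Even with this switch, the maintainer's point stands: some CUDA kernels (notably CTC) remain non-deterministic.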
I currently struggle with eS trainings that end with a model claiming 100% accuracy, although all epochs show accuracies below 99%. When I export the final model and examine its metadata, I can see that it is always the model from epoch 0 (eS starts counting epochs at 0, so it is the result of the first epoch).
Hmm, you can set `deterministic=warn` on the `KrakenTrainer` object in eScriptorium, which should eliminate most non-deterministic behavior but won't get rid of it completely. Shuffling the training data twice shouldn't really have an impact, as the state of the RNG remains the same between two training runs (if you restart the workers). Otherwise we'd need to re-seed it for each task. IIRC CUDA CTC is always non-deterministic.
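The RNG-state argument can be illustrated with a stdlib-only sketch (plain `random` stands in for the actual training shuffles; all names here are hypothetical, not kraken API):

```python
import random

def training_run(data, seed):
    """Hypothetical stand-in for one training run: the RNG is seeded once
    at startup, then the training set is shuffled twice (e.g. once for
    the split and once more during training)."""
    rng = random.Random(seed)
    first = data[:]
    rng.shuffle(first)   # first shuffle (e.g. train/validation split)
    second = first[:]
    rng.shuffle(second)  # second shuffle (e.g. per-epoch shuffling)
    return second

data = list(range(10))
# Two runs that restart from the same seed pass through identical RNG
# states, so even a double shuffle yields the same final order.
assert training_run(data, seed=42) == training_run(data, seed=42)
```

That is, the second shuffle only breaks reproducibility if the RNG state differs between runs, e.g. because workers are reused without re-seeding.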
Ideally, a training process should be reproducible, as this is required by good scientific practice.
Currently, kraken training is not reproducible: two recognition trainings with the same ground truth and the same base model give different results (number of epochs, accuracies of the intermediate models).
eScriptorium shuffles the ground truth randomly but always uses the same seed, so the resulting training and validation sets are reproducible. However, the training seems to shuffle the training set once more, and that shuffle does not appear to be reproducible.
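The suspected behavior can be sketched with the stdlib only (hypothetical names; plain `random` stands in for the actual shuffles, not the eScriptorium code):

```python
import random

SPLIT_SEED = 42  # fixed seed, as eScriptorium uses for the ground-truth split

def split_ground_truth(lines, ratio=0.9):
    """Reproducible split: a dedicated RNG with a fixed seed always
    produces the same training and validation sets."""
    rng = random.Random(SPLIT_SEED)
    shuffled = lines[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

def epoch_order(train_set):
    """Suspected non-reproducible step: a second shuffle using the global,
    unseeded RNG, so two training runs generally see the data in
    different orders."""
    order = train_set[:]
    random.shuffle(order)
    return order

lines = [f"line_{i}" for i in range(20)]
# The split itself is reproducible across runs:
assert split_ground_truth(lines) == split_ground_truth(lines)
```

If the second shuffle indeed draws from an unseeded (or differently-stated) RNG, that would explain why the split is stable while the training outcome is not.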