Updates the distillation script so that the user can choose between pseudo-labels and text labels as their training targets. This enables the user to run "knowledge distillation" directly on the text labels provided by an ASR dataset, such as Common Voice, effectively skipping the pseudo-labelling step.
To train directly on the text labels provided in the dataset, set `--use_pseudo_labels=False` and pass the correct `--text_column_name` for your text targets. For example, training `distil-large-v3` on the Common Voice 15 dataset with no pseudo-labels:
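A minimal sketch of such a launch command. Only `--use_pseudo_labels` and `--text_column_name` are taken from this PR; the remaining flag names, checkpoints, and values are illustrative assumptions and should be checked against the distillation script's argument parser:

```bash
# Illustrative sketch: distil a student on Common Voice 15 text labels.
# Only --use_pseudo_labels and --text_column_name are from this PR; all other
# flags and values are assumptions to verify against the distillation script.
python run_distillation.py \
  --model_name_or_path "distil-whisper/distil-large-v3" \
  --teacher_model_name_or_path "openai/whisper-large-v3" \
  --train_dataset_name "mozilla-foundation/common_voice_15_0" \
  --train_dataset_config_name "en" \
  --text_column_name "sentence" \
  --use_pseudo_labels=False \
  --output_dir "./distil-large-v3-cv15"
```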
In this configuration we use the same target labels as fine-tuning, but keep the teacher's influence during training through the KL-divergence loss.
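For intuition, here is a minimal PyTorch sketch of such a combined objective, assuming a weighted sum of cross-entropy on the text labels and a temperature-scaled KL-divergence against the teacher. The function name, weights, and temperature below are illustrative assumptions, not the script's exact implementation:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      ce_weight=0.8, kl_weight=1.0, temperature=2.0):
    # Hypothetical helper: weights and temperature are illustrative, not the
    # values used by the actual distillation script.
    # Cross-entropy against the dataset's text labels -- the same targets as
    # plain fine-tuning (padding positions marked with -100 are ignored).
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # Temperature-scaled KL-divergence between the student and teacher
    # distributions -- this is how the teacher keeps influencing training.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    return ce_weight * ce + kl_weight * kl
```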