huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License
3.32k stars, 238 forks

[training] freeze decoder #126

Closed: eustlb closed this 1 month ago

eustlb commented 2 months ago

Adds the option to freeze the decoder during training. Note that freezing the decoder also freezes its embed_tokens layer, which by default is tied to proj_out (itself not a layer of the decoder). As a result, proj_out gets frozen as well, so it must be explicitly unfrozen afterwards.
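
To illustrate the tying issue, here is a minimal PyTorch sketch on a toy model. It is not the code from this PR: `ToyDecoder` and `ToySeq2Seq` are hypothetical stand-ins, and only the attribute names `embed_tokens` and `proj_out` are taken from the description above. Because the tie in this sketch is a shared `Parameter`, re-enabling gradients on `proj_out` also re-enables them on `embed_tokens`.

```python
import torch
from torch import nn


class ToyDecoder(nn.Module):
    """Hypothetical stand-in for a decoder with a token embedding."""

    def __init__(self, vocab_size: int = 10, d_model: int = 4):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, d_model)
        self.layer = nn.Linear(d_model, d_model)


class ToySeq2Seq(nn.Module):
    """Hypothetical model whose output projection is tied to the decoder embedding."""

    def __init__(self, vocab_size: int = 10, d_model: int = 4):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_model)
        self.decoder = ToyDecoder(vocab_size, d_model)
        # proj_out lives outside the decoder but shares its weight with embed_tokens.
        self.proj_out = nn.Linear(d_model, vocab_size, bias=False)
        self.proj_out.weight = self.decoder.embed_tokens.weight


model = ToySeq2Seq()

# Freeze every decoder parameter, including embed_tokens.
for param in model.decoder.parameters():
    param.requires_grad = False

# The tied proj_out weight is the same tensor, so it is now frozen too.
proj_out_frozen_by_side_effect = not model.proj_out.weight.requires_grad

# Explicitly unfreeze proj_out (this also unfreezes the shared embed_tokens weight).
model.proj_out.weight.requires_grad = True

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(proj_out_frozen_by_side_effect, trainable)
```

In this sketch the remaining decoder parameters (`decoder.layer.*`) stay frozen while the encoder and the tied projection remain trainable.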

other minor changes: