phineas-pta / fine-tune-whisper-vi

jupyter notebooks to fine-tune whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2
Apache License 2.0
aws docker fine-tuning lora multi-gpu-training speech-recognition speech-to-text vietnamese whisper

fine-tune whisper vi

jupyter notebooks to fine-tune whisper models on vietnamese using kaggle (should also work on colab, but not thoroughly tested)

using my collection of vietnamese speech datasets: https://huggingface.co/collections/doof-ferb/vietnamese-speech-dataset-65c6af8c15c9950537862fa6

N.B.1 importing any trainer or pipeline class from transformers crashes the kaggle TPU session (see huggingface/transformers#28609), so it's better to use GPU

N.B.2 the trainer class from transformers can automatically use multiple GPUs (like kaggle's free T4×2) without code changes. However, by default the trainer uses naive model parallelism, which cannot keep all GPUs busy at the same time, so distributed data parallelism is the better choice
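to make the data-parallel idea concrete: under distributed data parallelism every GPU holds a full copy of the model and processes its own slice of each batch, so the global batch grows with the number of GPUs. the sketch below is a pure-Python illustration, not code from the notebooks; `shard_indices` mirrors how `torch.utils.data.DistributedSampler` splits indices when `shuffle=False` and `drop_last=False`, and both helper names are hypothetical.

```python
import math
from typing import List


def effective_batch_size(per_device: int, num_gpus: int, grad_accum: int = 1) -> int:
    # under DDP each process runs its own per-device batch, so the global batch
    # is per-device batch × number of processes × gradient accumulation steps
    return per_device * num_gpus * grad_accum


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> List[int]:
    # pad the index list by wrapping around to the start so it divides evenly,
    # then each rank takes every num_replicas-th index starting at its own rank
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    padding = total_size - len(indices)
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    return indices[rank:total_size:num_replicas]


# e.g. kaggle T4×2: two processes, each sees a disjoint half of the data
for rank in range(2):
    print(rank, shard_indices(10, num_replicas=2, rank=rank))
```

one common way (an assumption here, not necessarily what the notebooks do) to get DDP instead of the default is to run the training as a script with `torchrun --nproc_per_node=2 train.py` or `accelerate launch`, which starts one process per GPU.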

N.B.3 use the default greedy search, because beam search triggers a spike in VRAM usage that may cause out-of-memory errors (the original whisper uses 5 beams, roughly equivalent to do_sample=True, num_beams=5)
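a rough back-of-envelope estimate of the VRAM spike: beam search keeps `num_beams` hypotheses alive per input, each with its own decoder key/value cache, so the cache scales linearly with the beam count. the sketch assumes whisper large decoder dimensions (32 layers, d_model 1280), fp16 values, the 448-token decoder limit and 1500 encoder frames for 30 s of audio; it ignores model weights, logits and activations, so treat it as an order-of-magnitude estimate only. the function name is hypothetical.

```python
def whisper_kv_cache_bytes(batch: int, num_beams: int, decoder_layers: int = 32,
                           d_model: int = 1280, max_new_tokens: int = 448,
                           encoder_frames: int = 1500, bytes_per_value: int = 2) -> int:
    # per hypothesis and per decoder layer: keys + values (factor 2) for
    # self-attention over generated tokens and cross-attention over encoder frames
    per_hypothesis = (decoder_layers * 2 * (max_new_tokens + encoder_frames)
                      * d_model * bytes_per_value)
    # beam search tracks num_beams hypotheses for every input in the batch
    return batch * num_beams * per_hypothesis


gib = 1024 ** 3
print(f"greedy : {whisper_kv_cache_bytes(16, 1) / gib:.1f} GiB")
print(f"5 beams: {whisper_kv_cache_bytes(16, 5) / gib:.1f} GiB")
```

for batch 16 this gives about 4.8 GiB with greedy search versus about 23.8 GiB with 5 beams, which would not fit on a 16 GB T4. in transformers, greedy decoding corresponds to `model.generate(input_features, num_beams=1, do_sample=False)`.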

N.B.4 if using kaggle and resuming training, remember to enable file persistence before launching
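when resuming, the trainer needs to find the newest checkpoint in its output directory; transformers saves them as sub-folders named `checkpoint-<step>`. the helper below is a hypothetical pure-Python sketch of that lookup (transformers ships a similar `transformers.trainer_utils.get_last_checkpoint`); the output directory path in the usage note is an assumption.

```python
import os
import re
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    # nothing to resume from if the output directory does not exist yet
    if not os.path.isdir(output_dir):
        return None
    steps = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            steps.append((int(match.group(1)), name))
    if not steps:
        return None
    # compare step numbers numerically so checkpoint-1000 beats checkpoint-999
    return os.path.join(output_dir, max(steps)[1])
```

usage sketch, assuming the output directory lives under `/kaggle/working` so it survives with persistence enabled: `trainer.train(resume_from_checkpoint=find_last_checkpoint("/kaggle/working/whisper-vi"))`. passing `None` simply starts a fresh run.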

scripts

evaluate accuracy (WER) with batched inference:
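word error rate is the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words. below is a minimal pure-Python sketch of corpus-level WER plus a batching helper; it is not the notebook's code (which may rely on a library such as `evaluate` or `jiwer`), it skips text normalization, and all function names are hypothetical.

```python
from typing import Iterator, List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    # Levenshtein dynamic programming over words, keeping only the previous row
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    # total edits over total reference words (not an average of per-sentence WER)
    errors = sum(word_edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total


def batched(items: List, batch_size: int) -> Iterator[List]:
    # yield consecutive chunks so the model transcribes several clips per forward pass
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


print(corpus_wer(["tôi đi học"], ["tôi đi làm"]))  # one substitution out of three words
```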

fine-tune whisper tiny with traditional approach:
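in a full ("traditional") fine-tune, one step that often trips people up is how label sequences are batched. the sketch below follows the collator pattern from the widely used Hugging Face whisper fine-tuning tutorial, rewritten on plain lists instead of tensors; it is an assumption that the notebook does the same, and `pad_labels` is a hypothetical name.

```python
from typing import List


def pad_labels(label_batch: List[List[int]], decoder_start_token_id: int,
               pad_value: int = -100) -> List[List[int]]:
    # pad every label sequence to the longest one; -100 is the default
    # ignore_index of PyTorch's cross-entropy loss, so padding adds no loss
    max_len = max(len(seq) for seq in label_batch)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in label_batch]
    # if the tokenizer already prepended the decoder start token, drop it:
    # the model prepends it again when shifting labels right into decoder inputs
    if all(seq and seq[0] == decoder_start_token_id for seq in padded):
        padded = [seq[1:] for seq in padded]
    return padded


# 50258 is <|startoftranscript|> in the multilingual whisper tokenizer
print(pad_labels([[50258, 1, 2, 3], [50258, 4]], decoder_start_token_id=50258))
```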

fine-tune whisper large with PEFT-LoRA + int8:
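LoRA freezes the (here int8-quantized) base weights and learns a low-rank update: for a frozen matrix W of shape d_out × d_in it trains B (d_out × r) and A (r × d_in), i.e. r·(d_out + d_in) values instead of d_out·d_in. the arithmetic below assumes whisper large dimensions (d_model 1280, 32 encoder and 32 decoder layers) and LoRA on `q_proj` and `v_proj` of every attention block, as in a peft `LoraConfig(target_modules=["q_proj", "v_proj"])`; those targets and the rank are assumptions, not necessarily the notebook's settings.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    # trainable values in B (d_out × r) plus A (r × d_in)
    return rank * (d_out + d_in)


def whisper_large_lora_trainable(rank: int = 32, d_model: int = 1280,
                                 encoder_layers: int = 32, decoder_layers: int = 32) -> int:
    # attention blocks: encoder self-attn, decoder self-attn, decoder cross-attn
    attention_blocks = encoder_layers + 2 * decoder_layers
    # two targeted projections (q_proj, v_proj) per attention block
    return attention_blocks * 2 * lora_params(d_model, d_model, rank)


full = 1280 * 1280
print(f"one 1280×1280 projection: {full:,} frozen vs {lora_params(1280, 1280, 32):,} trainable")
print(f"whole model (r=32): {whisper_large_lora_trainable():,} trainable")
```

with r=32 this comes to about 15.7M trainable values, roughly 1% of whisper large's ~1.55B parameters, which is why the optimizer state fits next to an int8 model on a single T4.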

(testing - not always working) fine-tune wav2vec v2 bert: w2v-bert-v2.ipynb
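unlike whisper's autoregressive decoder, a wav2vec2-style model fine-tuned with a CTC head predicts one token per audio frame, and the transcript is recovered by collapsing repeats and dropping blanks. the sketch below assumes a CTC head (as in transformers' `Wav2Vec2BertForCTC`), that the pad token doubles as the CTC blank, and that `|` is the word delimiter; the vocabulary in the example is made up.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_decode(frame_ids: List[int], blank_id: int,
                      id_to_token: Dict[int, str], word_delimiter: str = "|") -> str:
    # 1) collapse consecutive duplicate predictions into one
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2) drop blank tokens, 3) map ids to characters
    text = "".join(id_to_token[t] for t in collapsed if t != blank_id)
    # 4) turn the word delimiter back into spaces
    return text.replace(word_delimiter, " ").strip()


vocab = {0: "<pad>", 1: "x", 2: "i", 3: "n", 4: "|", 5: "c", 6: "h", 7: "à", 8: "o"}
print(ctc_greedy_decode([1, 1, 0, 2, 3, 3, 4, 5, 0, 6, 7, 7, 0, 8], 0, vocab))
```

a blank between two identical frames is what lets CTC emit a genuinely doubled letter: `[1, 0, 1]` decodes to "xx", while `[1, 1]` decodes to "x".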

docker image to run on AWS EC2: Dockerfile, comes with standalone scripts

convert to openai-whisper, whisper.cpp, faster-whisper, ONNX, TensorRT: not yet

miscellaneous: convert to huggingface audio datasets format
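one simple way to get raw audio files plus transcripts into a huggingface audio dataset (not necessarily the approach used here) is the `audiofolder` loader, which reads a `metadata.csv` whose `file_name` column holds audio paths relative to the folder; any extra column becomes a dataset feature. the column name `transcription` and the helper name are assumptions.

```python
import csv
import os
from typing import Iterable, Tuple


def write_audiofolder_metadata(root: str, rows: Iterable[Tuple[str, str]]) -> str:
    # write metadata.csv next to the audio files; utf-8 keeps vietnamese diacritics intact
    path = os.path.join(root, "metadata.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "transcription"])
        for file_name, text in rows:
            writer.writerow([file_name, text])
    return path
```

afterwards `datasets.load_dataset("audiofolder", data_dir=root)` should yield an `audio` column plus `transcription`, which can be resampled with `cast_column("audio", Audio(sampling_rate=16000))` and uploaded with `push_to_hub`.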

resources