m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Is it possible to fine-tune this model, or any method to update the vocabulary of it? #530

Open ShivinM-17 opened 10 months ago

ShivinM-17 commented 10 months ago

Issue description: When using the 'medium' model, I noticed that certain medical terms are not transcribed accurately. For instance, "Amoxicillin" is transcribed as "Amoxanine" and "Effexor" is transcribed as "Afexa".

So, is there any possibility of fine-tuning the 'medium' model to enhance its capability to detect and transcribe such words more accurately?

If this is possible, can someone provide sample or reference code?

remic33 commented 10 months ago

You can fine-tune faster_whisper models indirectly. Just fine-tune a regular Whisper model and you will be able to pass those weights to a faster_whisper model.

davidlandais commented 10 months ago

Fine-tune the standard Whisper model. Here is an introductory article: https://huggingface.co/blog/fine-tune-whisper Since faster-whisper uses CTranslate2, you will need to convert your model: https://opennmt.net/CTranslate2/guides/transformers.html#whisper Because WhisperX uses faster_whisper, you should give it the converted model. So the chain is: fine-tune Whisper, convert the result with CTranslate2, then load the converted model in WhisperX.
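The conversion and loading steps of that chain could look roughly like the sketch below. It is not taken from the WhisperX docs and makes several assumptions: the directory names are hypothetical, the fine-tuned checkpoint was saved in Hugging Face Transformers format, that directory contains `tokenizer.json` and `preprocessor_config.json` (if it does not, copy them from the base model), and your installed WhisperX version accepts a local model directory in `whisperx.load_model`. The helper names `build_convert_command` and `convert_and_load` are made up for illustration.

```python
import shlex
import subprocess


def build_convert_command(hf_model_dir, ct2_output_dir, quantization="float16"):
    """Build the ct2-transformers-converter call as an argument list."""
    return [
        "ct2-transformers-converter",
        "--model", str(hf_model_dir),
        "--output_dir", str(ct2_output_dir),
        # Copy the tokenizer and feature-extractor config next to the
        # converted weights (assumes both files exist in hf_model_dir).
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
        "--quantization", quantization,
    ]


def convert_and_load(hf_model_dir, ct2_output_dir, device="cuda"):
    """Convert the fine-tuned checkpoint, then load it through WhisperX."""
    subprocess.run(build_convert_command(hf_model_dir, ct2_output_dir), check=True)
    # Imported lazily so the command builder works without WhisperX installed.
    import whisperx
    # Assumption: load_model accepts a path to the converted model directory.
    return whisperx.load_model(str(ct2_output_dir), device, compute_type="float16")


if __name__ == "__main__":
    # Hypothetical paths; this only prints the command, it does not run it.
    cmd = build_convert_command(
        "./whisper-medium-finetuned", "./whisper-medium-finetuned-ct2"
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to check the paths before running the conversion; `convert_and_load` then does both steps and returns a model you can call `transcribe` on as usual.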

0xm00n commented 2 months ago

> Fine-tune the standard Whisper model. Here is an introductory article: https://huggingface.co/blog/fine-tune-whisper Since faster-whisper uses CTranslate2, you will need to convert your model: https://opennmt.net/CTranslate2/guides/transformers.html#whisper Because WhisperX uses faster_whisper, you should give it the converted model. So here is the chain.

So I just need to convert a standard Whisper model using CTranslate2 and pass the resulting model to WhisperX?