YYUUUY opened this issue 3 years ago
Currently working on that. If you want to decode using a transformer LM for English, please do the following:
mkdir trans_LM ; cd trans_LM
wget https://raw.githubusercontent.com/mailong25/self-supervised-speech-recognition/master/examples/lm_librispeech_word_transformer.dict
wget https://dl.fbaipublicfiles.com/wav2letter/sota/2019/lm/lm_librispeech_word_transformer.pt
wget https://raw.githubusercontent.com/mailong25/self-supervised-speech-recognition/master/examples/dict.txt
cd ..
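After the downloads, it is worth a quick check that the two text files are real word lists rather than an HTML error page (a minimal sanity-check sketch in Python; the file names come from the commands above, and it assumes you run it from the directory that contains trans_LM):
import os

lm_dir = 'trans_LM'
# Report the size of each downloaded file
for name in ['lm_librispeech_word_transformer.dict',
             'lm_librispeech_word_transformer.pt',
             'dict.txt']:
    path = os.path.join(lm_dir, name)
    print(name, round(os.path.getsize(path) / 1e6, 1), 'MB')

# The first lines of the lexicon should look like tokens, not '<!DOCTYPE html>'
with open(os.path.join(lm_dir, 'lm_librispeech_word_transformer.dict')) as f:
    for _ in range(3):
        print(f.readline().rstrip())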
Then run the inference as follows:
from stt import Transcriber
transcriber = Transcriber(pretrain_model = 'path/to/pretrain.pt',
                          finetune_model = 'path/to/finetune.pt',
                          dictionary = 'path/to/dict.ltr.txt',
                          lm_type = 'fairseqlm',
                          lm_lexicon = 'path/to/trans_LM/lm_librispeech_word_transformer.dict',
                          lm_model = 'path/to/trans_LM/lm_librispeech_word_transformer.pt',
                          lm_weight = 1.5, word_score = -1, beam_size = 50)
hypos = transcriber.transcribe(['path/to/wavs/0_1.wav','path/to/wavs/0_2.wav'])
print(hypos)
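If it helps, the hypotheses can be paired back with their input files like this (a small sketch; it assumes transcribe returns one hypothesis string per wav, in the same order as the input list):
wav_paths = ['path/to/wavs/0_1.wav', 'path/to/wavs/0_2.wav']
hypos = transcriber.transcribe(wav_paths)
# Assumption: one hypothesis string per input file, in input order
for wav, hypo in zip(wav_paths, hypos):
    print(wav, '->', hypo)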
The pre-trained model should be the one with no fine-tuning on labeled data: https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt
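For example, the checkpoint can be fetched once and reused (just a sketch; save it wherever your pretrain_model path points):
import os
import urllib.request

# Download the no-finetune wav2vec 2.0 checkpoint if it is not already present
url = 'https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt'
dest = 'wav2vec_vox_new.pt'  # pass this path as pretrain_model above
if not os.path.exists(dest):
    urllib.request.urlretrieve(url, dest)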
@mailong25 Thank you