igorsitdikov / lid_kaldi

Apache License 2.0

How did you train the LID model? #1

Closed by Durgesh92 3 years ago

Durgesh92 commented 3 years ago

Can you share your LID training recipe and data preparation guide?

igorsitdikov commented 3 years ago

Yes, sure. It's not a secret. I used https://github.com/kaldi-asr/kaldi/blob/master/egs/sre16/v2/run.sh. In the utt2spk file, I wrote "utt_id lang" on each line instead of "utt_id speaker", so the language label takes the place of the speaker ID.
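
A minimal sketch of that relabeling, in case it helps anyone preparing the data directory. It is not taken from the actual recipe: the utterance IDs, the language codes, and the helper names utt2spk_lines and spk2utt_lines are all made up for illustration. It assumes you already have a mapping from utterance ID to language, and that the IDs are prefixed with the language code so the utterance order and the "speaker" order sort consistently, as Kaldi prefers.

```python
from collections import defaultdict


def utt2spk_lines(utt_to_lang):
    """Build utt2spk lines with the language code in the speaker column."""
    # Kaldi expects the file sorted by utterance ID (C-locale order);
    # Python's default string sort matches that for ASCII IDs.
    return [f"{utt} {utt_to_lang[utt]}" for utt in sorted(utt_to_lang)]


def spk2utt_lines(utt_to_lang):
    """Build the inverse mapping: one line per language listing its utterances."""
    by_lang = defaultdict(list)
    for utt in sorted(utt_to_lang):
        by_lang[utt_to_lang[utt]].append(utt)
    return [f"{lang} {' '.join(utts)}" for lang, utts in sorted(by_lang.items())]


# Hypothetical utterance IDs and language codes, for illustration only.
utt_to_lang = {
    "en-utt002": "en",
    "ru-utt001": "ru",
    "en-utt001": "en",
}

print("\n".join(utt2spk_lines(utt_to_lang)))
print("\n".join(spk2utt_lines(utt_to_lang)))
```

In a real Kaldi data directory you would normally write only utt2spk yourself and let utils/utt2spk_to_spk2utt.pl generate spk2utt, then run utils/fix_data_dir.sh to check the sorting.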

Durgesh92 commented 3 years ago

Thanks. To use the trained model with your modified Vosk source, what model structure is expected? Could you please share your trained model so I can test it?

igorsitdikov commented 3 years ago

You can find it here: https://github.com/igorsitdikov/lid_kaldi/releases/tag/1.0.0

Durgesh92 commented 3 years ago

Thanks for the quick response. I also have a question about training: how much data do you recommend for each language? And is it necessary to have an even distribution of data volume across languages?

igorsitdikov commented 3 years ago

https://arxiv.org/abs/2011.12998