wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

Train on new Language #265

Closed emanueleielo closed 4 months ago

emanueleielo commented 6 months ago

Hi, is there a recipe to follow for training on a new language?

Thanks a lot

JiJiJiang commented 6 months ago

Every stage is the same; you just use your own training data in the new language and prepare the corresponding wav.scp and utt2spk files.
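
For reference, both files are plain, space-separated text with one utterance per line, in the usual Kaldi-style format (the IDs and paths below are made up for illustration):

```
# wav.scp: <utterance_id> <path-to-audio>
spk001-utt0001 /data/my_lang/wavs/spk001/utt0001.wav
spk001-utt0002 /data/my_lang/wavs/spk001/utt0002.wav
spk002-utt0001 /data/my_lang/wavs/spk002/utt0001.wav

# utt2spk: <utterance_id> <speaker_id>
spk001-utt0001 spk001
spk001-utt0002 spk001
spk002-utt0001 spk002
```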

omarabb315 commented 6 months ago

@JiJiJiang could you provide some details on preparing the dataset? Currently my data consists of a couple of thousand audio clips, each paired with a transcription and a speaker ID.

Also, will there be any problem if the pretrained model was trained on another language?

Thank you

JiJiJiang commented 6 months ago
  1. Download the VoxCeleb dataset and run prepare_data.sh to see what wav.scp and utt2spk should look like, then prepare your own data in the same format (see the sketch below).
  2. It should be OK to fine-tune the pre-trained model with data in another language.
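
A minimal sketch of that conversion, assuming your clips are laid out as data/wavs/&lt;speaker_id&gt;/&lt;clip&gt;.wav (the directory layout and output paths are assumptions; adapt them to your data):

```bash
# Assumed layout: data/wavs/<speaker_id>/<clip>.wav
mkdir -p data/train
: > data/train/wav.scp
: > data/train/utt2spk
for wav in data/wavs/*/*.wav; do
  spk=$(basename "$(dirname "$wav")")        # speaker id from the directory name
  utt="${spk}-$(basename "$wav" .wav)"       # prefix the utt id with the speaker id
  echo "$utt $(readlink -f "$wav")" >> data/train/wav.scp
  echo "$utt $spk" >> data/train/utt2spk
done
# Kaldi-style tooling expects both files sorted by utterance id.
sort -o data/train/wav.scp data/train/wav.scp
sort -o data/train/utt2spk data/train/utt2spk
```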
omarabb315 commented 6 months ago

Thank you for the response @JiJiJiang. Now I have these fields in the dataset: (audio, text, speaker_id). 1) Are they enough? 2) And what is the utterance_id?

JiJiJiang commented 6 months ago
  1. Yes, that's enough; only audio and speaker_id are needed.
  2. utterance_id is just the utterance name (see the note below).
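
A common convention, inherited from Kaldi, is to prefix the utterance name with the speaker ID so that wav.scp and utt2spk stay sorted consistently; e.g., a clip greeting.wav from speaker spk001 (hypothetical names) would get the utterance_id `spk001-greeting`.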
omarabb315 commented 5 months ago

Thank you @JiJiJiang. Now, after preparing the data, can you provide some details about the required steps to fine-tune or pretrain a model, like the one in this link?

I tried hard to understand what you mean by "stages" from the GitHub repo, but couldn't find the right path.

emanueleielo commented 5 months ago

> Thank you @JiJiJiang. Now, after preparing the data, can you provide some details about the required steps to fine-tune or pretrain a model, like the one in this link?
>
> I tried hard to understand what you mean by "stages" from the GitHub repo, but couldn't find the right path.

https://github.com/wenet-e2e/wespeaker/issues/250 In that issue they say they will prepare the docs, but I didn't find them. Did you find anything on how to fine-tune the pretrained model?
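
For anyone else confused by "stages": each wespeaker example recipe is driven by a run.sh that is split into numbered stages (data preparation, shard/feature list creation, training, embedding extraction, scoring, ...), selectable with the --stage and --stop_stage flags. A sketch, assuming the examples/voxceleb/v2 recipe (stage numbers can differ between recipes, so check the comments in your run.sh):

```bash
cd examples/voxceleb/v2
bash run.sh --stage 1 --stop_stage 2   # data preparation only
bash run.sh --stage 3 --stop_stage 3   # training only (stage number assumed; verify in run.sh)
```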

omarabb315 commented 5 months ago

@emanueleielo unfortunately, I didn't find anything on how to fine-tune the model.

emanueleielo commented 5 months ago

> @emanueleielo unfortunately, I didn't find anything on how to fine-tune the model.

@JiJiJiang it would be great to have some information about this, even just how long training the released model took.
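
A minimal sketch of how fine-tuning is typically wired up in these wenet-style recipes, assuming run.sh exposes a checkpoint variable that is forwarded to wespeaker/bin/train.py (this plumbing is an assumption; verify against the run.sh you are using):

```bash
# Initialize the training stage from a downloaded pretrained model instead of
# from scratch. The --checkpoint option is assumed from the wenet-style recipe
# layout; confirm the variable exists in your run.sh before relying on it.
bash run.sh --stage 3 --stop_stage 3 \
  --checkpoint /path/to/pretrained_model.pt
```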