mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Transfer Learning on Indian English #821

Closed sruteesh closed 5 years ago

sruteesh commented 7 years ago

I'm trying to train the DeepSpeech model on Indian English and Hindi separately. For Indian English, I was hoping to get some pointers on transfer learning, starting from a model trained on Librispeech (which has ~40% WER). Are there any points to note before training it on an Indian English audio dataset (~200 hrs or more of YouTube audio and captions)? Does transfer learning help here, or should I train from scratch on the Indian English dataset alone?

Thanks

kdavis-mozilla commented 7 years ago

@sruteesh We haven't done any transfer learning yet.

However, we are very interested in learning of your results as we've thought of employing transfer learning to create STT models for resource poor languages using resource rich ones.

sruteesh commented 7 years ago

@kdavis-mozilla Is there any chance of publishing the models you have trained, so they can be used for transfer learning? Also, any thoughts on how good YouTube audio with its auto-generated captions is as a training dataset? I am planning to implement transfer learning based on https://arxiv.org/pdf/1706.00290.pdf Thanks

kdavis-mozilla commented 7 years ago

@sruteesh In the next few weeks we should be able to publish a good English model.

After the PRs [1][2] land and we do some polish work, we should be able to publish a model with about 6.5% WER on the librivox clean test data set.

We haven't used YouTube audio and its auto-generated captions as a dataset because the theoretical lower bound on WER for a system trained on such a data set is the WER of Google's STT, which produced the captions. We'd rather have a theoretical lower bound of 0%.

1706.00290 looks interesting, and the authors happen to be just around the corner from where I am. I should contact them.

sruteesh commented 7 years ago

Cool, it would be great to test your model. My dataset is partly read speech and partly conversational. I haven't seen much difference between the training setups for Fisher or Swbd (which are telephone conversations) and those for TED or Librispeech (which are read speech). Any suggestions on which one I should start training with?

kdavis-mozilla commented 7 years ago

I'd combine Librispeech and Fisher: Librispeech for the read speech and Fisher for the conversational. This would allow the model to learn something closer to your data sets.

sruteesh commented 7 years ago

@kdavis-mozilla Thanks for the reply. What do you mean by "combine Librispeech and Fisher"? I don't see much difference between the models, except for additional parameters in Librispeech such as --display_step 5, --validation_step 5, --dropout_rate 0.30 and --default_stddev 0.046875. Could you explain what you meant? Thanks.

kdavis-mozilla commented 7 years ago

All I mean is that you, or we, could train on Librispeech and Fisher together as it would be closest to your data set.

To do this, all you'd need to do is add the corresponding csv files to the command line. Something like

...
  --train_files "/data/LDC/fisher-train.csv,/data/OpenSLR/librivox-train-clean-100.csv,/data/OpenSLR/librivox-train-clean-360.csv,/data/OpenSLR/librivox-train-other-500.csv"
...

where again the exact paths depend on where you stored the data.
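As a rough illustration, the comma-separated value could be assembled from a list of csv paths rather than typed by hand. This is only a sketch: the helper name `join_train_files` is made up for this example and is not part of DeepSpeech, and the paths are the same placeholder paths as above.

```python
def join_train_files(csv_paths):
    """Join csv paths into one comma-separated string for --train_files.

    Hypothetical helper for illustration; not part of DeepSpeech.
    """
    cleaned = [p.strip() for p in csv_paths if p.strip()]
    for p in cleaned:
        # A comma inside a path would be read as a separator between files.
        if "," in p:
            raise ValueError(f"path contains a comma: {p!r}")
    return ",".join(cleaned)


# Placeholder paths; adjust them to wherever the data is stored.
paths = [
    "/data/LDC/fisher-train.csv",
    "/data/OpenSLR/librivox-train-clean-100.csv",
    "/data/OpenSLR/librivox-train-clean-360.csv",
    "/data/OpenSLR/librivox-train-other-500.csv",
]
train_files_arg = join_train_files(paths)
print(f'--train_files "{train_files_arg}"')
```

The printed line can then be pasted into the training command in place of the hand-written flag.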

I'd guess that for this combination the dropout should be around 0.2367; that's hard-won knowledge.

prashantmaheshwari94 commented 6 years ago

Hi @sruteesh, I was also trying to train the model for the Indian English accent and other regional languages, but data collection has delayed it. Could you share your experience and results with transfer learning and with the model trained on the Libri + Fisher datasets? Thanks

sruteesh commented 6 years ago

@prashantmaheshwari94 I haven't tried transfer learning yet, as I do not have a trained model readily available. Training from scratch on a YouTube dataset (Indian English news and lectures, approx. 1500 hrs), I have reached around 35% WER and around 15% CER. What type of Indian English dataset do you have?
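For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are usually computed: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. It is not DeepSpeech's own evaluation code, and the example sentences are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: one substituted word and one dropped word.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

A single wrong character makes the whole word count as an error, which is one reason CER typically comes out lower than WER, as in the numbers above.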

prashantmaheshwari94 commented 6 years ago

@sruteesh That is some improvement over Librispeech; maybe with more and better data it will improve further. As of now I don't have any data, but we are developing a tool to gather it, as it is not readily available.

reith commented 6 years ago

@sruteesh Could you grab my pre-trained model and train just the last three layers? I think it will give you a much better model. If you are interested, I'll provide the procedure.
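The procedure itself is not spelled out in this thread, but the idea of training only the last three layers amounts to splitting the network into a frozen part, which keeps its pre-trained weights, and a trainable part. A framework-free sketch of that split follows; the function `split_layers` and the layer names are hypothetical and do not come from any actual model.

```python
def split_layers(layer_names, n_trainable=3):
    """Split an ordered list of layers into (frozen, trainable).

    The last n_trainable layers are fine-tuned; the earlier ones keep
    their pre-trained weights. Hypothetical helper for illustration.
    """
    if not 0 <= n_trainable <= len(layer_names):
        raise ValueError("n_trainable must be between 0 and the layer count")
    cut = len(layer_names) - n_trainable
    return layer_names[:cut], layer_names[cut:]


# Hypothetical layer names, ordered from input to output.
layers = ["layer_1", "layer_2", "layer_3", "layer_4", "layer_5", "layer_6"]
frozen, trainable = split_layers(layers, n_trainable=3)
print("frozen:   ", frozen)
print("trainable:", trainable)
```

In an actual training setup, only the variables belonging to the trainable group would be handed to the optimizer.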

sruteesh commented 6 years ago

@reith Great. You've posted some really good results. Will definitely try it out.

sruteesh commented 6 years ago

@reith I have a few issues regarding the model. How do I reach out to you? Thanks

reith commented 6 years ago

@sruteesh Feel free to drop me a mail at ameretat.reith@gmail.com. You can also open an issue; I'm willing to help step by step with further transfer learning.


abuvaneswari commented 6 years ago

@sruteesh, what tools do you use for downloading YouTube video and subtitle files, converting video to audio, and extracting the sentence audio segments from the audio file (using the subtitle data)? Could you please share your GitHub repo, if you don't mind? Thanks a lot!

murugancmi commented 6 years ago

@sruteesh and @reith I need help from both of you to implement DeepSpeech in my project, so please share your contact details. Mine: murugan@telecmi.com

sruteesh-pivot commented 6 years ago

@abuvaneswari and @murugancmi Please ping me at sruteeshkumar@gmail.com if you still need help with the YouTube dataset etc.

lissyx commented 5 years ago

I'm not sure there is anything for us to do here. This discussion would be better held on Discourse: https://discourse.mozilla.org/c/deep-speech

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.