Closed sruteesh closed 5 years ago
@sruteesh We haven't done any transfer learning yet.
However, we are very interested in learning about your results, as we've thought of employing transfer learning to create STT models for resource-poor languages using resource-rich ones.
@kdavis-mozilla Any help on publishing the models you've trained, for transfer learning? Also, any thoughts on how good YouTube and its auto-generated captions are as a data set for training? I am planning to implement transfer learning based on https://arxiv.org/pdf/1706.00290.pdf Thanks
@sruteesh In the next few weeks we should be able to publish a good English model.
After the PRs [1][2] land and we do some polishing work, we should be able to publish a model that's at about 6.5% WER on the librivox clean test set.
We haven't used YouTube and its auto-generated captions as a data set because the theoretical lower bound on WER for a system trained on such data is the WER of the recognizer Google used to generate the captions. We'd rather have a theoretical lower bound of 0%.
1706.00290 looks interesting, and the authors happen to be just around the corner from where I am. I should contact them.
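For reference, the WER being discussed here is just word-level Levenshtein (edit) distance divided by the number of reference words. A minimal sketch in plain Python (not DeepSpeech's actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

With this, a transcript whose one word in three is wrong scores 1/3, which is the sense in which the caption system's own WER caps what a model trained on its captions can achieve.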
Cool, would be great to test your model. My data set is partly read speech and partly conversational. I haven't seen much difference between models trained on Fisher or Swbd (which are telephone conversations) and those trained on TED or Librispeech (which are read speech). Any suggestions on which model I should start training with?
I'd combine Librispeech and Fisher: Librispeech for the read speech and Fisher for the conversational speech. This would allow the model to learn something closer to your data sets.
@kdavis-mozilla Thanks for the reply.
"combine Librispeech and Fisher"?
I don't see much difference in the models except for additional parameters in Librispeech like
--display_step 5 \
--validation_step 5 \
--dropout_rate 0.30 \
--default_stddev 0.046875 \
Can you explain what you meant by combining Librispeech and Fisher?
Thanks.
All I mean is that you, or we, could train on Librispeech and Fisher together as it would be closest to your data set.
To do this, all you'd need to do is add the corresponding csv files to the command line. Something like
...
--train_files "/data/LDC/fisher-train.csv,/data/OpenSLR/librivox-train-clean-100.csv,/data/OpenSLR/librivox-train-clean-360.csv,/data/OpenSLR/librivox-train-other-500.csv"
...
where again the exact paths depend on where you stored the data.
I'd guess for this combination the dropout should be around 0.2367, hard-won knowledge.
Hi @sruteesh, I was also trying to train the model for Indian English accents and other regional languages, but it's been delayed because of data collection! Could you share your experience/results on transfer learning and on the model trained using the Libri + Fisher data set? Thanks
@prashantmaheshwari94 I haven't tried transfer learning yet as I don't have any trained model readily available. Training from scratch on a YouTube data set (Indian English news and lectures, approx. 1500 hrs), I have reached around 35% WER and around 15% CER. What type of Indian English data set do you have?
@sruteesh That is some improvement over Librispeech; maybe with better and more data it will get better. As of now I don't have any data, but we are developing a tool to gather it, as it is not readily available.
@sruteesh Can you grab my pre-trained model and train just the last three layers? I think it'll give you a much better model. If you're interested, I'll provide the procedure.
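The recipe hinted at here is the standard one: restore the pre-trained weights, re-initialise the last few layers, and let gradients flow only into those. A deliberately tiny pure-Python illustration of the "freeze everything but the head" idea (not DeepSpeech's actual code; `w1` stands in for the frozen early layers, `w2` for the re-trained head):

```python
def relu(v):
    return v if v > 0 else 0.0

# "Pre-trained" weights: w1 plays the frozen early layers (restored from a
# checkpoint and never updated), w2 the head we re-train on the new data.
w1 = 0.8
w2 = -0.5

def finetune_step(xs, ys, lr=0.05):
    """One SGD step on mean squared error, updating only the head weight w2."""
    global w2
    grad = 0.0
    for x, y in zip(xs, ys):
        h = relu(w1 * x)              # frozen feature extractor
        grad += 2.0 * (w2 * h - y) * h
    w2 -= lr * grad / len(xs)         # w1 is deliberately left untouched

# Toy "new domain" data: targets equal 0.8 * x, so the head must learn w2 ≈ 1.0.
xs, ys = [1.0, 2.0, 3.0], [0.8, 1.6, 2.4]
for _ in range(200):
    finetune_step(xs, ys)
```

In the real setting the same idea means restoring the checkpoint, excluding the output layers from the restore, and passing only those layers' variables to the optimizer.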
@reith Great. You've posted some really good results. Will definitely try it out.
@reith I have a few issues regarding the model. How do I reach out to you? Thanks
@sruteesh feel free to drop me an email: ameretat.reith@gmail.com. You can also open an issue; I'm willing to help step by step with further transfer learning.
@sruteesh, what tools do you use for downloading YouTube video and subtitle files, converting video to audio, and extracting the sentence audio segments from the audio file (using the subtitle timings)? Can you please share your GitHub repo if you don't mind? Thanks a lot!
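For anyone assembling such a pipeline, one common approach (an assumption on my part, not necessarily what @sruteesh used) is `youtube-dl --extract-audio --audio-format wav --write-auto-sub --sub-lang en <URL>` to fetch audio plus WebVTT captions, then cutting the audio at the cue timestamps with ffmpeg. A minimal sketch of the timestamp-to-ffmpeg step (file names are placeholders):

```python
import re

# WebVTT cue lines look like: "00:01:02.500 --> 00:01:05.120"
CUE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")

def cut_commands(vtt_text, audio="audio.wav", out_prefix="seg"):
    """Turn each subtitle cue into an ffmpeg command extracting that span."""
    cmds = []
    for i, m in enumerate(CUE.finditer(vtt_text)):
        start, end = m.group(1), m.group(2)
        cmds.append(
            f"ffmpeg -i {audio} -ss {start} -to {end} {out_prefix}{i:04d}.wav"
        )
    return cmds
```

Each command can then be run with `subprocess.run(cmd.split())`, and the matching cue text becomes the transcript column of the training csv.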
@sruteesh and @reith I need both of your help to implement DeepSpeech in my project, so please share your contact details: murugan@telecmi.com
@abuvaneswari and @murugancmi Please ping me at sruteeshkumar@gmail.com if you still need help with the YouTube data set etc.
I'm not sure there is anything to do for us here. A discussion should be better taken on Discourse: https://discourse.mozilla.org/c/deep-speech
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I'm trying to train the DeepSpeech model on Indian English and Hindi separately. For Indian English I was hoping to get some pointers on transfer learning (starting from a model trained on Librispeech, which has ~40% WER). Any points to be noted before training it on an Indian English audio data set (~200 hrs or more of YouTube audio and captions)? Does transfer learning help, or should I train from scratch on only the Indian English data set?
Thanks