mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Use this model for Urdu language #634

Closed MalikMahnoor closed 4 years ago

MalikMahnoor commented 7 years ago

I wanted to use this model for the Urdu language, but I found this in the FAQ: "DeepSpeech's requirements for the data is that the transcripts match the [a-z ]+ regex, and that the audio is stored WAV (PCM) files."
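A minimal sketch of how the FAQ requirement quoted above could be checked on a dataset. The helper names `transcript_ok` and `is_pcm_wav` are hypothetical and are not part of DeepSpeech; only the `[a-z ]+` pattern comes from the FAQ quote.

```python
import re
import wave

# Pattern quoted from the FAQ: lowercase ASCII letters and spaces only.
TRANSCRIPT_RE = re.compile(r"[a-z ]+")


def transcript_ok(text: str) -> bool:
    """Return True if the whole transcript matches the FAQ regex."""
    return TRANSCRIPT_RE.fullmatch(text) is not None


def is_pcm_wav(path: str) -> bool:
    """Return True if the file opens as an uncompressed (PCM) WAV file."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getcomptype() == "NONE"
    except (wave.Error, EOFError, OSError):
        return False


print(transcript_ok("hello world"))  # True
print(transcript_ok("یہ اردو ہے"))   # False: Urdu characters fall outside [a-z ]
```

This shows why Urdu transcripts fail the stock check: the limitation is in the allowed character set, not in the audio format.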

How can I design a neural network for speech transcription for languages like Urdu?

kdavis-mozilla commented 7 years ago

While we plan to target other languages, we haven't made any decision as to which is the next language to target yet. If you've sufficient speech data for Urdu, thousands of hours of speech, we'd be willing to help in modifying our code for Urdu and lending some server resources for training.

MalikMahnoor commented 7 years ago

Actually, we are trying to modify spell.py and text.py for Urdu, and we are also working on an Urdu language model. We have an Urdu corpus that we will train on. Is this the right approach?

kdavis-mozilla commented 7 years ago

@MalikMahnoor Sounds about right. (I'd have to see the details to be sure.) How large an Urdu corpus do you have?

MalikMahnoor commented 7 years ago

700 sentences along with their audio recordings, but we are using this just to build a prototype. We can collect more data if this corpus shows good results.

MalikMahnoor commented 7 years ago

By the way, your spell.py and text.py work fine for Urdu as well. We have built our language model and changed the dataset to Urdu. The code works fine up to the creation of the execution context, but it gives errors during training. To our understanding, the errors are caused by n_characters (which we have also changed to the number of characters in Urdu), but there are other errors too.
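A rough sketch of how a value like n_characters could be derived from the transcripts rather than hard-coded. The function names are hypothetical, and the extra `+ 1` assumes the network's output layer reserves one additional label for a CTC blank; check that assumption against the training code you are actually using.

```python
def build_alphabet(transcripts):
    """Collect the sorted list of distinct characters used in the transcripts."""
    return sorted({ch for line in transcripts for ch in line})


def text_to_labels(text, alphabet):
    """Map each character to its integer index in the alphabet."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in text]


def labels_to_text(labels, alphabet):
    """Inverse of text_to_labels."""
    return "".join(alphabet[i] for i in labels)


sample = ["یہ اردو ہے", "اردو زبان"]  # illustrative transcripts only
alphabet = build_alphabet(sample)

# Assumption: one extra output unit for the CTC blank label.
n_characters = len(alphabet) + 1

labels = text_to_labels(sample[0], alphabet)
print(n_characters, labels)
```

If the label mapping and the output-layer size disagree, training typically fails with shape or index errors, which is one plausible source of the errors described above.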

kdavis-mozilla commented 7 years ago

Could you post the errors you're getting? Maybe we can help.

MalikMahnoor commented 7 years ago

We have managed to fix those errors. Training now starts and the code works fine, but only for isolated words, not sentences. We are trying to fix text.py for that. Hopefully we'll be able to do that within a few days.

kdavis-mozilla commented 7 years ago

Awesome!

MalikMahnoor commented 7 years ago

Thanks!

kdavis-mozilla commented 7 years ago

@MalikMahnoor When you get an Urdu model up and running and want to distribute it to the world, we'd be happy to help host the model for you, for example by providing S3 storage so others can download it.

abbasrazaali commented 6 years ago

Hi @MalikMahnoor, I am also working on Urdu speech recognition, but using a different approach. I have already tried the single-speaker, 700-sentence corpus recorded by Agha Ali. It is not a useful corpus, so I am now planning to add data from new sources. We can collaborate. Thanks.

lissyx commented 6 years ago

@abbasrazaali @MalikMahnoor I would suggest you also take a look at Common Voice. They are working on localization and internationalization, which would help you augment the corpus.

sajjadsaleem commented 6 years ago

@kdavis-mozilla Are there any specific requirements for the audio recordings you need? What if we provide you with thousands of hours of Urdu TV/radio recordings? Please specify any such requirements. Can you also explain what type of code changes are needed to support Urdu?

kdavis-mozilla commented 6 years ago

@sajjadsaleem I don't know if there are hard and fast requirements. However, there are some things which we have found to work.

As for supporting Urdu, you'll need to make changes similar to those required for French support, which are described here[1], or for German, described here[2].

sehargul commented 6 years ago

Hi, I am Sehar Gul. DeepSpeech is new to me and I have to train it for the Urdu language. Can you help me with how to train it for Urdu?

kdavis-mozilla commented 6 years ago

@sehargul A good start is the discourse post[1]; further discussion can be had there.

cmhashim commented 6 years ago

Any updates on the Urdu model?

Hafsa26 commented 5 years ago

Hi. I couldn't find the spell.py file in DeepSpeech master (version 0.2.0 alpha 0). What could be a substitute for it? Thank you!

Hafsa26 commented 5 years ago

@kdavis-mozilla, can you please answer my query? What could be a substitute for the spell.py file in DeepSpeech master (version 0.2.0 alpha 0)? Thank you!

kdavis-mozilla commented 5 years ago

@Hafsa26 There have been a lot of changes since spell.py was in the repo. Could you say a little more about what you want to do?

Hafsa26 commented 5 years ago

> @Hafsa26 There have been a lot of changes since spell.py was in the repo. Could you say a little more about what you want to do?

I am working on an Urdu speech recognition system using DeepSpeech. As mentioned above, we need to make changes in text.py and spell.py for it. I found text.py in the repo but couldn't find spell.py, so what could be the solution for that? Secondly, if you have any blog post or help on building a speech recognition system for some other language using DeepSpeech, kindly share it. Thank you!

kdavis-mozilla commented 5 years ago

@Hafsa26 I guess I'm looking more towards: what is your goal? spell.py is no longer in the repo, but the functionality it provided is. So I need to know what functionality you are trying to use so I can point you in the right direction.

waqasr6 commented 5 years ago

@MalikMahnoor What is the status of your work on the Urdu language model? Can you share it?

waqasr6 commented 5 years ago

@kdavis-mozilla I want to create my own language model for Urdu. Can you please help me with this? I've collected approximately 9000 audio recordings in Urdu of 100 different sentences. Currently I am training on this data with Roman transcription, but I want to train it with Urdu transcription.
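As an illustration of switching from Roman to Urdu-script transcription, here is a minimal sketch that writes a training manifest with Urdu transcripts. The column names `wav_filename`, `wav_filesize`, and `transcript` are assumed from the CSV layout commonly used with DeepSpeech training data, and the file path and size are made up; verify the expected layout against the README of the version you train with.

```python
import csv
import io

# Assumed manifest columns; confirm against your DeepSpeech version.
FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def write_manifest(rows, stream):
    """Write one CSV row per recording to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


rows = [
    {
        "wav_filename": "clips/urdu_0001.wav",  # hypothetical path
        "wav_filesize": 123456,                 # hypothetical size in bytes
        "transcript": "یہ اردو ہے",             # Urdu script, not Roman
    },
]

buffer = io.StringIO()
write_manifest(rows, buffer)
manifest_text = buffer.getvalue()
print(manifest_text)
```

When writing to disk instead of a string buffer, open the file with `encoding="utf-8"` and `newline=""` so the Urdu text and line endings are preserved.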

lissyx commented 5 years ago

> @kdavis-mozilla I want to create my own language model for Urdu. Can you please help me with this? I've collected approximately 9000 audio recordings in Urdu of 100 different sentences. Currently I am training on this data with Roman transcription, but I want to train it with Urdu transcription.

What's wrong with the current documentation? Everything you need to achieve that should be documented there.

waqasr6 commented 5 years ago

@lissyx Can you please elaborate on which documentation you are talking about, or share it here? I have never found any for languages other than English.

lissyx commented 5 years ago

What about README.md? I really don't understand what's blocking you.

JRMeyer commented 5 years ago

@lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

lissyx commented 5 years ago

> @lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

Maybe, but again, if we don't know the pain points, it's less efficient. If you ask me, it's trivial and all properly documented. Obviously that's not the case for everyone, so I'm unsure I can produce anything more useful than the existing documentation.

JRMeyer commented 5 years ago

I've been running into all the pain points getting DS to work with all the CV langs, so I definitely could write up that post... I'm just concerned about how much time it would take: a week or so, I'd guess.

carlfm01 commented 5 years ago

When I finish the Windows parts I'll start working on it for Spanish. @JRMeyer, I can share the "hardest parts" with you if you want.

waqasr6 commented 5 years ago

> @lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

It would be very helpful indeed.

waqasr6 commented 5 years ago

> What about README.md? I really don't understand what's blocking you.

I just need to know: does DeepSpeech support transcription in RTL scripts such as Arabic and Urdu?

kdavis-mozilla commented 5 years ago

@waqasr6 I know developers outside of Mozilla have used it for Urdu, but we at Mozilla have never used it for such.

lissyx commented 5 years ago

> I just need to know: does DeepSpeech support transcription in RTL scripts such as Arabic and Urdu?

What kind of constraints do you have in mind? We have support for UTF-8, so characters should be handled properly, and RTL should not be a problem since the model is trained on the same text.
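A small sketch of why right-to-left display need not matter for training text. Unicode strings are stored in logical (reading) order, and direction is a rendering concern; the example below only uses Python's standard library and makes no claim about DeepSpeech internals.

```python
import unicodedata

text = "اردو"  # the word "Urdu", stored in logical (reading) order

# The first element is the first letter read aloud, even though a
# renderer draws it at the right-hand side of the word.
codepoints = [f"U+{ord(ch):04X}" for ch in text]
print(codepoints)  # ['U+0627', 'U+0631', 'U+062F', 'U+0648']

# Direction is a per-character property used by renderers ("AL" = Arabic letter).
bidi_classes = [unicodedata.bidirectional(ch) for ch in text]
print(bidi_classes)  # ['AL', 'AL', 'AL', 'AL']

# UTF-8 round-trips the text unchanged; each of these letters takes 2 bytes.
encoded = text.encode("utf-8")
print(len(text), len(encoded))  # 4 8
```

So a character-to-label mapping sees the same sequence for an RTL script as it would for an LTR one, provided the transcripts are read and written as UTF-8.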

waqasr6 commented 5 years ago

@lissyx Thanks. Many things in my mind are cleared now. I'll try it with Urdu language model now.

Hafsa26 commented 5 years ago

@lissyx Hi, how can I convert the output_graph.pb model into a .pbmm model? I got my Urdu model with a .pb extension. Is there any way to convert it to .pbmm?

Thank you!

lissyx commented 5 years ago

@Hafsa26 Have you read README.md?

Hafsa26 commented 5 years ago

I did. To check the model, I need output_graph.pbmm, but I got output_graph.pb. Do I need to make some changes to get a .pbmm graph rather than a .pb graph?

kdavis-mozilla commented 5 years ago

I think what lissyx is referring to is this.

lissyx commented 5 years ago

https://index.taskcluster.net/v1/task/project.deepspeech.tensorflow.pip.r1.12.cpu/artifacts/public/convert_graphdef_memmapped_format
https://index.taskcluster.net/v1/task/project.deepspeech.tensorflow.pip.r1.12.osx/artifacts/public/convert_graphdef_memmapped_format
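A hedged sketch of invoking the downloaded `convert_graphdef_memmapped_format` tool from Python. The `--in_graph` and `--out_graph` flags are assumed from the usage shown in the DeepSpeech README of that era, and the local path `./convert_graphdef_memmapped_format` is hypothetical; check both against your own setup. The helper `build_convert_command` is illustrative only.

```python
import subprocess


def build_convert_command(tool, in_graph, out_graph):
    """Build the argument list for the .pb -> .pbmm conversion tool."""
    return [tool, f"--in_graph={in_graph}", f"--out_graph={out_graph}"]


cmd = build_convert_command(
    "./convert_graphdef_memmapped_format",  # hypothetical local path to the tool
    "output_graph.pb",
    "output_graph.pbmm",
)
print(" ".join(cmd))


def run_conversion(command):
    """Run the conversion; raises CalledProcessError if the tool fails."""
    subprocess.run(command, check=True)


# Uncomment once the binary is downloaded and marked executable:
# run_conversion(cmd)
```

Building the command as a list avoids shell quoting problems if the graph paths contain spaces.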

Hafsa26 commented 5 years ago

Thank you so much!

lissyx commented 5 years ago

Do you mind sharing figures on how well your model performs? You also might want to export it to tflite format for Android support.

Hafsa26 commented 5 years ago

@lissyx Yes, I will surely share them soon. So far I have worked with 1 hour of data and the system runs fine. I am still getting 100% WER, but I will tweak the model once I start working with 300 hours of data. Initially I have to prepare a demo of DeepSpeech for the Urdu language.
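For readers unfamiliar with the metric, here is a minimal sketch of word error rate (WER) computed as word-level edit distance divided by the number of reference words. This is the standard textbook definition and is not taken from DeepSpeech's own evaluation code; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))  # 0.25
print(word_error_rate("a b c", "x y z"))      # 1.0
print(word_error_rate("a b c", ""))           # 1.0
```

A WER of 100% therefore means that the number of word errors equals the number of reference words, which happens, for example, when every word is wrong or when the model outputs nothing.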

Hafsa26 commented 5 years ago

If there is anything you can share to make it better, I would love to know.

Hafsa26 commented 5 years ago

I am not planning to use it on Android yet, but if I need to, I will surely do it. Thank you for helping all the way.

lissyx commented 5 years ago

image

Please avoid images

Hafsa26 commented 5 years ago

When I trained the model on one hour of data, the loss decreased gradually, but after 14 epochs it increases in some epochs and decreases in others. What do you suggest in such a scenario?

lissyx commented 5 years ago

> When I trained the model on one hour of data, the loss decreased gradually, but after 14 epochs it increases in some epochs and decreases in others. What do you suggest in such a scenario?

That's not surprising with only one hour of data, and there is nothing to conclude from it. You will have to adjust the hyper-parameters eventually anyway.
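One common way to cope with a loss curve that starts oscillating is patience-based early stopping on a validation loss. The sketch below is a generic illustration under that assumption, with made-up loss values and a hypothetical helper name; it is not DeepSpeech's own mechanism, and it does not replace tuning hyper-parameters or adding data.

```python
def early_stop_epoch(losses, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    stop_epoch is the first epoch at which `patience` consecutive epochs have
    passed without a new best loss, or None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Made-up validation losses that fall and then oscillate.
val_losses = [5.0, 4.0, 3.0, 3.2, 3.1, 3.3, 2.9]
print(early_stop_epoch(val_losses, patience=3))  # (2, 5)
print(early_stop_epoch(val_losses, patience=4))  # (6, None)
```

The second call shows the trade-off: a larger patience tolerates a noisy stretch and still finds the later, better epoch, at the cost of more training time.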

Hafsa26 commented 5 years ago

I will. Next I will be using 300 hours of data, and then I will adjust the hyper-parameters accordingly. Is there any guide for adjusting hyper-parameters?