mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Add pretrained models for other languages #2468

Closed lu4p closed 5 years ago

solyarisoftware commented 5 years ago

I'd like to add myself to @lu4p's feature request.

To grow the DeepSpeech (and, on top of it, Common Voice) user base, a pretrained model for each available language would help!

Please supply updated pretrained models for all available languages.

Thanks giorgio

lissyx commented 5 years ago

Sourcing, training and evaluating other languages requires much more resources than we have.

lissyx commented 5 years ago

Also, we provide everything needed for anyone to produce models, so there is nothing blocking any community from working on that.

lissyx commented 5 years ago

There would still be value in at least thinking through how we might refer to community-contributed models.

solyarisoftware commented 5 years ago

@lissyx, well, I understand the "lack of resources", and OK, one can set up their own model, "recompiling all" from the available Common Voice data (https://github.com/mozilla/DeepSpeech/blob/master/TRAINING.rst#training-your-own-model; see the sketch below),

but having pre-trained models already available for each language foreseen in the Common Voice project would:

  1. speed up community contributions, especially from non-experts (e.g. to set up a live demo for the next November sprint, etc.),

  2. reduce newbie friction, allowing communities all around the world to grow quickly.

I believe that a readily available repo of pre-trained models for each language is a smart "business/communication" feature.
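For reference, a minimal sketch of that "recompile all" path, based on TRAINING.rst (the importer and flag names below match the current repo layout but may differ between releases; all paths are placeholders):

    # Import a downloaded Common Voice release into DeepSpeech's CSV format,
    # then train on it (sketch; flags may differ between releases).
    python3 bin/import_cv2.py --filter_alphabet data/alphabet.txt /path/to/cv-corpus/it
    python3 DeepSpeech.py \
      --train_files /path/to/cv-corpus/it/clips/train.csv \
      --dev_files /path/to/cv-corpus/it/clips/dev.csv \
      --test_files /path/to/cv-corpus/it/clips/test.csv \
      --export_dir /path/to/exported/model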

My two cents giorgio

dabinat commented 5 years ago

In order to provide speech technologies in Firefox, there would need to be models for multiple languages. What’s Mozilla’s plan for this? Is it going to be solely reliant on the community?

lissyx commented 5 years ago

I believe that a readily available repo of pre-trained models for each language is a smart "business/communication" feature.

It is, but again: we don't have any of those pre-trained models yet.

speed up community contributions, especially from non-experts (e.g. to set up a live demo for the next November sprint, etc.),

I am not sure I get your point here. What live demo are you referring to? "Non-experts", I don't get it either: DeepSpeech still requires some level of expertise to integrate into your app.

reduce newbie friction, allowing communities all around the world to grow quickly.

That's why we are open to feedback / improvements. We are indeed helping the Italian community and they are making good progress. I also know they can use your help, if you have spare cycles ...

In order to provide speech technologies in Firefox, there would need to be models for multiple languages. What’s Mozilla’s plan for this? Is it going to be solely reliant on the community?

It is still too early: this landed only behind a pref in Nightly. Sourcing good data is really the biggest issue, but without a clear goal, it's harder to motivate external contributions.

solyarisoftware commented 5 years ago

I am not sure I get your point here. What live demo are you referring to? "Non-experts", I don't get it either: DeepSpeech still requires some level of expertise to integrate into your app.

Well, I mean that having pre-trained models available helps developers who want to try DeepSpeech to quickly test and benchmark, without needing to rebuild everything from scratch.
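For example, with the released English model a quick benchmark is just a few commands (a sketch assuming the 0.6.0 release artifacts; file names and flags differ in other releases):

    # Fetch the released English model and transcribe a 16 kHz mono WAV.
    # (0.6-era flags; newer releases package/name the files differently.)
    pip3 install deepspeech
    curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
    tar xzf deepspeech-0.6.0-models.tar.gz
    deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm \
               --lm deepspeech-0.6.0-models/lm.binary \
               --trie deepspeech-0.6.0-models/trie \
               --audio my_audio.wav

Nothing like this exists yet for the other languages, and that is exactly the gap.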

So my modest suggestion/desire is to have published here an "official" (and updated) repo of pre-trained models, for each language that is in a "stable" phase.

lissyx commented 5 years ago

So my modest suggestion/desire is to have published here an "official" (and updated) repo of pre-trained models, for each language that is in a "stable" phase.

To our knowledge, no language is at that level yet.

Mte90 commented 5 years ago

This is something that has come up regularly in Mozilla Italia's monthly calls and discussions. The point that not everyone seems to be clear on about generating a model is:

Every language, depending on the amount of data, requires hours of training and machines with capable hardware, and will not use only the CV dataset for training. Every language uses different datasets, so this creates a gigantic level of entropy.

This is something the communities, like the Italian or French ones, should do, because they can adapt it to their needs.

Also, DS is still at 0.6, so it is not stable!

Common Voice is not DeepSpeech; CV is a way to gather data for DS, so they shouldn't be associated at all when we are talking about this point.

The question we should ask is: do we want good software for generating models and a good dataset, or do we want to focus only on getting the model?

Right now, neither project is stable and strong enough for the second option, at least until one language reaches the goal of 2,000 hours of recordings.

solyarisoftware commented 5 years ago

Hi Daniele.

Common Voice has something like 25 languages enabled and 75 languages to unlock

I guess that's perfectly clear to everybody here.

Every language, depending on the amount of data, requires hours of training and machines with capable hardware, and will not use only the CV dataset for training. Every language uses different datasets

It doesn't add up; can you clarify that point? I thought each pretrained DeepSpeech model (like the English one already available in this repo) would be built just from the English-language Common Voice dataset. Is that not true?

I'm asking because, if each pre-trained language model is built not only from the Common Voice training set but is also enhanced/mixed with external sets, that leads to collateral issues, IMO, in terms of replicability, transparency, bias, etc.

BTW, I see a pre-trained model as a foundation, a common reference (some would say "universal"). The huge value of this open-data + open-source project is that a developer could then enhance the reference (CV) model with their own dataset on top, producing their own custom model. Right?
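In other words, something like the fine-tuning flow described in TRAINING.rst (a sketch continuing from a released checkpoint; the flag names match the current DeepSpeech.py, while the CSVs, paths, and hyperparameters are placeholders):

    # Continue training from a released checkpoint on custom data,
    # then export a custom model (sketch; values are placeholders).
    python3 DeepSpeech.py \
      --n_hidden 2048 \
      --checkpoint_dir /path/to/released/checkpoints \
      --train_files my_domain_train.csv \
      --dev_files my_domain_dev.csv \
      --test_files my_domain_test.csv \
      --learning_rate 0.0001 \
      --epochs 3 \
      --export_dir /path/to/custom/model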

The question we should ask is: do we want good software for generating models and a good dataset, or do we want to focus only on getting the model?

I don't see any trade-off; both are required. Let's stay focused on DeepSpeech's success: we need the best datasets (exactly in line with Common Voice's main statements: inclusivity, mapping of real spoken language, etc.) to achieve the best speech recognition success rate. At the end of the day, the power of CV + DS together is that open source + open data enable a complete open platform for citizens.

So, in my opinion, this issue should remain open, as a CHANGE REQUEST reminder, simply because it's strategic.

We need to supply developers with simple tools to adopt the DeepSpeech platform (and everything behind it, mainly CV): to avoid friction, to lower the learning curve, and to allow non-English languages to be considered, sooner or later (in a stable release), equal in dignity to English ;-)

So I renew my proposal to maintain pre-trained language models, immediately available, in this core GitHub repo or in a separate per-language repo (it doesn't change much).

giorgio

lissyx commented 5 years ago

To avoid friction, to lower the learning curve, and to allow non-English languages to be considered, sooner or later (in a stable release), equal in dignity to English ;-)

Please stop your insinuations. If you have 10k hours of Italian, please share. We'll make a good model ASAP.

solyarisoftware commented 5 years ago

@lissyx, you didn't reply to my points. "Equal in dignity" is not an insinuation; it's just a requirement stated in the Common Voice goals, which I share.

kdavis-mozilla commented 5 years ago

We are working on releasing non-English models as soon as we get training data sets that are of a sufficient size to train with.

As this conversation is getting rather heated, isn't facilitating the process of getting non-English training data sets of sufficient size, and is robbing us of the time needed to get those data sets, I'm going to close the issue for now.