wit-ai / wit

Natural Language Interface for apps and devices
https://wit.ai/

Updated: speech recognition supported languages #1562

Closed: solyarisoftware closed this issue 4 years ago

solyarisoftware commented 4 years ago

Hi, just a question I couldn't answer by reading the documentation:

I'm a happy user of the speech recognition API, which I have used so far to convert Italian speech to text.

https://wit.ai/docs/http/20170307#post__speech_link
https://wit.ai/docs/http/20170307#context_link

Here https://wit.ai/faq you say:

We currently support Afrikaans, Akan, Albanian, Amharic, Arabic, Armenian, Assamese, Aymara, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Cherokee, Chichewa, Chinese, Croatian, Czech, Danish, Divehi, Dutch, English, Esperanto, Estonian, Faroese, Finnish, French, Fulah, Galician, Ganda, Georgian, German, Greek, Guarani, Gujarati, Haitian, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Inuktitut, Irish, Italian, Japanese, Javanese, Kalaallisut, Kannada, Kashmiri, Kazakh, Khmer, Kinyarwanda, Kirghiz, Korean, Kurdish, Lao, Latin, Latvian, Limburgish, Lingala, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Northern Sami, Norwegian, Norwegian (bokmal), Oriya, Oromo, Panjabi, Pashto, Persian, Polish, Portuguese, Quechua, Raeto-Romance, Romanian, Russian, Sanskrit, Sardinian, Scottish Gaelic, Serbian, Shona, Sicilian, Sindhi, Sinhalese, Slovak, Slovenian, Somali, Sotho, South Ndebele, Spanish, Sundanese, Swahili, Swati, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tsonga, Tswana, Turkish, Uighur, Ukrainian, Urdu, Uzbek, Venda, Vietnamese, Welsh, Western Frisian, Wolof, Xhosa, Yiddish, Yoruba, and Zulu


As I understand it, I have to set the locale attribute of the context object so that the ASR transcribes speech in the desired language:

Locale of the user. The first 2 letters must be a valid ISO639-1 language, followed by an underscore, followed by a valid ISO3166 alpha2 country code. locale is used to resolve the entities powered by our open-source linguistic parser, Duckling (e.g. wit/datetime, wit/amount_of_money). If you have locale-specific needs for dates and times, please contribute directly to Duckling. If a locale is not yet available in Duckling, it will default to the “parent” language, with no locale-specific customization. Example: "en_GB".
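The locale format quoted above can be illustrated with a minimal sketch. It checks only the shape of the string (two lowercase letters, an underscore, two uppercase letters), not whether the codes are actually registered, and it builds a request URL without sending anything. The `https://api.wit.ai/speech` endpoint and the JSON-encoded `context` query parameter are assumptions based on the linked HTTP API docs for version 20170307; the function names are hypothetical.

```python
import json
import re
from urllib.parse import urlencode

# Shape described in the quoted docs: ISO 639-1 language code, underscore,
# ISO 3166 alpha-2 country code. This checks the form only, not whether the
# language or country code is actually registered.
LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def is_well_formed_locale(locale: str) -> bool:
    """Return True if locale looks like 'en_GB' or 'it_IT'."""
    return LOCALE_PATTERN.fullmatch(locale) is not None


def build_speech_url(locale: str, version: str = "20170307") -> str:
    """Build a /speech URL carrying a context object with the given locale.

    Assumption: the context is passed as a JSON-encoded, URL-encoded
    query parameter named 'context'. No request is sent here.
    """
    if not is_well_formed_locale(locale):
        raise ValueError(f"malformed locale: {locale!r}")
    context = json.dumps({"locale": locale}, separators=(",", ":"))
    return "https://api.wit.ai/speech?" + urlencode({"v": version, "context": context})


print(build_speech_url("it_IT"))
```

Per the quoted documentation, the locale is used to resolve Duckling-powered entities such as wit/datetime; the quoted text does not say that it selects the language used for speech recognition.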

QUESTIONS:

  1. Availability: Can I use the /speech API with ALL of the above-listed (131) languages? Or is ASR available only for a smaller subset?

  2. Nice-to-have request: Could you supply, in the documentation, the basic/general ISO639-1_ISO3166 codes (languageCode_countryCode) for the 131 languages (or for the subset available for ASR)?

For example:

English, en_US
French, fr_FR
...

Thanks giorgio

jtliao commented 4 years ago

Hi @solyarisoftware, if you scroll down a bit more in the FAQ, there's the list of supported languages for speech. Unfortunately we do not have support for specific locales, so the language of the app is used as a source of truth.

solyarisoftware commented 4 years ago

Ah! my fault,

we do not have support for specific locales, so the language of the app is used as a source of truth.

That's not clear to me. What do you mean by "the language of the app"?

thanks giorgio

From the FAQ:

What languages do you support speech recognition for?

We currently support speech recognition for Arabic, Bengali, Burmese, Catalan, Chinese, Dutch, English, Finnish, French, German, Hindi, Indonesian, Italian, Japanese, Kannada, Korean, Malay, Malayalam, Marathi, Polish, Portuguese, Russian, Sinhalese, Spanish, Swedish, Tagalog, Tamil, Thai, Turkish, Urdu, and Vietnamese. If you can't find yours when creating your app, don't hesitate to contact us!
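As a quick way to answer question 1 for a given language, the FAQ's speech recognition list can be turned into a lookup. This is only a sketch: the set is copied from the FAQ text quoted in this thread and may be out of date, and `supports_speech` is a hypothetical helper, not part of any Wit.ai client.

```python
# Languages the FAQ quoted in this thread lists for speech recognition.
# Copied from the thread; the live FAQ may have changed since.
SPEECH_LANGUAGES = frozenset({
    "Arabic", "Bengali", "Burmese", "Catalan", "Chinese", "Dutch", "English",
    "Finnish", "French", "German", "Hindi", "Indonesian", "Italian",
    "Japanese", "Kannada", "Korean", "Malay", "Malayalam", "Marathi",
    "Polish", "Portuguese", "Russian", "Sinhalese", "Spanish", "Swedish",
    "Tagalog", "Tamil", "Thai", "Turkish", "Urdu", "Vietnamese",
})


def supports_speech(language: str) -> bool:
    """Case-insensitive check against the FAQ's speech recognition list."""
    return language.strip().title() in SPEECH_LANGUAGES


for name in ("Italian", "Latin"):
    status = "listed" if supports_speech(name) else "not listed"
    print(f"{name}: {status} for speech recognition")
```

Italian appears in both FAQ lists, whereas Latin appears only in the general list of supported languages, so the two lists should not be treated as interchangeable.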