themoeway / yomitan

Pop-up dictionary browser extension. Successor to Yomichan.
https://yomitan.wiki
GNU General Public License v3.0

Add Lingua Libre as default audio source #1093

Closed: bicolino34 closed this issue 3 months ago

bicolino34 commented 3 months ago

Lingua Libre is a language-recording website (recently also hosting videos for sign languages) created by Wikimedia France. Recordings made with it are available under one of three free licences chosen by the recorder (CC0 1.0 Public Domain Dedication; Creative Commons Attribution-ShareAlike 4.0; Creative Commons Attribution 4.0) and are mostly used in Wiktionaries and other Wikimedia projects.

As of 20 June, they have 1,258,620 recordings available in a variety of languages. The three biggest are:

Full stats are available here: https://lingualibre.org/wiki/LinguaLibre:Stats/Languages

Since Yomitan is beginning to support more and more languages, it would be great if we also provided an audio source for them. That said, Lingua Libre currently covers relatively few words in many languages. But anyone who knows a language can contribute recordings, so both projects might benefit from the integration.

StefanVukovic99 commented 3 months ago

I was browsing through that page just the other day (when Forvo turned on Cloudflare and none of the Anki add-ons could access it). I was thinking of going the #324 route with Wiktionary's audio files.

Do you happen to know how much overlap there is between the audio datasets of LinguaLibre and Wiktionary (e.g. I can't seem to find the two audio files for German Fuchs on LinguaLibre)?
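One rough way to estimate that overlap is to normalize both sets of Commons file names to headwords and intersect them. The sketch below assumes Lingua Libre files follow a `LL-Q<id> (<iso639-3>)-<user>-<word>.wav` pattern and that older German uploads follow `De-<word>.ogg`; both patterns, the helper names, and the sample file names are hypothetical illustrations and should be checked against real Commons listings.

```python
import re

# Assumed naming patterns (hypothetical; verify against real Commons file names):
#   Lingua Libre uploads:   "LL-Q188 (deu)-UserA-Fuchs.wav"
#   Older manual uploads:   "De-Fuchs.ogg"
# Limitation: a username containing "-" would be split incorrectly.
LL_PATTERN = re.compile(r"^LL-Q\d+ \((?P<lang>[a-z]{3})\)-[^-]+-(?P<word>.+)\.wav$")
DE_PATTERN = re.compile(r"^De-(?P<word>.+)\.(?:ogg|wav|mp3)$")


def headwords(file_names, pattern):
    """Extract the headword from every file name that matches `pattern`."""
    words = set()
    for name in file_names:
        match = pattern.match(name)
        if match:
            words.add(match.group("word"))
    return words


def overlap_report(ll_files, other_files):
    """Return (shared, only_lingua_libre, only_other) sets of headwords."""
    ll = headwords(ll_files, LL_PATTERN)
    other = headwords(other_files, DE_PATTERN)
    return ll & other, ll - other, other - ll


if __name__ == "__main__":
    # Hypothetical sample file names, for illustration only.
    ll_files = ["LL-Q188 (deu)-UserA-Fuchs.wav", "LL-Q188 (deu)-UserB-Haus.wav"]
    other_files = ["De-Fuchs.ogg", "De-Baum.ogg"]
    shared, only_ll, only_other = overlap_report(ll_files, other_files)
    print(sorted(shared), sorted(only_ll), sorted(only_other))  # ['Fuchs'] ['Haus'] ['Baum']
```

Fed with the full file lists of a language's pronunciation category, the three sets would show what each source adds on its own.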

hugolpz commented 3 months ago

Hi, I'm the most active editor/coordinator and a top-2 GitHub owner for Lingualibre.org. You can see our stats, search for your language, and download a .zip archive per language here: https://lingualibre.org/LanguagesGallery

Lingualibre is the main source of audio for Wiktionaries; we built it to speed up and standardize audio recording for e-dictionaries and e-learning apps.

If you want to record a language yourself, I can point you toward the relevant resources. Lingualibre allows 380 recordings per 72 minutes, but with upgraded user rights an experienced user can record more than 800 words per hour, quickly covering the needs of a given dictionary.
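For a sense of scale, the two rates quoted above can be put on the same per-hour basis (plain arithmetic on the figures in this comment, nothing more):

```latex
\frac{380\ \text{recordings}}{72\ \text{min}} \times 60\ \frac{\text{min}}{\text{h}}
  = \frac{22\,800}{72}\ \frac{\text{recordings}}{\text{h}}
  \approx 316.7\ \frac{\text{recordings}}{\text{h}},
\qquad
\frac{800}{316.7} \approx 2.5
```

So an experienced user with upgraded rights records roughly two and a half times as fast as the standard limit allows.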

As for Wiktionary and German audio, we only have 16k words (18k recordings). A German user already did the job the old way for Wiktionary; see:

bicolino34 commented 3 months ago

@StefanVukovic99

Unfortunately, I do not know how much they overlap.

The two missing audio files were not made with Lingua Libre, which is probably why you can't find them there.

Before Lingua Libre, you had to record words in other software and then upload them to Wikimedia Commons. Lingua Libre simplifies this process and adds some neat features for recording words, such as pulling all the words from a given Wiktionary category.

The recordings used on Wiktionaries, whether made externally or through Lingua Libre, are all hosted on Wikimedia Commons and should all be in a specific category. For example, all German word recordings should be either in the category "German pronunciation" or in one of its subcategories. Maybe this can be of some use.
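To make the category approach concrete, here is a sketch that builds a MediaWiki API request listing the files in a Commons category and derives a file's direct download URL. It assumes the standard MediaWiki `list=categorymembers` query module and the usual `upload.wikimedia.org` hashed path layout (MD5 of the underscore-normalized file name). It does not follow subcategories or perform any network requests, and the helper names are hypothetical.

```python
import hashlib
from urllib.parse import quote, urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
UPLOAD_BASE = "https://upload.wikimedia.org/wikipedia/commons"


def category_files_url(category, cont=None):
    """Build an API URL listing files directly in `category` (subcategories not followed)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "file",
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        # Pass the "cmcontinue" value from the previous response to fetch the next page.
        params["cmcontinue"] = cont
    return f"{COMMONS_API}?{urlencode(params)}"


def file_url(file_name):
    """Direct URL of a Commons file, using the MD5-hashed directory layout."""
    name = file_name.removeprefix("File:").replace(" ", "_")  # Python 3.9+
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{UPLOAD_BASE}/{digest[0]}/{digest[:2]}/{quote(name)}"


if __name__ == "__main__":
    print(category_files_url("German pronunciation"))
    print(file_url("File:De-Fuchs.ogg"))
```

Walking the subcategories would need a second query with `cmtype=subcat`, repeated for each child category.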

emanuelps2708 commented 3 months ago

This sounds great. There seems to be only a "small" number of recordings for some languages, but I tested it and it is very easy to use and record with. If there's a way to add this to Yomitan, I commit to helping improve it (I can help with Spanish; I'm from Colombia).

hugolpz commented 3 months ago

@emanuelps2708 Hello! Lingualibre's recording studio is an online app, so iframe embedding is always possible. A Django / Vue.js version 3.0 is being developed this summer to get rid of the host wiki; this new version will be even easier to embed. We expect deployment this fall.

hugolpz commented 3 months ago

I installed Yomitan and tested it: it's impressive. A few pieces of feedback on the Lingualibre integration:

To serve the audio files, you could:

One could quickly record Japanese, though. A few lists of common Japanese words (List:Jpn/*) are already available on Lingualibre, with somewhat fewer for Korean (List:Kor/*) and Chinese (List:Cmn/*).

EDIT: https://jsfiddle.net/7q0tx9ru/ Something like that.
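One possible way to serve the files is a small service behind Yomitan's "Custom URL (JSON)" audio source. The sketch below assumes that source expects a body of the form `{"type": "audioSourceList", "audioSources": [{"name": ..., "url": ...}]}`; check Yomitan's current schema before relying on it. The `index` mapping from headwords to Commons file names is hypothetical and would have to be built beforehand, for example from a pronunciation category listing.

```python
import hashlib
import json
from urllib.parse import quote

UPLOAD_BASE = "https://upload.wikimedia.org/wikipedia/commons"


def file_url(file_name):
    """Direct URL of a Commons file (MD5-hashed directory layout)."""
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{UPLOAD_BASE}/{digest[0]}/{digest[:2]}/{quote(name)}"


def audio_source_list(term, index):
    """Build a response body in the assumed Yomitan "audioSourceList" format.

    `index` is a hypothetical dict mapping headwords to lists of Commons
    file names; terms with no recordings yield an empty source list.
    """
    sources = [
        {"name": f"Lingua Libre ({name})", "url": file_url(name)}
        for name in index.get(term, [])
    ]
    return json.dumps({"type": "audioSourceList", "audioSources": sources})


if __name__ == "__main__":
    # Hypothetical index entry, for illustration only.
    index = {"Fuchs": ["LL-Q188 (deu)-UserA-Fuchs.wav"]}
    print(audio_source_list("Fuchs", index))
```

Returning every matching recording lets the extension try each one in turn; whether the index is precomputed or queried live from Commons is an open design choice.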

Related to https://github.com/lingua-libre/signit/issues/66