mozilla-l10n / firefox-l10n

Localized messages for Firefox
Mozilla Public License 2.0

The bundled Vietnamese dictionaries of `vi` Firefox are outdated #23

Closed: ngoclong19 closed this issue 3 months ago

ngoclong19 commented 3 months ago

The vi Firefox includes two dictionaries for Vietnamese, based on the two common approaches to placing diacritics. They are located here and were last updated 11 years ago: https://github.com/mozilla-l10n/firefox-l10n/tree/main/vi/extensions/spellcheck/hunspell

According to this article (https://wiki.mozilla.org/L10n:Dictionaries#Vietnamese), the newest files can be found here (licensed under GPL v3): https://github.com/1ec5/hunspell-vi/tree/main/dictionaries Many common words were added over the years, so I would like the updated files to be included in the official build.

I can install an add-on that provides the updated files, available at: https://addons.mozilla.org/firefox/addon/vietnamese-dictionary/ But that creates confusion like this:

[Screenshot: list of available spellcheckers]

There are now three entries for Vietnamese: two from the built-in dictionaries and one from the add-on. The add-on only provides the new style of accent marks. If it provided both styles, there would be four entries in total.

I think this confusion is bad for inexperienced users. Can we update the files here to their newest version? I see that this repository doesn't accept pull requests, and Pontoon doesn't handle these files either.

flodolo commented 3 months ago

This type of request should be filed on Bugzilla (here for Vietnamese), so that other folks can chime in. Going forward, we'll try to make it clearer that issues should be filed only for technical aspects of the repo (e.g. automation), not localization.

Here's the bad news: that add-on is licensed under GPL, which is not compatible with in-tree dictionaries (see the top of the wiki page you linked to). Those files were landed without the proper authorization (there is no bug reference) and need to be removed.

Filed bug 1912392 to take care of that.

flodolo commented 3 months ago

Hit enter too quickly, adding a couple of thoughts.

First of all, thanks for taking the time to file the issue. It's great that you noticed the built-in dictionaries were outdated and took the time to research it.

Sadly, most locales (including my own) cannot ship a built-in dictionary, because most dictionaries are released under an incompatible license.

1ec5 commented 3 months ago

Hi, commenting here since bug 1912392 is closed. I’m the original author of the hunspell-vi dictionaries for Mozilla. I put it under GPL because the source of the wordlist, the Free Vietnamese Dictionary Project, is under GPL. It looks like Chromium also copied the same files into their codebase, though they don’t have the same trilicensing requirement.

I would be open to changing the license of the overall package as long as I have the ability to do so. At one point, the Vietnamese Wiktionary secured the FVDP’s author’s permission to import the full contents of those dictionaries – not just the wordlist but also all the definitions – into Wiktionary under the GFDL and later CC BY-SA. I have no idea if either license is legally compatible with MPL, but if it is, then I would be happy to simply relicense the dictionary files. Otherwise, I could try to reach out again to FVDP’s author to see if he’d be willing to change the license yet again.

There are some other sources of wordlists. Wiktionary also imported a database mapping Vietnamese quốc ngữ words to Nôm characters that came from the WinVNKey input method, also under the terms of the GFDL and later CC BY-SA. I’ll need to look into whether the quốc ngữ words in that database would form a reasonable wordlist, or whether it would have too much literary vocabulary and not enough practical vocabulary to serve Firefox and Thunderbird users. If it would be useful, I could ask around to get that database’s author’s permission…

flodolo commented 3 months ago

> It looks like Chromium also copied the same files into their codebase, though they don’t have the same trilicensing requirement.

That doesn't seem right to me (GPL is an "infectious" license, and Chromium is not GPL), but that's lawyer territory.

> under the GFDL and later CC BY-SA. I have no idea if either license is legally compatible with MPL, but if it is, then I would be happy to simply relicense the dictionary files.

I can check about compatibility.

flodolo commented 2 months ago

Sorry, it took longer than expected. I checked with legal, and unfortunately neither CC BY-SA nor GFDL is a suitable license for shipping within Firefox (apparently, the latter is not compatible with GPL either).

1ec5 commented 2 months ago

Thanks for checking. Is it safe to assume that Mozilla also considers a raw wordlist to be eligible for copyright in the first place? The Wikimedia Foundation’s stance is that the lemmata in a dictionary are factual data ineligible for copyright in the U.S., as opposed to the definitions or artistic layout. So it would be quite ironic if any risk were to arise from deriving a comprehensive wordlist from Wiktionary entry titles under its auspices, but I understand if Mozilla wishes to consider non-U.S. copyright protections as well. Ideally we could just extract lemmata from Wikidata’s lexicographical data, which is explicitly CC0, but coverage is still woefully inadequate.

flodolo commented 2 months ago

While it might be acceptable in practice, I don't think that makes us "good citizens" when it comes to respecting licenses.