simjanos-dev / LinguaCafe

LinguaCafe is a self-hosted software that helps language learners read foreign languages.
https://simjanos-dev.github.io/LinguaCafeHome/
GNU General Public License v3.0
804 stars, 24 forks

Dictionaries: add original language dictionary - add definition from wiktionary in the original language #130

Open pm3003 opened 5 months ago

pm3003 commented 5 months ago

This is a feature request

When learners reach a certain level with a language, simple translations are not enough. Coincidentally, that's often when they start to read classical texts, and when LinguaCafe is most useful.

Original-language (monolingual) dictionaries help learners understand finer nuances and slightly different senses of a word.

Request: Add original-language dictionaries. I believe this can be most easily achieved by adding Wiktionary dictionaries in the original language (a German dictionary from de.wiktionary.org, an Italian dictionary from it.wiktionary.org, etc.) with word and definition.

simjanos-dev commented 5 months ago

It is something that I also planned. Do you have a source for monolingual wiktionaries?

It will probably be added later, because there are some other things I want to prioritize first.

pm3003 commented 4 months ago

Thank you very much!

Here are some comments on dictionaries. I believe the easiest is Wiktionary:

Apart from the Wiktionaries (available in XML format, as raw dumps), a few years ago I used a toolchain that included sdcv (the StarDict command-line version). The StarDict project had trouble at some point because it offered scraped copyrighted dictionaries, but it also has a handful of free dictionaries. https://github.com/Dushistov/sdcv https://github.com/huzheng001/stardict-3

The GLAWI/ENGLAWI project has a free restructured version of Wiktionary for some languages.

The dictmaster project links to a relatively old list of free offline dictionaries, some explicitly free and some not explicitly free (though, for example, the American Heritage Dictionary is public domain).

Project Gutenberg has free dictionaries in full-text format that might be easily parseable (see, for example, this Welsh-English dictionary). This person has done it with Project Gutenberg's digitization of the 1913 Webster dictionary. (There's also a GNU version of it.)

The Russian website Lingvo has a lot of dictionaries in bgl/GoldenDict format, but most of them are likely copyrighted.

Regarding two languages I know well, French and German:

simjanos-dev commented 4 months ago

Thank you so much for the detailed information and links!

I will start with the monolingual Wiktionaries in the next couple of updates. I haven't checked the details of the XML yet; if the definitions are easy to extract, they should be very easy to add. I'll look at all the other sources in the future, and eventually add all of them.

I love dictionaries, the more the better.

simjanos-dev commented 4 months ago


It seems like the Wiktionary XML will be very difficult to parse. The XML files do not have a "meaning" field, so I'll have to try to parse the plain formatted text somehow.
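To illustrate the problem, here is a minimal sketch, not the actual LinguaCafe importer. It assumes a page shaped like a MediaWiki export entry, where the only structured fields are the title and one block of wikitext. The sample page, the `{{Bedeutungen}}` section marker, and the `extract_meanings` helper are assumptions made for this example. Real dumps also declare an XML namespace on every element and vary a lot between language editions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened page modeled on a MediaWiki export entry.
# Real dump files put an XML namespace on every element; it is omitted here.
SAMPLE_PAGE = """<page>
  <title>Haus</title>
  <revision>
    <text>== Haus ({{Sprache|Deutsch}}) ==
{{Bedeutungen}}
:[1] Gebäude, das zum Wohnen dient
:[2] Familie, Geschlecht
{{Herkunft}}
:von althochdeutsch hūs
</text>
  </revision>
</page>"""


def extract_meanings(page_xml: str) -> tuple[str, list[str]]:
    """Return the page title and the lines found under the meanings section."""
    page = ET.fromstring(page_xml)
    title = page.findtext("title") or ""
    # There is no "meaning" element: everything lives in one wikitext blob.
    wikitext = page.findtext("revision/text") or ""

    meanings = []
    in_section = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.startswith("{{"):
            # A template on its own line starts a new section (assumed layout).
            in_section = stripped == "{{Bedeutungen}}"
            continue
        if in_section and stripped.startswith(":"):
            # Drop the leading ":[1] " style numbering.
            meanings.append(re.sub(r"^:\[\d+\]\s*", "", stripped))
    return title, meanings


title, meanings = extract_meanings(SAMPLE_PAGE)
print(title, meanings)
```

Even this toy version depends on one language edition's section markers, which is why a generic parser for all Wiktionaries is hard to write.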

simjanos-dev commented 3 months ago

@pm3003

Hi!

It is very difficult to parse the original XML files from wiktionary.

I found kaikki.org. It has monolingual Wiktionaries in a format that can be easily imported, so I will start with these. It has monolingual Wiktionaries in these languages:

I plan to add more and go through your sources; it will just take some time.
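As a rough idea of why this format is easier, here is a minimal sketch. It assumes the files are JSON Lines, with one entry per line carrying a `word` and a list of `senses`, each with `glosses`. The two sample lines and the `load_definitions` helper are made up for illustration, and the exact field set should be checked against the downloaded file.

```python
import json

# Hypothetical sample lines in the assumed one-JSON-object-per-line layout.
SAMPLE_JSONL = """\
{"word": "Haus", "lang_code": "de", "pos": "noun", "senses": [{"glosses": ["Gebäude, das zum Wohnen dient"]}, {"glosses": ["Familie, Geschlecht"]}]}
{"word": "laufen", "lang_code": "de", "pos": "verb", "senses": [{"glosses": ["sich schnell zu Fuß fortbewegen"]}]}
"""


def load_definitions(lines):
    """Map each headword to a flat list of its glosses."""
    definitions = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        glosses = [
            gloss
            for sense in entry.get("senses", [])
            for gloss in sense.get("glosses", [])
        ]
        if glosses:
            # The same word can appear on several lines (one per part of speech).
            definitions.setdefault(entry["word"], []).extend(glosses)
    return definitions


definitions = load_definitions(SAMPLE_JSONL.splitlines())
print(definitions["Haus"])
```

Because the definitions already arrive as structured fields, no wikitext parsing is needed. With a real file, passing the open file object instead of `SAMPLE_JSONL.splitlines()` would stream it line by line.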

lef-est commented 2 weeks ago

Would you consider supporting the Yomichan format (JSON)? It would give access to a wealth of dictionaries made by Yomichan users in many languages, not just Japanese. It could be bring-your-own-dictionary, so you wouldn't need to worry about copyright.
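For reference, a minimal sketch of what reading such a dictionary could look like. It assumes the usual layout: a zip archive with an `index.json` and one or more `term_bank_N.json` files, where each term is an array with the expression at index 0, the reading at index 1, and the glossary list at index 5. The sample archive and both helper functions are hypothetical and only there to make the example runnable.

```python
import io
import json
import zipfile


def read_yomichan_terms(archive):
    """Collect plain-text glosses per expression from every term bank file."""
    terms = {}
    with zipfile.ZipFile(archive) as zf:
        for name in sorted(zf.namelist()):
            if not (name.startswith("term_bank_") and name.endswith(".json")):
                continue
            for entry in json.loads(zf.read(name)):
                expression, glossary = entry[0], entry[5]
                # Some dictionaries use structured glossary objects; this
                # sketch keeps only the plain string glosses.
                plain = [g for g in glossary if isinstance(g, str)]
                terms.setdefault(expression, []).extend(plain)
    return terms


def build_sample_archive():
    """Create a tiny in-memory archive in the assumed layout (made-up data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(
            "index.json",
            json.dumps({"title": "Sample", "format": 3, "revision": "1"}),
        )
        zf.writestr(
            "term_bank_1.json",
            json.dumps([
                ["食べる", "たべる", "v1", "v1", 0, ["to eat"], 1, ""],
                ["Haus", "", "", "", 0, ["Gebäude", {"type": "text", "text": "x"}], 2, ""],
            ]),
        )
    buffer.seek(0)
    return buffer


terms = read_yomichan_terms(build_sample_archive())
print(terms)
```

Since users would supply the archive themselves, an importer along these lines would not need to ship any dictionary data.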