suyashb95 / WiktionaryParser

A Python Wiktionary Parser
MIT License
357 stars 92 forks

Supporting french as base language #76

Open cedric-audy opened 3 years ago

cedric-audy commented 3 years ago

Hi everyone,

As part of a school project where I needed a bunch of French words and their definitions (also in French), I forked the project and modified the code to suit my needs. The code is here: https://github.com/cedric-audy/WiktionaryParser .

(Solved) I was unable to pull from the repo and use wiktionary as a 'package'. For now, when I need it, I include the whole thing in my project and import WiktionaryParser, which is impractical. However, I had no problem installing the main version using pip. Help would be appreciated on that.

After a bit of tinkering, I can now retrieve a definition and etymology (see image) with the help of this code: https://github.com/cedric-audy/french_wiktionary_scraper .

image

I am fairly new to all this (git, forking, Python, etc.), so help would be appreciated in making a version of WiktionaryParser that works with French as the base language.

suyashb95 commented 3 years ago

@cedric-audy the project only supports the English wiktionary as of now since the parts of speech/page structure for each language would be different. Thanks for working on support for French on your fork of the repo! I'll try to integrate other languages in the project after taking a look at that

I was unable to pull from the repo and use wiktionary as a 'package'. For now, when I need it, I include the whole thing in my project and import WiktionaryParser, which is impractical. However, I had no problem installing the main version using pip. Help would be appreciated on that.

Could you elaborate on this? Are you unable to use it from source?

cedric-audy commented 3 years ago

Solved

I was unable to pull from the repo and use wiktionary as a 'package'. For now, when I need it, I include the whole thing in my project and import WiktionaryParser, which is impractical. However, I had no problem installing the main version using pip. Help would be appreciated on that.

Could you elaborate on this? Are you unable to use it from source?

cedric-audy commented 3 years ago

I've made some improvements today. We can now retrieve 'nom commun' (definition), 'étymologie', 'synonymes', 'dérivés' (related words?), 'vocabulaire apparenté par le sens' (sense-related vocabulary?), 'hyperonymes' (synonyms, but more generic), and 'hyponymes' (more specific synonyms, such as bleu d'auvergne for fromage). I still need to do pronunciations.

I really didn't have to change that many things. Maybe French could be integrated into the source code eventually.

Output example (using pprint): image
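For anyone following along, here is a minimal sketch of how the French headings listed above could be mapped onto English-style keys. It is not the code from the fork: the names `FR_SECTION_KEYS`, `normalize_heading`, and `canonical_key` are hypothetical, and the English key names are only assumptions based on the glosses in this comment.

```python
import unicodedata

# Hypothetical mapping from the French section headings mentioned above
# to English-style keys. The key names on the right are assumptions.
FR_SECTION_KEYS = {
    "nom commun": "definitions",
    "étymologie": "etymology",
    "synonymes": "synonyms",
    "dérivés": "derived_terms",
    "vocabulaire apparenté par le sens": "related_terms",
    "hyperonymes": "hypernyms",
    "hyponymes": "hyponyms",
}


def normalize_heading(heading):
    """Lowercase, trim, and strip accents so that 'Étymologie ' still matches."""
    decomposed = unicodedata.normalize("NFKD", heading.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Pre-compute the accent-insensitive lookup table once.
_NORMALIZED = {normalize_heading(k): v for k, v in FR_SECTION_KEYS.items()}


def canonical_key(heading):
    """Return the English-style key for a French heading, or None if unknown."""
    return _NORMALIZED.get(normalize_heading(heading))
```

Headings that are not in the table (pronunciations, for now) simply return `None`, so a caller can skip or log them.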

tbm commented 3 years ago

@cedric-audy also see PR #56

danieldjewell commented 3 years ago

I too am interested in other languages. As @Suyash458 points out, one of the problems is that the actual response metadata from Wiktionary changes based on the language queried. (This is why, of course, @cedric-audy, you had to change the definitions: "etymologies" >> "étymologie", etc.)

Side note: I am only a beginner student of French (my apologies, @cedric-audy), but it helps that something like 40+% of English vocabulary comes from (Norman) French. That said, I think the proper translation for "determiner" would be "déterminant" and not "dérivés". I'll open an issue over on your repo, @cedric-audy, with more to keep it separate. EDIT: Can't do that, issues aren't enabled. @cedric-audy, I think the proper translation of "parts of speech" into French would be catégorie lexicale (https://fr.m.wiktionary.org/wiki/catégorie_lexicale). I would double check some of the translations, like the previous one I mentioned.

I wasn't aware that the Wiktionary/MediaWiki APIs actually change the language of the metadata -- that really does complicate things...

I wonder if Wiktionary has a language mapping table already built for internationalization, e.g. something that will look up the language-local equivalents for the API response structure. Will check that out.
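To make the idea concrete, a rough sketch of what such a lookup could look like if it were maintained by hand inside the parser. This is not something Wiktionary provides as far as I know: `SECTION_NAMES` and `localized_heading` are hypothetical names, and the entries are limited to headings already mentioned in this thread.

```python
# Hand-maintained table: canonical key -> heading text, per language code.
# Hypothetical structure for illustration only.
SECTION_NAMES = {
    "en": {
        "etymology": "etymology",
        "synonyms": "synonyms",
        "pronunciation": "pronunciation",
    },
    "fr": {
        "etymology": "étymologie",
        "synonyms": "synonymes",
        # "pronunciation" is intentionally missing to show the fallback.
    },
}


def localized_heading(key, language, fallback="en"):
    """Return the heading for `key` in `language`, else the fallback language's."""
    table = SECTION_NAMES.get(language, {})
    if key in table:
        return table[key]
    # Raises KeyError if the key is unknown in the fallback language too.
    return SECTION_NAMES[fallback][key]
```

Falling back to English keeps the parser usable while a language's table is still incomplete, at the cost of silently missing sections whose local heading differs.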

cedric-audy commented 3 years ago

Hi @danieldjewell, thank you for your input. Issues were already enabled from what I can see, but I added some templates in case that was needed. You are right about the botched translation; I was mainly interested in making it work for a school project. Yes to all of your suggestions, but I'm afraid I don't have time for this this week :)

gozat commented 2 years ago

Please see pull request #92 for a possible adaptation of the code.