Closed goodmami closed 4 years ago
IIRC, WordNet concepts are not meant to have "languages"; the language comes in only at the lemma level. Synsets are considered to be technically "universal", and lemmas are realizations of synsets in specific languages.
But I'm not good with the WordNet philosophy. Maybe @fcbond has a better idea.
My proposal was more for practical purposes. Currently, when creating a WordNet object it loads the English data, even if the user never wanted to use English. Only when they try something like ss.lemmas(lang=...) does the language get loaded. This seems inefficient to me.
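To make the loading behaviour described here concrete, below is a toy sketch. It is not the NLTK implementation: the class name `EagerEnglishWordNet`, the `_DATA` table, the `loaded` attribute, and the synset ID `ss-dog` are all made up for illustration. It only models the claim that English is loaded at construction and other languages on first use.

```python
# Toy model of the behaviour described above; NOT the real NLTK code.
# All names and data here are hypothetical.
_DATA = {
    "eng": {"ss-dog": ["dog"]},
    "jpn": {"ss-dog": ["犬", "イヌ"]},
}


class EagerEnglishWordNet:
    def __init__(self):
        self.loaded = {}
        self._load("eng")  # English is always loaded up front

    def _load(self, lang):
        # Stand-in for reading a language's data files from disk.
        self.loaded[lang] = _DATA[lang]

    def lemmas(self, synset_id, lang="eng"):
        if lang not in self.loaded:  # other languages load on first use
            self._load(lang)
        return self.loaded[lang].get(synset_id, [])


toy = EagerEnglishWordNet()
print(sorted(toy.loaded))          # ['eng']
print(toy.lemmas("ss-dog", "jpn")) # ['犬', 'イヌ']
print(sorted(toy.loaded))          # ['eng', 'jpn']
```

In this sketch a user who only wants Japanese still pays for the English load, which is the inefficiency being pointed out.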
If someone cares about using the universality to, e.g., look up a synset from a wordnet in one language and then list lemmas in another language, it seems clearer to do something like this:
>>> pwn = WordNet(lang='eng')
>>> jwn = WordNet(lang='jpn')
>>> for ss in pwn.synsets('dog'):
...     for lemma in jwn.synset(ss).lemmas():
...         print(lemma)
But I, too, would like to get Francis's take here. @fcbond, care to comment?
In OMW 1.0, the structure comes entirely from PWN, so without the English wordnet there are no semantic relations and no synset nodes to attach the lemmas to. So we have to load English first.
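As a rough illustration of that dependency, here is a toy sketch with invented data. The names `pwn_structure`, `jpn_lemmas`, and `hypernym_lemmas`, and the synset IDs, are hypothetical and do not reflect the actual OMW file layout. The point is only that a language's lemmas are keyed by synset nodes that the PWN structure provides.

```python
# Toy illustration (hypothetical data): the synset nodes and relations
# come from PWN; another language only contributes lemmas for those nodes.
pwn_structure = {"ss-dog": ["ss-canine"], "ss-canine": []}  # synset -> hypernyms
jpn_lemmas = {"ss-dog": ["犬"], "ss-canine": ["イヌ科"]}


def hypernym_lemmas(synset_id, structure, lemmas):
    """Return lemmas of the hypernyms of synset_id, or None if the node is unknown."""
    if synset_id not in structure:
        return None  # without the structure there is nothing to traverse
    return [lemma for h in structure[synset_id] for lemma in lemmas.get(h, [])]


print(hypernym_lemmas("ss-dog", pwn_structure, jpn_lemmas))  # ['イヌ科']
print(hypernym_lemmas("ss-dog", {}, jpn_lemmas))             # None
```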
Ok that makes sense then. Thanks for explaining.
It seems like we could still use the API I proposed above, anticipating a world where the shared concept structure is detached from the PWN. It would allow us to perform operations for other languages without having to specify lang all the time. But it wouldn't gain anything in, e.g., space efficiency.
I'm closing this as it's handled by https://github.com/goodmami/wn
>>> import wn
>>> wn.words('chat') # returns both French and English
[Word('ewn-chat-n'), Word('ewn-chat-v'), Word('frawn-lex14803'), Word('frawn-lex21897')]
>>> ewn = wn.WordNet(lgcode='en')
>>> ewn.words('chat') # only returns English
[Word('ewn-chat-n'), Word('ewn-chat-v')]
As I understand, creating a WordNet object always loads the English data, and if you call a method with lang=xyz where xyz is not 'eng', it also loads the data for that language.
I wonder why it doesn't just make lang a parameter for the WordNet class, so it only loads the data for that language, then remove the parameter on any of its methods. This might also help to avoid some if lang='eng' checks within the functions. Then it would just be a matter of instantiating a new WordNet object if one wants to work with multiple wordnets: (note: this example is illustrative; I cannot actually query for '犬' in Japanese because of #20)
Of course, this would break some backward compatibility.
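For readers who want to see the shape of this proposal, here is a minimal sketch under stated assumptions. `SingleLanguageWordNet`, `_LEMMAS`, and the synset ID `ss-dog` are invented for the example, and this is not an existing NLTK interface. The constructor takes the language, loads only that language's data, and the methods no longer accept a lang argument.

```python
# Hypothetical sketch of the proposed interface; not an existing NLTK API.
_LEMMAS = {
    "eng": {"ss-dog": ["dog"]},
    "jpn": {"ss-dog": ["犬"]},
}


class SingleLanguageWordNet:
    def __init__(self, lang):
        self.lang = lang
        self._lemmas = _LEMMAS[lang]  # only this language's data is loaded

    def synsets(self, form):
        # IDs of all synsets that have `form` as a lemma in this language.
        return [ss for ss, forms in self._lemmas.items() if form in forms]

    def lemmas(self, synset_id):
        # No lang parameter: the instance already fixes the language.
        return self._lemmas.get(synset_id, [])


pwn = SingleLanguageWordNet(lang="eng")
jwn = SingleLanguageWordNet(lang="jpn")
for ss in pwn.synsets("dog"):
    print(jwn.lemmas(ss))  # ['犬']
```

Working across languages then means holding one object per wordnet and passing synset identifiers between them, which relies on the two wordnets sharing synset IDs.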