nltk / nltk_data

NLTK Data

murciélago, Spanish for "bat", is not found in WordNet (OMW) #151

Open JamesArthurHolland opened 3 years ago

JamesArthurHolland commented 3 years ago

grep murciélago
20274:02141611-n lemma murciélago ratonero
20285:02143142-n lemma murciélago trompudo mexicano

^ it exists in the omw folder

>>> from nltk.corpus import wordnet as wn
>>> print(wn.synsets("murciélago", lang="spa"))
[]
>>> print(wn.synsets("gato", lang="spa"))
[Synset('cat.n.01'), Synset('tom.n.02'), Synset('dodger.n.01')]

Cat is found but bat is not

fcbond commented 3 years ago

Hi,

Please try our new, improved interface: https://github.com/goodmami/wn

On Mon, Mar 1, 2021 at 11:16 PM Jamie Holland notifications@github.com wrote:

> grep murciélago
> 20274:02141611-n lemma murciélago ratonero
> 20285:02143142-n lemma murciélago trompudo mexicano
>
> ^ it exists in the omw folder
>
> >>> print(wn.synsets("murciélago", lang="spa"))
> []
> >>> print(wn.synsets("gato", lang="spa"))
> [Synset('cat.n.01'), Synset('tom.n.02'), Synset('dodger.n.01')]
>
> Cat is found but bat is not


-- 
Francis Bond http://www3.ntu.edu.sg/home/fcbond/
Division of Linguistics and Multilingual Studies
Nanyang Technological University

ekaf commented 2 years ago

@JamesArthurHolland the OMW lines that you quote don't mean that "murciélago" exists as a single word in OMW-1.4, but that it exists as a part of two compounds:

>>> from nltk.corpus import wordnet as wn
>>> print(wn.synsets("murciélago_ratonero", lang="spa"))
[Synset('mouse-eared_bat.n.01')]
>>> print(wn.synsets("murciélago_trompudo_mexicano", lang="spa"))
[Synset('hognose_bat.n.01')]
>>> print(wn.synsets("bat")[0].lemmas(lang="spa"))
[Lemma('bat.n.01.chiroptera')]

However, the word "murciélago" exists in the Spanish wordnet released by MCR in 2016, so it could help if OMW caught up with that data.
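To make the difference between a grep hit and a lemma lookup concrete, here is a minimal sketch in plain Python that does not use NLTK at all. It assumes the OMW data file is tab-separated (synset id, a type field ending in "lemma", lemma text) and that multiword lemmas are looked up with underscores, as in the calls above. The helper names `parse_lemmas`, `exact_matches` and `compound_matches` are hypothetical and only serve the illustration.

```python
# Two entries taken from the grep output quoted in this thread.
# ASSUMPTION: the real file is tab-separated; the exact type field may differ.
OMW_LINES = [
    "02141611-n\tlemma\tmurciélago ratonero",
    "02143142-n\tlemma\tmurciélago trompudo mexicano",
]


def parse_lemmas(lines):
    """Map each lemma (spaces replaced by underscores) to its synset ids."""
    index = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip anything that is not a three-column lemma row.
        if len(parts) != 3 or not parts[1].endswith("lemma"):
            continue
        synset_id, _, lemma = parts
        key = lemma.strip().replace(" ", "_")
        index.setdefault(key, []).append(synset_id)
    return index


def exact_matches(index, word):
    """Synset ids whose lemma is exactly `word` (what a lookup needs)."""
    return index.get(word.replace(" ", "_"), [])


def compound_matches(index, word):
    """Multiword lemmas that merely contain `word` (what grep also reports)."""
    return sorted(
        lemma for lemma in index
        if lemma != word and word in lemma.split("_")
    )


index = parse_lemmas(OMW_LINES)
print(exact_matches(index, "murciélago"))     # []
print(compound_matches(index, "murciélago"))
# ['murciélago_ratonero', 'murciélago_trompudo_mexicano']
```

With only these two entries, the exact lookup is empty while the substring search returns both compounds, which matches the behaviour reported at the top of the thread.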