Closed: stevenbird closed this issue 8 years ago
There is an Italian project called 'MultiWordNet' so I would like to avoid just 'multiwordnet'. How about omw?
OK. We're often writing "from nltk import wordnet as wn", and so wn has gained some currency as an abbreviation for WordNet.
We could have omwn. But in a world where openness is the unmarked case, we could have mwn.
Do either of these appeal or would you still prefer omw?
G'day,
I also like to think of openness as the default, but 'mwn' is still a bit close to MultiWordNet. I guess omwn is OK, although I have a slight preference for 'omw'. 'wngrid' is another possibility: this is the name chosen by the Global WordNet Association, and we are now the current implementation.
Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University
OK, omw it is then, thanks.
The list of languages in the supplied omw corpus is as follows. I think fre is spurious (a copy of fra) and we seem to be missing ind even though it is mentioned in the documentation.
als cmn eng fin fre ita mcr nor por
arb dan fas fra heb jpn msa pol tha
@fcbond would you please advise.
The current list is as follows:
langs = ("eng", "ind", "zsm", "jpn", "tha", "cmn", "qcn", "fas", "arb", "heb", "ita", "por", "nob", "nno", "dan", "swe", "fra", "fin", "ell", "glg", "cat", "spa", "eus", "als", "pol", "slv")
We use qcn for traditional Chinese (and the slightly differently designed NTU, Taiwan Chinese Wordnet).
We will try to upload a new omw.zip sometime today.
t = dd(lambda: dd(unicode))
t['eng']['eng'] = 'English'
t['eng']['ind'] = 'Inggeris'
t['eng']['zsm'] = 'Inggeris'
t['ind']['eng'] = 'Indonesian'
t['ind']['ind'] = 'Bahasa Indonesia'
t['ind']['zsm'] = 'Bahasa Indonesia'
t['zsm']['eng'] = 'Malaysian'
t['zsm']['ind'] = 'Bahasa Malaysia'
t['zsm']['zsm'] = 'Bahasa Malaysia'
t['msa']['eng'] = 'Malay'
t["swe"]["eng"] = "Swedish"
t["ell"]["eng"] = "Greek"
t["cmn"]["eng"] = "Chinese (simplified)"
t["qcn"]["eng"] = "Chinese (traditional)"
t['eng']['cmn'] = u'英语'
t['cmn']['cmn'] = u'汉语'
t['qcn']['cmn'] = u'漢語'
t['cmn']['qcn'] = u'汉语'
t['qcn']['qcn'] = u'漢語'
t['jpn']['cmn'] = u'日语'
t['jpn']['qcn'] = u'日语'
t['als']['eng'] = 'Albanian'
t['arb']['eng'] = 'Arabic'
t['cat']['eng'] = 'Catalan'
t['dan']['eng'] = 'Danish'
t['eus']['eng'] = 'Basque'
t['fas']['eng'] = 'Farsi'
t['fin']['eng'] = 'Finnish'
t['fra']['eng'] = 'French'
t['glg']['eng'] = 'Galician'
t['heb']['eng'] = 'Hebrew'
t['ita']['eng'] = 'Italian'
t['jpn']['eng'] = 'Japanese'
t['mkd']['eng'] = 'Macedonian'
t['nno']['eng'] = 'Nynorsk'
t['nob']['eng'] = u'Bokmål'
t['pol']['eng'] = 'Polish'
t['por']['eng'] = 'Portuguese'
t['slv']['eng'] = 'Slovene'
t['spa']['eng'] = 'Spanish'
t['tha']['eng'] = 'Thai'
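For readers on Python 3, the table above can be sketched roughly as follows. This is only an illustration: it assumes dd stands for collections.defaultdict, replaces unicode with str, copies just a few entries, and the helper language_name is a hypothetical addition rather than part of the OMW code.

```python
from collections import defaultdict

# Assumption: `dd` in the snippet above is collections.defaultdict.
# names[code][display_language] -> name of `code` written in `display_language`
names = defaultdict(lambda: defaultdict(str))
names["eng"]["ind"] = "Inggeris"
names["ind"]["eng"] = "Indonesian"
names["ind"]["ind"] = "Bahasa Indonesia"
names["cmn"]["eng"] = "Chinese (simplified)"


def language_name(table, code, in_lang="eng"):
    """Hypothetical helper: return the display name for `code`, or the code itself.

    Uses .get() so that looking up an unknown code does not silently
    create an empty entry in the nested defaultdict.
    """
    return table.get(code, {}).get(in_lang) or code
```

Indexing the table directly (names["xyz"]["eng"]) returns an empty string and creates the entry as a side effect, which is why the helper goes through .get() instead.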
Hi, I have the same problem that somebody posted on Quora some months ago:
"I can call: from nltk.corpus import sinica_treebank
but when I call: from nltk.corpus import omw
the result is: cannot import name omw. No module named omw."
I checked the downloader and the omw is installed. I am using Python 2.7. Other modules work fine. Any clues? Thanks in advance.
One just needed to read the NLTK cookbook more carefully. You don't need to import a module called 'omw'; you can access it directly by simply importing wordnet (wn). More at: http://www.nltk.org/howto/wordnet.html
A user reported missing Spanish lemmas from OMW: http://stackoverflow.com/questions/26474731/missing-spanish-wordnet-from-nltk/26494099#26494099
@franquattri It would be useful if the howto showed full installation instructions. On Ubuntu 14.04, with the data URL fixed (http://askubuntu.com/a/527408/93794), I have wordnet and omw installed (I see them under ~/nltk_data/corpora), but when I follow through http://www.nltk.org/howto/wordnet.html a lot of the examples fail, in particular wn.langs() fails with "AttributeError: 'WordNetCorpusReader' object has no attribute 'langs'". Is that manual for a specific version?
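One way to tell whether an installed reader supports the multilingual API at all is to probe for langs() before calling it. The following is a minimal sketch; the helper name omw_languages is made up for illustration, and it only assumes that newer readers expose a langs() method, as the howto shows.

```python
def omw_languages(wordnet_reader):
    """Return the sorted language codes, or None if the reader has no langs().

    `wordnet_reader` is expected to be the object obtained with
    `from nltk.corpus import wordnet as wn`; older NLTK releases raise
    AttributeError on wn.langs(), which this probe avoids.
    """
    langs = getattr(wordnet_reader, "langs", None)
    if langs is None:
        return None
    return sorted(langs())
```

With NLTK installed this might be called as omw_languages(wn); a result of None suggests that the installed NLTK predates the multilingual support.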
Hi Darren, the manual has been updated for NLTK 3.0, but it should work fine with previous NLTK versions too. I'm working with Windows, Python 2.7 and IPython (which I also suggest for Unicode matters). Both attempts work for me:
from nltk.corpus import wordnet as wn
wn.langs()
and
from nltk.corpus import wordnet as wn
sorted(wn.langs())  # as shown here: http://www.nltk.org/howto/wordnet.html
Can you be more specific about the examples that fail?
@DarrenCook, there are discrepancies between the API, the documentation and the nltk_data, but I'm sure the OMW team will fix them and the documentation will follow shortly.
Please note that Catalan seems to be missing from wn.langs() although it's in the MCR.
>>> import nltk
>>> nltk.__version__
'3.0.0'
>>> nltk.download('omw')
[nltk_data] Downloading package omw to /home/alvas/nltk_data...
[nltk_data] Package omw is already up-to-date!
True
>>> from nltk.corpus import wordnet as wn
>>> wn.langs()
[u'als', u'arb', u'cmn', u'dan', u'eng', u'fas', u'fin', u'fra', u'fre', u'heb', u'ita', u'jpn', u'cat', u'eus', u'glg', u'spa', u'ind', u'zsm', u'nno', u'nob', u'pol', u'por', u'tha']
>>> exit()
alvas@ubi:~$ cd ~/nltk_data/corpora/omw/
alvas@ubi:~/nltk_data/corpora/omw$ ls
als cmn eng fin fre ita mcr nor por tha
arb dan fas fra heb jpn msa pol README
alvas@ubi:~/nltk_data/corpora/omw$ cd mcr/
alvas@ubi:~/nltk_data/corpora/omw/mcr$ ls
LICENSE wn-data-cat.tab wn-data-glg.tab wn-data-spa.tab.gz
mcr2tab.py wn-data-eus.tab wn-data-spa.tab
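The listing above suggests that the directory names are project names (mcr, msa, nor), while the language codes live in the wn-data-XXX.tab file names. A rough sketch for listing the codes actually present on disk, assuming that naming pattern holds for every project directory (the function name languages_on_disk is hypothetical):

```python
import os
import re

# Matches e.g. "wn-data-cat.tab" but not "wn-data-spa.tab.gz".
TAB_NAME = re.compile(r"^wn-data-([a-z]+)\.tab$")


def languages_on_disk(omw_root):
    """Map each language code found under `omw_root` to its project directory."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(omw_root):
        for name in filenames:
            match = TAB_NAME.match(name)
            if match:
                # Keep the first directory seen for a given language code.
                found.setdefault(match.group(1), os.path.basename(dirpath))
    return found
```

Calling languages_on_disk(os.path.expanduser("~/nltk_data/corpora/omw")) and comparing its keys with wn.langs() would show which codes the API and the data disagree on.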
nltk.version '2.0b9'
Is that too old?
(apt-get install python-nltk tells me "python-nltk is already the newest version.")
Working through the examples, the first one that fails is "print(wn.synset('dog.n.01').definition())", which says "TypeError: 'str' object is not callable". The three commands before that worked fine.
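The TypeError here is consistent with definition being a plain attribute in the older NLTK and a method in NLTK 3.0. Code that has to run on both could use a small shim such as the sketch below; synset_definition is a hypothetical helper name, not an NLTK function.

```python
def synset_definition(synset):
    """Return the gloss of `synset` whether `definition` is a method or a string.

    NLTK 3.0 style: synset.definition() ; older style: synset.definition
    """
    value = synset.definition
    return value() if callable(value) else value
```

With NLTK and the wordnet data installed, synset_definition(wn.synset('dog.n.01')) would then work under either API style.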
Using pip install -U nltk would update to 3.0.0. apt-get is still holding the older version.
With regard to accessing synsets from the WordNet API in NLTK, I think the major change would be https://github.com/nltk/nltk/commit/ba8ab7e23ea2b8d61029484098fd62d5986acd9c
You may also get errors from nltk.download() if you're using the apt-get branch of NLTK, see http://askubuntu.com/questions/527388/python-nltk-on-ubuntu-12-04-lts-nltk-downloadbrown-results-in-html-error-40
See also: Change Log: https://github.com/nltk/nltk/blob/develop/ChangeLog API Changes: https://github.com/nltk/nltk/wiki/Porting-your-code-to-NLTK-3.0
@DarrenCook are you sure you have installed NLTK correctly? You can take a look here: http://www.nltk.org/install.html
To find out which NLTK version you have:
import nltk
nltk.__version__
To update NLTK or other modules (on Windows), open a Command Prompt and run:
python -m pip install --upgrade SomePackage
Are you using the WN version that comes with NLTK (WN 3.0) or the newest release (i.e. have you imported it into NLTK)? There might be some issues for that reason as well.
Thanks @alvations and Francesca for your help. These two commands got everything working:
sudo apt-get install python-pip
sudo pip install -U nltk
@franquattri I think I may have downloaded the latest wordnet, while having the 2.0b9 of nltk installed, so maybe that was the issue.
Hi, does anybody know of multilingual framenets (apart from the English FrameNet) that can be searched with NLTK?
This is already done, isn't it?
Thanks @bryant1410. Yes, this is resolved.
I downloaded cow from http://globalwordnet.org/wordnets-in-the-world/ to process Chinese. How can I use cow in Python? For example, after from nltk.corpus import wordnet as wn, how can I use cow?
cow is already included in omw (Open Multilingual Wordnet), so if you download that from the normal download interface, you can access cow with lang='cmn'. E.g. for Japanese:
wn.synsets('dog')[0].lemmas(lang='jpn')
[Lemma('dog.n.01.イヌ'), Lemma('dog.n.01.ドッグ'), Lemma('dog.n.01.洋犬'), Lemma('dog.n.01.犬')]
Can we use wn.synsets('dog')[0].lemmas(lang='jpn') with more than one language at once, i.e. wn.synsets('dog')[0].lemmas(lang='jpn, ita')?
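Nothing in this thread indicates that lang accepts a comma-separated list, so the safer assumption is one language code per call. A minimal sketch that loops over the codes and collects the results (lemma_names_by_language is a hypothetical helper; it relies only on the synset's lemma_names(lang=...) method):

```python
def lemma_names_by_language(synset, langs):
    """Return {language code: list of lemma names} for one synset.

    Calls synset.lemma_names(lang=...) once per code instead of passing
    several codes in a single string.
    """
    return {lang: list(synset.lemma_names(lang=lang)) for lang in langs}
```

With NLTK and the omw data installed, this might be used as lemma_names_by_language(wn.synsets('dog')[0], ['jpn', 'ita']).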
@francisbond is contributing the Open Multilingual Wordnet to NLTK (http://www.casta-net.jp/~kuribayashi/multi/).
We need to settle on a short name to use: multiwordnet?