nltk / nltk_data

NLTK Data

Update WordNet data files to 3.1 #18

stevenbird closed this issue 5 years ago

stevenbird commented 9 years ago

WordNet 3.1 provides updated data files in the same format as 3.0, plus a host of additional files. However, the lexnames file is gone.
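Because 3.1 keeps the 3.0 data-file layout, each synset's lexicographer file number can still be read from the data files. What goes missing is the lexnames table that turns that number into a name such as `noun.animal`. The Python sketch below parses the leading fields of a data-file line and maps the number through a bundled excerpt of the 3.0 lexnames table. It assumes the lexicographer file numbering is unchanged in 3.1. The sample line is a simplified version of the 3.0 entry for `dog.n.01` with its pointer list removed. The names `parse_synset_header`, `lexname`, `LEXNAMES_30` and `SAMPLE_LINE` are hypothetical and are not NLTK APIs.

```python
"""Sketch: read the lexicographer file number from a WordNet data-file line."""

# Excerpt of WordNet 3.0's lexnames file (number -> lexicographer file name).
# Only the first seven entries are reproduced; assumed unchanged in 3.1.
LEXNAMES_30 = {
    0: "adj.all",
    1: "adj.pert",
    2: "adv.all",
    3: "noun.Tops",
    4: "noun.act",
    5: "noun.animal",
    6: "noun.artifact",
}

# Simplified line modeled on the 3.0 entry for dog.n.01
# (pointer list removed, p_cnt set to 000, gloss shortened).
SAMPLE_LINE = (
    "02084071 05 n 03 dog 0 domestic_dog 0 Canis_familiaris 0 000 "
    "| a member of the genus Canis"
)


def parse_synset_header(line):
    """Return (offset, lex_filenum, ss_type, words) from a data-file line.

    Layout: synset_offset lex_filenum ss_type w_cnt word lex_id [word lex_id ...] ...
    w_cnt is a two-digit hexadecimal count; each word is followed by a lex_id.
    """
    fields = line.split()
    offset = fields[0]
    lex_filenum = int(fields[1])
    ss_type = fields[2]
    w_cnt = int(fields[3], 16)
    # Words sit at every other position after w_cnt, skipping their lex_id fields.
    words = [fields[4 + 2 * i] for i in range(w_cnt)]
    return offset, lex_filenum, ss_type, words


def lexname(lex_filenum, table=LEXNAMES_30):
    """Map a lexicographer file number to its name using a bundled table."""
    return table.get(lex_filenum, "unknown")


if __name__ == "__main__":
    offset, num, pos, words = parse_synset_header(SAMPLE_LINE)
    print(offset, pos, lexname(num), words)
```

Bundling the table is only a workaround. A reader that can rely on a lexnames file shipped with the data does not have to assume the numbering never changes.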

fcbond commented 8 years ago

3.1 is an improvement, but moving to 3.1 would break the mapping to the multilingual data. We (the Global WordNet Association) are working on a solution to this (the Collaborative Interlingual Index) and plan to have it ready before the Global WordNet Conference in January 2016. So I suggest we hold off until then.
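The breakage comes from how the Princeton WordNet identifies synsets. A synset ID is the synset's byte offset in a data file plus its part of speech, so edits between releases move offsets, and any wordnet linked by 3.0 offsets stops lining up with 3.1. The Python sketch below shows the problem and the stable-identifier idea behind an interlingual index. Every offset, the concept ID `i1`, and the Spanish entry are made up. The table layout is an assumption for illustration, not the actual CILI format.

```python
"""Sketch: why offset-keyed links break across releases, and how a stable
concept ID (in the spirit of an interlingual index) avoids that.
All identifiers and data below are made up for illustration."""

from typing import Dict, Optional, Tuple

# Hypothetical PWN synset IDs (offset + POS) for the same concept in two releases.
# Real offsets can shift between releases because they are byte positions in data files.
PWN30: Dict[str, str] = {"dog.n.01": "00000100-n"}
PWN31: Dict[str, str] = {"dog.n.01": "00000200-n"}

# A (hypothetical) Spanish wordnet linked directly to PWN 3.0 synset IDs.
SPANISH_BY_PWN30: Dict[str, str] = {"00000100-n": "perro"}

# Interlingual-index-style mapping: each release's synset ID points to one stable concept ID.
ILI: Dict[Tuple[str, str], str] = {
    ("3.0", "00000100-n"): "i1",
    ("3.1", "00000200-n"): "i1",
}
SPANISH_BY_ILI: Dict[str, str] = {"i1": "perro"}


def lookup_by_offset(release_ids: Dict[str, str], sense: str) -> Optional[str]:
    """Follow a direct offset link; only works for the release it was built against."""
    return SPANISH_BY_PWN30.get(release_ids[sense])


def lookup_by_ili(release: str, release_ids: Dict[str, str], sense: str) -> Optional[str]:
    """Go through the stable concept ID, so any mapped release works."""
    ili = ILI.get((release, release_ids[sense]))
    return SPANISH_BY_ILI.get(ili) if ili is not None else None


if __name__ == "__main__":
    print(lookup_by_offset(PWN30, "dog.n.01"))      # perro
    print(lookup_by_offset(PWN31, "dog.n.01"))      # None: the 3.1 offset is unknown
    print(lookup_by_ili("3.1", PWN31, "dog.n.01"))  # perro
```

With the indirection, upgrading English means adding one row per synset to the mapping, while the other languages' links stay untouched.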

cgravier commented 8 years ago

Hello, have you had any success integrating 3.1 into NLTK? (Not that I was waiting for Feb. 2016 to post this comment; I just happened to be looking for it today and ended up here :))

fcbond commented 8 years ago

G'day,


We had a very successful workshop on this last week. As a result, we will be rolling out a new version of the Open Multilingual Wordnet that keeps the other languages in sync with English. I don't want to update just English first, as (i) it would cause problems with linking to other languages and (ii) it is not a major change. When the new version of the Interlingual Index is ready (4-8 weeks), we will update everything together (and make some changes to the interface code, as the Princeton WordNet (PWN) is switching over to a new database format).

In summary: will fix in a little while.

If you are interested in the workshop, you can see some information here: http://compling.hss.ntu.edu.sg/events/2016-wn-gwg/

Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University

cgravier commented 8 years ago

Thank you for the double good news (a fix is coming, plus a new version with all contributed languages in sync!).

The workshop does seem interesting. If you are the webpage maintainer, please be aware of some broken links (mainly those pointing to http://compling.hss.ntu.edu.sg/events/2016-wn-gwg/pdf/).

RanAR90 commented 6 years ago

Hi All.

Can we now upgrade NLTK to use WordNet 3.1?

Regards

fcbond commented 6 years ago

No, as that would break the links to all the other languages. We are working on a new version (slowly but now steadily) that will have the latest versions of everything and more.



krasi0 commented 4 years ago

@fcbond, @stevenbird it seems like after some years the upgrade to 3.1 has gone nowhere? How about adding the 3.1 corpus in a separate directory for users who do not need the Interlingual Index? P.S. A step-by-step guide on how one could do that locally in their own environment would be appreciated, too. Thanks!

goodmami commented 2 years ago

@krasi0 (and others coming across this issue years later) I don't think there are any plans now to upgrade NLTK's WordNet offering to 3.1. Also, if you want a more up-to-date wordnet, I recommend skipping 3.1 and going to a recent release of the Open English WordNet, a fork of the Princeton WordNet with active development. The OEWN is encoded in a format that allows it to maintain interlingual linkages (necessary for the Open Multilingual Wordnet) that are stable across releases. You can also get versions of the Princeton WordNet 3.0 and 3.1 in this format if you're targeting a specific version.

For a Python library to read these wordnets, see Wn. It was written to follow the NLTK interface as long as it makes sense to do so. (disclaimer: I'm the author)