nltk / wordnet

Stand-alone WordNet API

License for WordNet dataset #30

Open xcao0907 opened 3 years ago

xcao0907 commented 3 years ago

While the NLTK project is under the Apache license, there is no specific license information for the individual datasets under NLTK (http://www.nltk.org/nltk_data/), such as WordNet. Can you please add a license for WordNet and WordNet-InfoContent that specifies how these datasets may be used?

goodmami commented 3 years ago

This repository includes the data under the wn/data/ subdirectory (that is, it doesn't download from nltk_data like the NLTK library). If your question is regarding the nltk_data repository, then maybe you can raise an issue there? For the data included in this repository, the license files are in their respective subdirectories. For example, see here for the license of the Dutch wordnet.

The WordNet-InfoContent case is more interesting. It is a copy of the data distributed by Ted Pedersen to go along with the WordNet::Similarity Perl module (see here). While the Perl module itself is GPL v2, the data is distributed separately, and I couldn't find any license listed for it. The data doesn't fall under the GPL simply because it was produced by GPL code (source), and even if the data were distributed in a subdirectory of a GPL project, it's not clear that it would inherit the license automatically (source). So I'm not really sure which license, if any, the IC data is under.