Open djsutherland opened 6 years ago
The different resources in nltk_data come under different licenses. The licenses of the individual resources in nltk_data should be safe for redistribution.
It'd be great to package nltk_data. Would it be a pip-able data library?
It wouldn't be in pip, but you could get it with conda install nltk_data (assuming you've set up conda-forge: https://conda-forge.org).
I see now that the xml files specify the licenses of the data files. I guess the question is what license the xml files themselves have... they're so small that I doubt it really matters, but it's still not technically specified. Anyway, I guess we'll just say "License: Various" or whatever; we still need to figure that out amongst ourselves though.
One of our NLP projects is completely dependent on the NLTK tokenizer and POS tagger. But recently we figured out that the tokenizer and POS tagger models do not have a license, and hence we are not able to use them in our project. Is it possible to add a license for those two models? Are there any other open-source tokenizer and POS tagger models available online?
This remains a problem for distributions packaging nltk. Looking at https://www.nltk.org/nltk_data/, many of the entries have a blank licence/copyright field.
Would it be possible for nltk to construct a free/libre dataset which can be safely redistributed? Thanks.
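As a rough illustration of how a packager might find the entries with a blank licence field, here is a minimal sketch that parses an index document and lists the packages whose licence is empty or absent. It assumes the index is XML with `package` elements carrying `id` and `license` attributes; those names, the `packages_missing_license` helper and the sample data are all hypothetical and should be checked against the real index before relying on this.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample that mimics an index layout. The element and
# attribute names (package, id, license, copyright) are assumptions.
SAMPLE_INDEX = """\
<nltk_data>
  <packages>
    <package id="example_tagger" name="Example tagger" license="" copyright=""/>
    <package id="example_corpus" name="Example corpus" license="Public Domain" copyright="None"/>
    <package id="example_tokenizer" name="Example tokenizer"/>
  </packages>
</nltk_data>
"""


def packages_missing_license(index_xml: str) -> List[str]:
    """Return the ids of packages whose license attribute is blank or absent."""
    root = ET.fromstring(index_xml)
    missing = []
    for pkg in root.iter("package"):
        # Treat a missing attribute and a whitespace-only value the same way.
        license_text = (pkg.get("license") or "").strip()
        if not license_text:
            missing.append(pkg.get("id", "<unknown>"))
    return missing


if __name__ == "__main__":
    for package_id in packages_missing_license(SAMPLE_INDEX):
        print(package_id)
```

Packages that pass this check still need a manual review, since a non-empty field says nothing about whether the stated licence permits redistribution.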
Many of the NLTK data resources themselves contain licensing, copyright or README files with additional information on the extent to which the data may be distributed. Perhaps that will help somewhat.
I did end up untarring the whole lot and taking a look, but many of them either had no README (etc.) or, if they did have one, it indicated they were proprietary.
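The manual check described above could be partly automated. The sketch below walks an unpacked data directory and reports resources that contain no README-like or licence-like file. It assumes a `<root>/<category>/<resource>/` layout and a guessed list of file-name prefixes; both, along with the `resources_without_license_file` helper name, are hypothetical.

```python
from pathlib import Path
from typing import List

# File-name prefixes that hint at licensing information (an assumed list).
LICENSE_HINTS = ("readme", "license", "licence", "copying", "copyright")


def resources_without_license_file(data_root: str) -> List[str]:
    """List '<category>/<resource>' directories that hold no README-like file.

    Assumes the unpacked data is laid out as <data_root>/<category>/<resource>/.
    """
    missing = []
    root = Path(data_root)
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        for resource in sorted(p for p in category.iterdir() if p.is_dir()):
            # Search the resource recursively for any file whose name
            # starts with one of the hint prefixes (case-insensitive).
            has_hint = any(
                f.is_file() and f.name.lower().startswith(LICENSE_HINTS)
                for f in resource.rglob("*")
            )
            if not has_hint:
                missing.append(f"{category.name}/{resource.name}")
    return missing
```

A resource that does ship a README still has to be read by a person, because the file may state that the data is proprietary.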
For the record, I'm removing NLTK from Gentoo because of this. IANAL but it looks like many of the corpora shouldn't be redistributed as part of nltk_data in the first place, and letting NLTK download them puts users at risk of copyright violation.
Can you clarify what license the nltk_data files are under? Is it the same license as nltk? Do the various data files have different licenses? conda-forge would like to begin packaging nltk_data, because a few users have requested it (to make installing more uniform / track versioning / etc; https://github.com/conda-forge/staged-recipes/pull/4463), but we'd need to know the license first.