nltk / nltk_data

NLTK Data

nltk 2.3.4 is not working with the zipped version of wordnet #190

Closed lipsa-vlad closed 2 years ago

lipsa-vlad commented 2 years ago

Even after running nltk.download(), I get the following error from nltk. I checked the nltk_data folder, and the wordnet package is zipped. If I unzip it, everything works.

    File "/usr/local/lib/python3.8/site-packages/nltk/stem/wordnet.py", line 40, in lemmatize
      lemmas = wordnet._morphy(word, pos)
    File "/usr/local/lib/python3.8/site-packages/nltk/corpus/util.py", line 116, in __getattr__
      self.load()
    File "/usr/local/lib/python3.8/site-packages/nltk/corpus/util.py", line 81, in load
      except LookupError: raise e
    File "/usr/local/lib/python3.8/site-packages/nltk/corpus/util.py", line 78, in load
      root = nltk.data.find('{}/{}'.format(self.subdir, self.name))
    File "/usr/local/lib/python3.8/site-packages/nltk/data.py", line 653, in find
      raise LookupError(resource_not_found)
    LookupError:
      Resource 'corpora/wordnet' not found. Please use the NLTK Downloader to obtain the resource: >>> nltk.download()
      Searched in:

ekaf commented 2 years ago

@lipsa-vlad, yes, I guess this behaviour can be expected, because there are limits to how much compatibility you can maintain with a version as old as NLTK 2.3.4. The likely cause is that the wordnet library of that time lacked support for zipped data. For users who cannot upgrade to a more recent NLTK, it seems fine to just unzip the data, as you did.
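For anyone stuck on an old version, the manual unzip can be scripted. The following is only a minimal sketch using the standard library: `unzip_corpus` is a hypothetical helper name, and it assumes the archive sits at `<data_dir>/corpora/wordnet.zip` and contains a top-level `wordnet/` folder.

```python
import zipfile
from pathlib import Path


def unzip_corpus(data_dir, name="wordnet", subdir="corpora"):
    """Extract <data_dir>/<subdir>/<name>.zip next to the archive.

    Hypothetical helper, not part of NLTK. Returns the extracted
    directory, or None if there is nothing to extract.
    """
    corpus_dir = Path(data_dir) / subdir
    archive = corpus_dir / f"{name}.zip"
    target = corpus_dir / name

    if target.is_dir():
        return target  # already unpacked, nothing to do
    if not archive.is_file():
        return None  # no zipped package found at the assumed location

    with zipfile.ZipFile(archive) as zf:
        # Assumption: the archive holds a top-level "<name>/" folder,
        # so extracting into corpus_dir yields corpus_dir/<name>.
        zf.extractall(corpus_dir)

    return target if target.is_dir() else None
```

Something like `unzip_corpus(Path.home() / "nltk_data")` would then cover the common case, assuming that is where your nltk_data folder lives.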

lipsa-vlad commented 2 years ago

Thanks, I guess I'll have to upgrade then.

ekaf commented 2 years ago

It would be better if we could detect that you are running an old NLTK version and provide a more informative error message or, even better, just unzip the package automatically. But old NLTK versions are frozen and stay as they are, so they cannot be improved. The only possible alternative would be to unzip everything by default, which is not ideal either. So yes, the recommended action is to upgrade to the current NLTK version, which is probably not difficult, since you are already running Python 3.8.
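To illustrate what such a check might look like, here is a rough sketch. Nothing like this exists in the frozen releases; `parse_version` and `old_version_hint` are hypothetical names, and the `minimum` cutoff is a placeholder supplied by the caller, not a known version boundary for zipped-data support.

```python
def parse_version(version):
    """Turn a string like '3.2.4' into (3, 2, 4).

    Parsing stops at the first component that is not purely numeric.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def old_version_hint(installed, minimum):
    """Return a hint if `installed` is older than `minimum`, else None.

    `minimum` is a placeholder chosen by the caller (an assumption),
    not a documented cutoff.
    """
    if parse_version(installed) < parse_version(minimum):
        return (
            f"NLTK {installed} is older than {minimum} and may not read "
            "zipped data packages. Upgrade NLTK, or unzip the package "
            "inside your nltk_data folder."
        )
    return None
```

A caller would pass the installed version string and whatever cutoff it considers safe, for example `old_version_hint("2.3.4", "3.0")`, where `"3.0"` is only an example value.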

tomaarsen commented 2 years ago

The difficulty is that, as @ekaf mentioned, old NLTK versions are frozen, but for nltk_data we only expose the most recent version. This means that old NLTK versions can stop working as nltk_data is updated and changed. This is a consequence of how we host nltk_data.

lipsa-vlad commented 2 years ago

Indeed, not much to do from your side if the old version is frozen.