nltk / nltk_data

NLTK Data

Issue with downloading inaugural corpus #173

Closed pratos closed 2 years ago

pratos commented 2 years ago

Hi,

[x] Searched Stack Overflow for any existing issues
[x] Searched nltk_data open and closed issues

I tried to install the inaugural corpus using python -m nltk.downloader inaugural, but ran into this problem:

[nltk_data] Downloading package inaugural to
[nltk_data]     /Users/prthamesh/nltk_data...
[nltk_data]   Unzipping corpora/inaugural.zip.
[nltk_data] [Errno 21] Is a directory:
[nltk_data]     '/Users/prthamesh/nltk_data/corpora/1789-Washington.txt'
Error installing package. Retry? [n/y/e]
y
[nltk_data] Downloading package inaugural to
[nltk_data]     /Users/prthamesh/nltk_data...
[nltk_data]   Unzipping corpora/inaugural.zip.
[nltk_data] [Errno 21] Is a directory:
[nltk_data]     '/Users/prthamesh/nltk_data/corpora/1789-Washington.txt'
Error installing package. Retry? [n/y/e]
y
[nltk_data] Downloading package inaugural to
[nltk_data]     /Users/prthamesh/nltk_data...
[nltk_data]   Unzipping corpora/inaugural.zip.
[nltk_data] [Errno 21] Is a directory:
[nltk_data]     '/Users/prthamesh/nltk_data/corpora/1789-Washington.txt'
Error installing package. Retry? [n/y/e]

I tested this on a Mac M1 (2021 edition) and on Ubuntu 20.04 (GitHub CI runner), and hit the same issue on both operating systems.
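For scripted setups like the GitHub CI runner mentioned above, the same download can be driven from Python with nltk.download instead of the python -m nltk.downloader command. A minimal sketch, assuming nltk is installed: the helper name ensure_corpus is hypothetical and only wraps nltk.data.find (which raises LookupError when a resource is missing) and nltk.download. It does not work around the unzip error itself on older nltk versions.

```python
from typing import Callable, Optional


# Hypothetical helper; not part of nltk itself.
def ensure_corpus(
    name: str,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[..., object]] = None,
) -> bool:
    """Return True if corpora/<name> is available, downloading it if needed."""
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised without nltk

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is not installed.
        find(f"corpora/{name}")
        return True
    except LookupError:
        # Not found locally: fetch it; quiet=True suppresses progress messages.
        return bool(download(name, quiet=True))
```

Usage, with nltk installed and network access: ensure_corpus("inaugural"). The find and download parameters exist only so the logic can be exercised without touching the network.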

tomaarsen commented 2 years ago

@pratos Hey! I investigated this a little. We updated the inaugural corpus about 8 hours ago, and the new version has a slightly different format than before. I can't reproduce the problem on Windows, but I do get the same error on Google Colab. There, updating nltk with pip install -U nltk resolved it (at least partially), so it may work in your case too. Let us know.
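To see concretely what a "slightly different format" means for the archive, one option is to summarise the layout of an inaugural.zip on disk and compare it with an older copy. A minimal sketch using only the standard zipfile module; describe_layout is a hypothetical helper, and the thread does not say which layout difference triggers Errno 21, so this only makes two archives easy to compare.

```python
import zipfile
from typing import BinaryIO, Dict, List, Union


def describe_layout(archive: Union[str, BinaryIO]) -> Dict[str, List[str]]:
    """Summarise how a zip archive is laid out (hypothetical helper)."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    return {
        # First path component of every entry, e.g. "inaugural".
        "top_level": sorted({name.split("/", 1)[0] for name in names}),
        # Explicit directory entries, stored as names ending in "/".
        "directory_entries": sorted(name for name in names if name.endswith("/")),
    }
```

For example, describe_layout("inaugural.zip") (path hypothetical) returns the top-level folders and any explicit directory entries, which can then be diffed against the same summary for an older archive.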

@stevenbird @nimbusaeta Some of the recent changes to inaugural might also be related:

If simply updating nltk doesn't help, we might want to revert the changes (assuming the old version did work!).

pratos commented 2 years ago

Hey, thanks for the update! I'll check whether bumping the nltk version works locally.

For our application, though, we're being cautious not to break things. We resorted to removing inaugural from our list of corpora, since we don't currently use it (it was just bloat).
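If a setup script downloads a fixed list of packages, dropping one (as was done with inaugural here) can be a small, reversible change. A minimal sketch, assuming the packages are fetched with nltk.download; the CORPORA list and the download_corpora helper are hypothetical examples, not taken from the reporter's application.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical package list; substitute whatever your application needs.
CORPORA = ["stopwords", "wordnet", "brown", "inaugural"]


def download_corpora(
    ids: Iterable[str],
    skip: Iterable[str] = ("inaugural",),
    download: Optional[Callable[..., object]] = None,
) -> Dict[str, bool]:
    """Download every id not listed in skip; map each attempted id to success."""
    if download is None:
        import nltk  # imported lazily so the helper can be exercised without nltk

        download = nltk.download
    skipped = set(skip)
    results: Dict[str, bool] = {}
    for corpus_id in ids:
        if corpus_id in skipped:
            continue  # e.g. inaugural, which the app no longer needs
        results[corpus_id] = bool(download(corpus_id, quiet=True))
    return results
```

Keeping skip as a parameter rather than editing CORPORA means the corpus can be re-enabled later by passing skip=() once the download works again.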

pratos commented 2 years ago

I can confirm that bumping nltk to 3.6.5 works on Mac M1.

pratos commented 2 years ago

Closing this issue, since it only affects people on older nltk versions. We have nltk==3.2.4 for our legacy app. In case anyone else runs into this issue, just upgrade nltk.
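Since the resolution here was to upgrade nltk, a setup script could check the installed version before downloading anything. A minimal sketch: MIN_NLTK assumes that 3.6.5, the version confirmed working on Mac M1 in this thread, is new enough; it is not an officially documented minimum. The helpers are hypothetical and use only the standard library.

```python
import re
from typing import Tuple

# Assumption: 3.6.5 is the version reported working in this thread,
# not an officially documented minimum.
MIN_NLTK = "3.6.5"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '3.6.5' into (3, 6, 5); '3.7rc1' becomes (3, 7)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = MIN_NLTK) -> bool:
    """True if the installed version is older than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)
```

With nltk installed, needs_upgrade(nltk.__version__) returns True for the legacy 3.2.4 install mentioned above, which would be the cue to run pip install -U nltk.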