nltk / nltk_data

NLTK Data

nltk_data compatibility with Windows #140

Closed benhuff closed 4 years ago

benhuff commented 5 years ago

I am running into an issue while installing nltk_data through conda-forge on Windows. The installation fails on the file con.xml inside the propbank corpus:

Downloading and Extracting Packages
nltk_data-2019.07.04 | 428.2 MB  | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: failed

CondaVerificationError: The package for nltk_data located at C:\Users\####\.conda\pkgs\nltk_data-2019.07.04-0
appears to be corrupted. The path 'lib/nltk_data/corpora/propbank/frames/con.xml'
specified in the package manifest cannot be found.

I believe this is happening because CON is a reserved device name on Windows, so a file named con.xml cannot be created there. Unzipping propbank.zip manually with 7zip automatically renames this file to _con.xml.
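To check whether other files in the archive would hit the same problem, something like the following could be used. It is only a rough sketch: the helper names `is_reserved_on_windows` and `find_reserved_members` are made up for illustration, and the reserved-name set is the commonly documented list of DOS device names (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9), not anything taken from conda or NLTK.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: the commonly documented Windows reserved device names.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_on_windows(path):
    """Return True if any component of a '/'-separated path is a reserved name.

    Windows treats a reserved name as reserved even when it has an
    extension, so 'con.xml' is checked as 'CON'.
    """
    for part in PurePosixPath(path).parts:
        base = part.split(".", 1)[0].upper()
        if base in WINDOWS_RESERVED:
            return True
    return False


def find_reserved_members(zip_path):
    """List the members of a zip archive whose paths would fail on Windows."""
    with zipfile.ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if is_reserved_on_windows(name)]
```

Running `find_reserved_members("propbank.zip")` against a local copy of the corpus archive (the path here is a placeholder) should list frames/con.xml if my guess about the cause is right.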

Could this file be renamed in this repo to nltk_data/corpora/propbank/frames/_con.xml? Or would it be preferable to solve this issue (https://github.com/conda-forge/nltk_data-feedstock/issues/1#issue-344204759) specifically in the https://github.com/conda-forge/nltk_data-feedstock repository?