nltk / nltk_data

NLTK Data

I am not able to download punkt.zip file for tokenization purpose #202

Open sumitsharmatops opened 9 months ago

sumitsharmatops commented 9 months ago

Hi, I am working on an NLP project that uses NLTK. Previously I downloaded punkt via the API (nltk.download('punkt')), and now I want to download it manually, but neither approach is working. That is, I cannot download it manually or via the API. How can I do that? Please help me out with this.

tomaarsen commented 9 months ago

Hello! There is a CDN issue with the "Jio" internet provider, which prevents it from accessing the NLTK data, e.g.: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.xml. There are some workarounds described here: https://github.com/nltk/nltk/issues/3146

The primary workaround that helped people was temporarily using a mobile hotspot.

DaveParr commented 8 months ago

I wasn't able to download it through the API either, though in my case I could easily download it manually on the same machine, from the same network, with no configuration change.

I looked in the index file and found the URL, copied it into my browser, then moved the file to the relevant place for NLTK to find it.

Obviously this is manual, but if it suits your use case, it may work.
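
For anyone who would rather script those manual steps, here is a minimal sketch using only the standard library. It makes a few assumptions: the zip URL is guessed from the punkt.xml link earlier in the thread (check the index file for the real one), `fetch_zip` and `install_package` are hypothetical helper names, and `~/nltk_data` is used because it is one of the directories NLTK searches by default.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Assumption: punkt.zip sits next to the punkt.xml file linked above.
# Confirm the actual URL in the index file before relying on this.
PUNKT_URL = (
    "https://raw.githubusercontent.com/nltk/nltk_data/"
    "gh-pages/packages/tokenizers/punkt.zip"
)


def fetch_zip(url: str, dest: Path) -> Path:
    """Download `url` to `dest`, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def install_package(zip_path: Path, data_dir: Path,
                    category: str = "tokenizers") -> Path:
    """Extract a downloaded package zip into <data_dir>/<category>/."""
    target = data_dir / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example usage (needs network access, so it is left commented out):
# data_dir = Path.home() / "nltk_data"
# zip_path = fetch_zip(PUNKT_URL, data_dir / "tokenizers" / "punkt.zip")
# install_package(zip_path, data_dir)
```

If the download step is the part that fails on your network, you can still fetch the zip in a browser and call only `install_package` on the saved file.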

dvnasutosh commented 7 months ago

The only solution I found is cloning the whole repository. This is what I have done today for my project. It doesn't solve the problem but I hope it gives you a workaround.
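
A rough sketch of how the clone could be wired up, in case it helps. Assumptions: `git` is installed, the data lives on the `gh-pages` branch under a `packages/` directory (inferred from the raw URL earlier in the thread), and `clone_nltk_data` / `use_clone` are hypothetical helper names. NLTK reads the `NLTK_DATA` environment variable when it is imported, so the variable has to be set before `import nltk`. Depending on the NLTK version, individual zips inside the clone may still need to be extracted.

```python
import os
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/nltk/nltk_data.git"


def clone_nltk_data(dest: Path) -> Path:
    """Shallow-clone the data repository (assumes git is on PATH)."""
    subprocess.run(
        ["git", "clone", "--depth", "1", "--branch", "gh-pages",
         REPO_URL, str(dest)],
        check=True,
    )
    return dest


def use_clone(clone_dir: Path) -> Path:
    """Point NLTK_DATA at the clone's packages/ directory.

    Call this before importing nltk, since NLTK reads the variable
    at import time.
    """
    packages = clone_dir / "packages"
    if not packages.is_dir():
        raise FileNotFoundError(f"no packages/ directory in {clone_dir}")
    os.environ["NLTK_DATA"] = str(packages)
    return packages


# Example usage (needs network access and git, so it is commented out):
# clone_dir = clone_nltk_data(Path.home() / "nltk_data_clone")
# use_clone(clone_dir)
# import nltk  # imported only after NLTK_DATA is set
```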

dvnasutosh commented 7 months ago


Or you could go to https://github.dev/nltk/nltk_data and download it from there. This seems to be a much better solution.