markuskiller / textblob-de

German language support for TextBlob.
https://textblob-de.readthedocs.org
MIT License

[question] how to use "python -m textblob.download_corpora" behind a proxy #16

Closed by tarrade 6 years ago

tarrade commented 6 years ago

Hi there,

we are having issues when working behind a proxy and running: python -m textblob.download_corpora

(we can pass some information about the proxy when using pip)

I tried to find info on the web and in the code but couldn't find anything so far.

In Python, requests is normally a good module, but working with a proxy is always tricky: requests.get('http:/.....', proxies=proxies)

Is there another way, like downloading the files manually and running some other command to properly install the necessary corpora?

Thanks. Cheers, Fabien

tarrade commented 6 years ago

Hi Markus,

OK, the last commit was a few years ago, so I am not sure this package is still maintained. Anyway, it was nice stuff, but we will probably switch to other libs.

Thanks. Cheers, Fabien

markuskiller commented 6 years ago

Hi tarrade, sorry for the late reply. I intend to keep the package functional but have very limited resources at the moment. I'd recommend https://github.com/explosion/spaCy as the state-of-the-art Python lib for most tasks textblob-de can be used for.

RE alternative corpora download possibilities: The necessary corpora can be downloaded manually from http://www.nltk.org/nltk_data/ and need to be unzipped into the nltk_data directory (standard locations, unless specified otherwise: C:\nltk_data or C:\Users\username\AppData\Roaming\nltk_data (Windows), /usr/local/share/nltk_data (Mac), or /usr/share/nltk_data (Unix)), using the following directory tree structure:

nltk_data/
    corpora/
        brown [= corpus id on download page]
        conll2000
        movie_reviews
        wordnet
    taggers/
        averaged_perceptron_tagger
    tokenizers/
        punkt
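
The unzipping step could be scripted roughly as follows. This is only a minimal sketch and not part of textblob-de: the names PACKAGE_DIRS, install_zip and install_all are made up for illustration, and it assumes the zip files were saved under their corpus id (e.g. brown.zip) and that each archive contains a top-level folder named after that id.

```python
import zipfile
from pathlib import Path

# Corpus id -> subdirectory of nltk_data, following the tree above.
PACKAGE_DIRS = {
    "brown": "corpora",
    "conll2000": "corpora",
    "movie_reviews": "corpora",
    "wordnet": "corpora",
    "averaged_perceptron_tagger": "taggers",
    "punkt": "tokenizers",
}


def install_zip(zip_path, nltk_data_dir):
    """Unzip one manually downloaded package into its nltk_data subdirectory."""
    zip_path = Path(zip_path)
    package_id = zip_path.stem  # e.g. "brown" for "brown.zip"
    if package_id not in PACKAGE_DIRS:
        raise ValueError(f"unexpected package: {package_id}")
    target = Path(nltk_data_dir) / PACKAGE_DIRS[package_id]
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive has a top-level folder named after the id,
        # so extracting into corpora/ yields corpora/brown/ and so on.
        archive.extractall(target)
    return target / package_id


def install_all(download_dir, nltk_data_dir):
    """Install every expected package zip that is present in download_dir."""
    installed = []
    for package_id in PACKAGE_DIRS:
        zip_path = Path(download_dir) / f"{package_id}.zip"
        if zip_path.exists():
            installed.append(install_zip(zip_path, nltk_data_dir))
    return installed
```

For example, install_all("downloads", "/usr/share/nltk_data") would unpack whichever of the six zip files are found in a (hypothetical) downloads folder and return the list of installed package directories.
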

Thank you for your interest in textblob-de. Best wishes, Markus