quanteda / spacyr

R wrapper to spaCy NLP
http://spacyr.quanteda.io

Instructions for SSL issues? #248

Closed: rjake closed this issue 8 months ago

rjake commented 8 months ago

Big fan of spacyr. I have used it successfully on my personal computer, but I cannot get it to install at work because of SSL issues. I'm not great at Python, but in plain Python I can get it to work by adding pip install pip-system-certs. Is there something equivalent I can do to get this working with spacyr?

Python

python -m venv C:/Users/rjake/.virtualenvs/spacy
C:/Users/rjake/.virtualenvs/spacy/Scripts/activate
pip install -U pip setuptools wheel
pip install -U spacy
pip install pip-system-certs   # <---- this step fixes it
python -m spacy download en_core_web_sm

R

This is abridged output highlighting the [SSL: CERTIFICATE_VERIFY_FAILED] errors raised during the step .../python.exe -m spacy download en_core_web_sm:

> spacyr::spacy_install()
Using virtual environment "r-spacyr" ...
+ "C:/Users/rjake/.virtualenvs/r-spacyr/Scripts/python.exe" -m pip install --upgrade --no-user "spacy"

Executing command:
C:/Users/rjake/.virtualenvs/r-spacyr/Scripts/python.exe -m spacy download en_core_web_sm

File "C:\Users\rjake\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: 
  self signed certificate in certificate chain (_ssl.c:1129)

File "C:\Users\rjake\VIRTUA~1\r-spacyr\lib\site-packages\urllib3\connectionpool.py", line 491, in _make_request
    raise new_e
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: 
  self signed certificate in certificate chain (_ssl.c:1129)

File "C:\Users\rjake\VIRTUA~1\r-spacyr\lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): 
  Max retries exceeded with url: /explosion/spacy-models/master/compatibility.json (
    Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: 
    self signed certificate in certificate chain (_ssl.c:1129)'))
  )

File "C:\Users\rjake\VIRTUA~1\r-spacyr\lib\site-packages\requests\adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): 
  Max retries exceeded with url: /explosion/spacy-models/master/compatibility.json (
    Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: 
    self signed certificate in certificate chain (_ssl.c:1129)'))
  )
Installation of spaCy version 3.7.4 complete.
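The traceback shows requests failing to verify the certificate chain for raw.githubusercontent.com. A minimal stdlib-only sketch for reproducing that check outside spaCy is below; check_tls is a hypothetical helper, not part of spaCy or spacyr. With no cafile, ssl.create_default_context() also loads the Windows system certificate stores. Passing a cafile (for example certifi.where(); certifi is a dependency of requests, so it should be present in the environment) trusts only that bundle, which is closer to what requests does by default.

```python
import socket
import ssl
from typing import Optional


def check_tls(host: str, port: int = 443, timeout: float = 10.0,
              cafile: Optional[str] = None) -> str:
    """Hypothetical helper: attempt a verified TLS handshake with host:port.

    Returns "ok" if the certificate chain verifies, "cert_failed" for the
    CERTIFICATE_VERIFY_FAILED case seen in the traceback, and "other_error"
    for anything else (DNS failure, refused connection, timeout, ...).
    """
    # cafile=None: default CA locations (plus the OS stores on Windows).
    # cafile given: only the certificates in that bundle are trusted.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake immediately on a connected socket.
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError:
        # Must be caught before OSError, because it is a subclass of OSError.
        return "cert_failed"
    except OSError:
        return "other_error"
```

For example, run check_tls("raw.githubusercontent.com") and check_tls("raw.githubusercontent.com", cafile=certifi.where()) with the r-spacyr interpreter. If the first returns "ok" and the second "cert_failed", that suggests the network's root certificate is trusted by Windows but missing from certifi's bundle, which is the gap pip-system-certs is meant to close by making pip and requests use the system store.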

I have also tried downloading en_core_web_sm-3.7.1.tar.gz from https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.7.1 and then running:

untar(
  tarfile = "~/Downloads/en_core_web_sm-3.7.1.tar.gz", 
  exdir = "~/.virtualenvs/r-spacyr/Lib/site-packages"
)

but I can't get it to work. Could you provide instructions for installing things manually (from the tar.gz file), or for adding something similar to the pip install pip-system-certs step?
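The model release is a pip-installable source archive, so extracting it into site-packages likely leaves an en_core_web_sm-3.7.1 folder that Python does not recognize as an installed package. spaCy's documentation instead installs such archives with pip install pointed at the file. The sketch below builds that command for a specific environment's interpreter; model_install_cmd and install_model are hypothetical helpers, and the example paths are assumptions taken from this thread.

```python
import os
import subprocess
import sys
from typing import List, Optional


def model_install_cmd(tarball: str, python_exe: Optional[str] = None) -> List[str]:
    """Hypothetical helper: argv for installing a downloaded spaCy model archive.

    Running pip as `<python> -m pip` makes the package land in the environment
    that owns that interpreter (e.g. the r-spacyr virtualenv).
    """
    python_exe = python_exe or sys.executable
    # subprocess does not go through a shell, so expand "~" ourselves.
    return [python_exe, "-m", "pip", "install", os.path.expanduser(tarball)]


def install_model(tarball: str, python_exe: Optional[str] = None) -> None:
    # check=True raises CalledProcessError if pip exits with a non-zero status.
    subprocess.run(model_install_cmd(tarball, python_exe), check=True)
```

For example, install_model("C:/Users/rjake/Downloads/en_core_web_sm-3.7.1.tar.gz", "C:/Users/rjake/.virtualenvs/r-spacyr/Scripts/python.exe") would ask that environment's pip to install the local file, so the model itself is not fetched over the network.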

JBGruber commented 8 months ago

You can absolutely install packages into the same environment as spaCy! Since you have already run the install command, the virtual environment exists, so you should be able to install the package with this:

reticulate::py_install(packages = "pip_system_certs", envname = "r-spacyr")

If for some reason this doesn't work either, you can call the pip executable of this specific environment directly from a terminal. To find it, run this:

reticulate::use_virtualenv("r-spacyr")
reticulate::py_exe()

For me, this returns /home/johannes/.virtualenvs/r-spacyr/bin/python, so I can call that environment's pip directly (just replace python with pip in the path):

/home/johannes/.virtualenvs/r-spacyr/bin/pip install pip-system-certs
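The "replace python with pip" step can be written as a small path transformation. This is a hypothetical helper, assuming the usual virtualenv layout where pip sits next to the interpreter (bin/ on Linux and macOS, Scripts\ on Windows with an .exe suffix).

```python
from pathlib import PurePosixPath, PureWindowsPath


def pip_from_python(python_exe: str) -> str:
    """Hypothetical helper: path of the pip executable next to a venv's python."""
    # Treat drive-letter or backslash paths as Windows paths, even on Linux.
    is_windows = "\\" in python_exe or python_exe[1:2] == ":"
    path = PureWindowsPath(python_exe) if is_windows else PurePosixPath(python_exe)
    # Keep ".exe" on Windows; drop version-like suffixes such as "python3.11".
    suffix = ".exe" if path.suffix.lower() == ".exe" else ""
    return str(path.with_name("pip" + suffix))
```

For the Windows environment in this thread, pip_from_python("C:/Users/rjake/.virtualenvs/r-spacyr/Scripts/python.exe") gives C:\Users\rjake\.virtualenvs\r-spacyr\Scripts\pip.exe. Running the interpreter as python -m pip install pip-system-certs avoids the lookup entirely and is equivalent for a virtualenv.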

rjake commented 8 months ago

Perfect, @JBGruber, I'm so glad I asked! Altogether it looks like this for me:

# run once only
spacyr::spacy_install()
reticulate::py_install(packages = "pip_system_certs", envname = "r-spacyr")
spacyr::spacy_download_langmodel(lang_models = "en_core_web_sm")

# subsequent runs
spacyr::spacy_initialize()
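To confirm that both the certificate patch and the model ended up in the environment, a standard-library check like the following can help. installed_version is a hypothetical helper; it only reports what is installed for the interpreter running it, so run it with the r-spacyr python returned by reticulate::py_exe().

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Hypothetical helper: installed version of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names from this thread, written with underscores to match
    # the .dist-info directory names pip typically creates.
    for name in ("spacy", "pip_system_certs", "en_core_web_sm"):
        print(name, installed_version(name) or "not installed")
```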

spacyr::spacy_parse("I wanted to show you my favorite tv show")
doc_id sentence_id token_id token    lemma    pos  entity
text1  1           1        I        I        PRON
text1  1           2        wanted   want     VERB
text1  1           3        to       to       PART
text1  1           4        show     show     VERB
text1  1           5        you      you      PRON
text1  1           6        my       my       PRON
text1  1           7        favorite favorite ADJ
text1  1           8        tv       tv       NOUN
text1  1           9        show     show     NOUN