CabbagesGH closed this issue 4 years ago
The code that is being triggered is trying to make sure that the Python module you have is at the same version as the R package you downloaded. It seems that your version of the Python module is sufficiently out of sync that it doesn't even list a version. I should modify the error message, but either way the solution is the same. Just re-install cleannlp (the Python module) with:
pip install --upgrade cleannlp
And you should be all set after restarting R.
Here is the terminal output, I'm fairly sure it's up to date:
Requirement already up-to-date: cleannlp in /usr/local/lib/python3.7/site-packages (1.0.3)
Requirement already satisfied, skipping upgrade: spacy in /usr/local/lib/python3.7/site-packages (from cleannlp) (2.2.4)
Requirement already satisfied, skipping upgrade: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (0.9.6)
Requirement already satisfied, skipping upgrade: thinc==7.4.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (7.4.0)
Requirement already satisfied, skipping upgrade: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (0.4.1)
Requirement already satisfied, skipping upgrade: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (0.6.0)
Requirement already satisfied, skipping upgrade: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (3.0.2)
Requirement already satisfied, skipping upgrade: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (4.45.0)
Requirement already satisfied, skipping upgrade: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.0.2)
Requirement already satisfied, skipping upgrade: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.0.2)
Requirement already satisfied, skipping upgrade: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.0.0)
Requirement already satisfied, skipping upgrade: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (2.0.2)
Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (46.1.3)
Requirement already satisfied, skipping upgrade: numpy>=1.15.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.17.2)
Requirement already satisfied, skipping upgrade: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (2.22.0)
Requirement already satisfied, skipping upgrade: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy->cleannlp) (1.6.0)
Requirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (2.8)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (2019.6.16)
Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (3.0.4)
Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (1.25.3)
Requirement already satisfied, skipping upgrade: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy->cleannlp) (3.1.0)
This has no effect and the error persists. Is version 1.0.3 not correct? The R package version is 3.0.2.
Version 1.0.3 is correct, and should show a version string. It is possible that reticulate is finding a different Python library location. Could you post the output of the following:
library(cleanNLP)
library(reticulate)
cleannlp <- reticulate::import("cleannlp")
names(cleannlp)
py_config()
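If it helps, the same check can also be made from the Python side. This is only a rough sketch: describe_module is a made-up helper name (it is not part of cleannlp or reticulate) and it uses nothing beyond the standard library. Run it with whichever interpreter reticulate reports to see where cleannlp resolves and whether it exposes VERSION.

```python
import importlib
import importlib.util
import sys


def describe_module(name, attr="VERSION"):
    """Report the running interpreter, where `name` resolves, and whether it has `attr`.

    Hypothetical helper for illustration; `name` should be a top-level module name.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The module is not importable from this interpreter at all.
        return {"python": sys.executable, "found": False,
                "location": None, "has_attr": False}
    module = importlib.import_module(name)
    return {
        "python": sys.executable,          # interpreter actually running this code
        "found": True,
        "location": spec.origin,           # file the module was loaded from
        "has_attr": hasattr(module, attr), # e.g. does cleannlp expose VERSION?
    }


if __name__ == "__main__":
    print(describe_module("cleannlp"))
```

If "location" points somewhere other than the site-packages directory that pip reported, the upgrade went into a different Python than the one being used.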
names(cleannlp)
[1] "absolute_import" "corenlp" "environ" "spacy"
py_config()
python: /Users/Shared/.rvirtualenvs/topic/bin/python
libpython: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/config-3.7m-darwin/libpython3.7.dylib
pythonhome: /Library/Frameworks/Python.framework/Versions/3.7:/Library/Frameworks/Python.framework/Versions/3.7
version: 3.7.4 (v3.7.4:e09359112e, Jul 8 2019, 14:54:52) [Clang 6.0 (clang-600.0.57)]
numpy: /Users/Shared/.rvirtualenvs/topic/lib/python3.7/site-packages/numpy
numpy_version: 1.17.4
cleannlp: /Users/Shared/.rvirtualenvs/topic/lib/python3.7/site-packages/cleannlp
NOTE: Python version was forced by RETICULATE_PYTHON
Apologies for the formatting, I'm not sure if there is a better way to paste in console code in the comment editor.
Yes, so the problem is that R is finding a different version of the Python library than the one where you upgraded cleannlp. From your note above, cleannlp 1.0.3 is here:
/usr/local/lib/python3.7/site-packages
But reticulate is being set to locate packages in a different location, namely:
/Users/Shared/.rvirtualenvs/topic/lib/python3.7/site-packages/
It seems from the NOTE in your output that you've set RETICULATE_PYTHON for some reason. I think you could point reticulate at the version of Python that you want by running this:
Sys.setenv(RETICULATE_PYTHON = "/usr/local/bin/python3")
library(cleanNLP)
library(reticulate)
Note that this will not work unless you restart R and make sure that it is the first thing you run. Reticulate cannot change the version of Python that it is using without restarting the R session.
Alternatively, you could try to install cleannlp into the version of Python that reticulate is finding (you appear to have created a virtual environment within a hidden shared folder for some reason?!).
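For that second option, one way to be sure pip installs into the environment reticulate actually uses is to invoke pip through that environment's own interpreter. A minimal sketch, assuming the interpreter path is the one printed by py_config() earlier in this thread; upgrade_in_interpreter is a hypothetical helper, and with dry_run=True it only returns the command without running anything.

```python
import subprocess


def upgrade_in_interpreter(python_path, package, dry_run=True):
    """Build (and optionally run) `<python_path> -m pip install --upgrade <package>`.

    Hypothetical helper for illustration. Running pip as a module of a specific
    interpreter installs into that interpreter's site-packages.
    """
    command = [python_path, "-m", "pip", "install", "--upgrade", package]
    if not dry_run:
        # check=True raises CalledProcessError if pip exits with a non-zero status.
        subprocess.run(command, check=True)
    return command


# Assumption: path taken from the py_config() output posted above.
VENV_PYTHON = "/Users/Shared/.rvirtualenvs/topic/bin/python"
print(" ".join(upgrade_in_interpreter(VENV_PYTHON, "cleannlp")))
```

The printed command can also be pasted straight into a terminal. Passing dry_run=False runs the install directly; either way, restart R afterwards.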
Ah this was very helpful, I didn't realise I needed to update the packages in the venv. I think it was installed this way because spacy recommends installing to a venv to 'avoid modifying system state'. Got a new problem now. A colleague that is now gone wrote the R script I'm trying to troubleshoot, but it seems it was written for a deprecated version of cleanNLP for R. The 'cnlp_get_token()' function doesn't seem to exist anymore. Seems like the whole thing may need to be rewritten due to this. Sigh...
Okay, glad that solved the first problem. And the removal of cnlp_get_token should only be a minor change. You just grab the tokens with the dollar sign operator instead of a function, so this:
anno <- cnlp_annotate(input)
token <- cnlp_get_token(anno)
Just becomes this:
anno <- cnlp_annotate(input)
token <- anno$token
I see, I'll have to go through and see where other amends like this will be necessary to make the script compatible again.
I'll close this as my initial issue was solved.
Thank you for your help!
Loading the default spacy backend using
cnlp_init_spacy()
ends up with a version error:
Error in py_get_attr_impl(x, name, silent) : AttributeError: module 'cleannlp' has no attribute 'VERSION'
The version attribute seems to just be missing and I'm not sure if this is because of something I've done, or if it is just a general issue. If I manually import the module using
cleannlp <- reticulate::import("cleannlp")
and look for 'cleannlp$VERSION', it definitely doesn't seem to be there. Let me know if there's any more information I can provide to be more helpful.