statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

Version missing from cleanNLP module #75

Closed CabbagesGH closed 4 years ago

CabbagesGH commented 4 years ago

Loading the default spaCy backend using cnlp_init_spacy() ends up with a version error:

Error in py_get_attr_impl(x, name, silent) : AttributeError: module 'cleannlp' has no attribute 'VERSION'

The version attribute seems to just be missing, and I'm not sure if this is because of something I've done or if it is just a general issue. If I manually import the module using cleannlp <- reticulate::import("cleannlp") and look for cleannlp$VERSION, it definitely doesn't seem to be there.

Let me know if there's any more information I can provide to be more helpful.

statsmaths commented 4 years ago

The code that is being triggered is trying to make sure that the Python module you have matches the version expected by the R package you downloaded. It seems that your version of the Python module is sufficiently out of sync that it doesn't even list a version. I should modify the error message, but either way the solution is the same. Just re-install cleannlp (the Python module) with:

pip install --upgrade cleannlp

And you should be all set after restarting R.

CabbagesGH commented 4 years ago

Here is the terminal output, I'm fairly sure it's up to date:

Requirement already up-to-date: cleannlp in /usr/local/lib/python3.7/site-packages (1.0.3)
Requirement already satisfied, skipping upgrade: spacy in /usr/local/lib/python3.7/site-packages (from cleannlp) (2.2.4)
Requirement already satisfied, skipping upgrade: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (0.9.6)
Requirement already satisfied, skipping upgrade: thinc==7.4.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (7.4.0)
Requirement already satisfied, skipping upgrade: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (0.4.1)
Requirement already satisfied, skipping upgrade: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (0.6.0)
Requirement already satisfied, skipping upgrade: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (3.0.2)
Requirement already satisfied, skipping upgrade: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (4.45.0)
Requirement already satisfied, skipping upgrade: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.0.2)
Requirement already satisfied, skipping upgrade: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.0.2)
Requirement already satisfied, skipping upgrade: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.0.0)
Requirement already satisfied, skipping upgrade: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (2.0.2)
Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (46.1.3)
Requirement already satisfied, skipping upgrade: numpy>=1.15.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (1.17.2)
Requirement already satisfied, skipping upgrade: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/site-packages (from spacy->cleannlp) (2.22.0)
Requirement already satisfied, skipping upgrade: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy->cleannlp) (1.6.0)
Requirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (2.8)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (2019.6.16)
Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (3.0.4)
Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy->cleannlp) (1.25.3)
Requirement already satisfied, skipping upgrade: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy->cleannlp) (3.1.0)

This has no effect and the error persists. Is version 1.0.3 not correct? The R package version is 3.0.2.

statsmaths commented 4 years ago

Version 1.0.3 is correct, and should show a version string. It is possible that reticulate is finding a different Python library location. Could you post the output of the following:

library(cleanNLP)
library(reticulate)
cleannlp <- reticulate::import("cleannlp")
names(cleannlp)
py_config()

CabbagesGH commented 4 years ago

names(cleannlp)
[1] "absolute_import" "corenlp" "environ" "spacy"

py_config()
python:         /Users/Shared/.rvirtualenvs/topic/bin/python
libpython:      /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/config-3.7m-darwin/libpython3.7.dylib
pythonhome:     /Library/Frameworks/Python.framework/Versions/3.7:/Library/Frameworks/Python.framework/Versions/3.7
version:        3.7.4 (v3.7.4:e09359112e, Jul 8 2019, 14:54:52) [Clang 6.0 (clang-600.0.57)]
numpy:          /Users/Shared/.rvirtualenvs/topic/lib/python3.7/site-packages/numpy
numpy_version:  1.17.4
cleannlp:       /Users/Shared/.rvirtualenvs/topic/lib/python3.7/site-packages/cleannlp

NOTE: Python version was forced by RETICULATE_PYTHON

Apologies for the formatting; I'm not sure if there is a better way to paste console output into the comment editor.

statsmaths commented 4 years ago

Yes, so the problem is that R is finding a different Python installation than the one where you upgraded cleannlp. From your output above, cleannlp 1.0.3 is here:

/usr/local/lib/python3.7/site-packages 

But reticulate is configured to look for packages in a different location, namely:

/Users/Shared/.rvirtualenvs/topic/lib/python3.7/site-packages/

The NOTE in your output shows that RETICULATE_PYTHON has been set for some reason, which forces reticulate to use that particular interpreter. You should be able to select the Python installation you want by running this:

Sys.setenv(RETICULATE_PYTHON = "/usr/local/bin/python3")
library(cleanNLP)
library(reticulate)

Note that this will only work in a fresh R session, and it must be the first thing you run: reticulate cannot switch to a different Python once it has been initialized, so changing it requires restarting the R session.

Alternatively, you could install cleannlp into the Python installation that reticulate is finding (you appear to have created a virtual environment within a hidden shared folder for some reason?!).

CabbagesGH commented 4 years ago

Ah, this was very helpful; I didn't realise I needed to update the packages in the venv. I think it was installed this way because spaCy recommends installing into a venv to 'avoid modifying system state'. I've got a new problem now. A colleague who has since left wrote the R script I'm trying to troubleshoot, and it seems it was written for a deprecated version of cleanNLP for R. The cnlp_get_token() function doesn't seem to exist anymore, so it looks like the whole script may need to be rewritten. Sigh...

statsmaths commented 4 years ago

Okay, glad that solved the first problem. The removal of cnlp_get_token() should only require a minor change: you now grab the tokens with the dollar sign operator instead of a function, so this:

anno <- cnlp_annotate(input)
token <- cnlp_get_token(anno)

Just becomes this:

anno <- cnlp_annotate(input)
token <- anno$token
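If the old script calls cnlp_get_token() in many places, one stopgap is to define a small shim with the same name so the existing calls keep working while you migrate them one by one. This is a sketch, not part of cleanNLP: it assumes, as described above, that cnlp_annotate() now returns a list with a token element, and the mock object below only imitates that list structure.

```r
# Hypothetical compatibility shim (not part of cleanNLP): re-creates the
# removed cnlp_get_token() on top of the list returned by cnlp_annotate().
cnlp_get_token <- function(anno) {
  # [[ ]] matches the name exactly, whereas $ can partially match on lists
  token <- anno[["token"]]
  if (is.null(token)) {
    stop("annotation object has no 'token' element")
  }
  token
}

# Stand-in for the output of cnlp_annotate(); it mimics only the list
# structure, not the real columns of cleanNLP's token table.
mock_anno <- list(
  token = data.frame(doc_id = c(1L, 1L), token = c("Hello", "world"))
)
print(cnlp_get_token(mock_anno))
```

With the shim defined near the top of the old script, lines such as token <- cnlp_get_token(anno) keep running unchanged; once every call has been replaced with anno$token, the shim can be deleted.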

CabbagesGH commented 4 years ago

I see. I'll have to go through the script and see where other changes like this are needed to make it compatible again.

I'll close this as my initial issue was solved.

Thank you for your help!