statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

Problems using corenlp and spacy #80

Closed by emotionalsvm 2 years ago

emotionalsvm commented 3 years ago

I just installed cleanNLP, but I am having issues using the corenlp and spacy backends. As far as I can tell, I have installed the required Python modules and dependencies. Could anyone please help?

library(cleanNLP)
library(reticulate)
Sys.setenv(RETICULATE_PYTHON = 'C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate')

I was able to detect cleannlp and stanza. I also have stanfordnlp installed.

reticulate::py_discover_config(required_module = "cleannlp")
reticulate::py_discover_config(required_module = "stanza")
python:         C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate/python.exe
libpython:      C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate/python36.dll
pythonhome:     C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate
version:        3.6.12 |Anaconda, Inc.| (default, Sep  9 2020, 00:29:25) [MSC v.1916 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate/Lib/site-packages/numpy
numpy_version:  1.19.5
cleannlp:       C:\Users\cloud\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\cleannlp\__init__.p

NOTE: Python version was forced by RETICULATE_PYTHON
python:         C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate/python.exe
libpython:      C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate/python36.dll
pythonhome:     C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate
version:        3.6.12 |Anaconda, Inc.| (default, Sep  9 2020, 00:29:25) [MSC v.1916 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/Users/cloud/AppData/Local/r-miniconda/envs/r-reticulate/Lib/site-packages/numpy
numpy_version:  1.19.5
stanza:         C:\Users\cloud\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\stanza\__init__.p

NOTE: Python version was forced by RETICULATE_PYTHON

Initialising corenlp works without any errors.

cnlp_init_corenlp(lang = 'en', models_dir = NULL, config = NULL)

However, when I try to annotate the un dataset that ships with cleanNLP, I get an error.

annotated <- cnlp_annotate(un)
Error in py_call_impl(callable, dots$args, dots$keywords) : RuntimeError: index_select(): Expected dtype int64 for index

As for spacy, I get an error trying to initialise it.

cnlp_init_spacy()
Error in py_call_impl(callable, dots$args, dots$keywords) : TypeError: load() got an unexpected keyword argument 'max_length'

davidfuhry commented 3 years ago

I've run into the same issue with spacy. Version 3 of spacy, which was released recently, seems to have changed the API, and cleanNLP does not reflect that change yet. As a workaround, you can go back to the last major version of spacy. To do that, install spacy in a fresh environment using:

pip install "spacy<3.0.0"

If you want to downgrade within an existing Python environment instead, add --force-reinstall, but I'd advise against it, as it may cause issues with dependencies. Hope this helps.
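
To confirm which spacy version ended up in the environment, a quick check run with the same Python that reticulate uses may help. This is only a minimal sketch and not part of cleanNLP: the helper names major_version and spacy_is_pre_v3 are made up for illustration, and the only thing it assumes about spacy is that the installed version is exposed as spacy.__version__.

```python
def major_version(version_string):
    """Return the leading integer of a version string such as '2.3.5'."""
    return int(version_string.split(".")[0])


def spacy_is_pre_v3():
    """Report whether the importable spacy is older than 3.0.

    Returns True or False when spacy can be imported, and None when it
    is not installed in the current Python environment.
    """
    try:
        import spacy  # imported lazily so the check works without spacy
    except ImportError:
        return None
    return major_version(spacy.__version__) < 3


if __name__ == "__main__":
    # True is the state the workaround aims for; None means spacy is
    # missing from this environment altogether.
    print("spacy older than 3.0:", spacy_is_pre_v3())
```

If this prints False, the environment still has spacy 3 or later, and cnlp_init_spacy() would be expected to fail in the same way as above.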