statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

cleanNLP 3.0 failing on Windows 10 machine #60

Closed kaybenleroll closed 4 years ago

kaybenleroll commented 4 years ago

I am having trouble getting the new cleanNLP 3.0 working on a Windows 10 machine. There seems to be an issue getting it to work with Anaconda (which I don't use very much).

The following code gives me errors:

library(cleanNLP)

cnlp_init_corenlp()

Error: The 'cleannlp' appears to be available on your system, however the reticulate package has selected an alternative version of Python to the one where you installed the module. Restart R and run:

library(cleanNLP)

prior to running any other code. If that still produces this error, restart R and manually select the version of Python before running any other functions with:

use_python("C:\Work\Programs\Anaconda3\python.exe")

I am getting similar issues with the spacy backend, so perhaps a small vignette or set of install instructions covering the best way to get cleanNLP working on Windows machines would help?

Thanks for your help - let me know if you need any more information.

> devtools::session_info()
- Session info ------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.6.1 (2019-07-05)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_Ireland.1252        
 ctype    English_Ireland.1252        
 tz       Europe/London               
 date     2019-10-26                  

- Packages ----------------------------------------------------------------------------------------------------------------------------------
 package     * version date       lib source        
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
 backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
 callr         3.3.2   2019-09-22 [1] CRAN (R 3.6.1)
 cleanNLP    * 3.0.0   2019-10-22 [1] CRAN (R 3.6.1)
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.1)
 devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.1)
 digest        0.6.22  2019-10-21 [1] CRAN (R 3.6.1)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
 fortunes      1.5-4   2016-12-29 [1] CRAN (R 3.6.0)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.1)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
 jsonlite      1.6     2018-12-07 [1] CRAN (R 3.6.1)
 lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)
 Matrix        1.2-17  2019-03-22 [2] CRAN (R 3.6.1)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)
 pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.1)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.1)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.1)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.1)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.1)
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.1)
 Rcpp          1.0.2   2019-07-25 [1] CRAN (R 3.6.1)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.1)
 reticulate    1.13    2019-07-24 [1] CRAN (R 3.6.1)
 rlang         0.4.1   2019-10-24 [1] CRAN (R 3.6.1)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)
 rstudioapi    0.10    2019-03-19 [1] CRAN (R 3.6.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
 testthat      2.2.1   2019-07-25 [1] CRAN (R 3.6.1)
 usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.1)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.1)

[1] C:/Work/Programs/R/library
[2] C:/Work/Programs/R/R-3.6.1/library

kaybenleroll commented 4 years ago

Sorry, just so I'm clear - I've tried doing the above as well as playing around with reticulate to see if I can get things to work - but no luck so far.

statsmaths commented 4 years ago

So it appears that the Python module 'cleannlp' is installed somewhere, but reticulate keeps starting a different version of Python from the one where the module was installed (even when you ask it to use that one). It will help to get some diagnostics from reticulate. Could you run these three lines in R and post the results?

reticulate::py_discover_config(required_module="cleannlp")
reticulate::py_config()
reticulate::import("cleannlp") # this should give an error, but it's helpful to check

One possible problem, though, is that the error message is slightly wrong for Windows. Running this directly:

use_python("C:\Work\Programs\Anaconda3\python.exe")

will probably give an error message, because R treats a single backslash in a string as the start of an escape sequence. You need to escape the backslashes or convert them to forward slashes (I'll also tack on a "required" to make any warnings explicit):

use_python("C:\\Work\\Programs\\Anaconda3\\python.exe", required=TRUE)
use_python("C:/Work/Programs/Anaconda3/python.exe", required=TRUE)

Either of them should work. Once we figure out what's wrong with your setup, I will adjust the error messages in the package.
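
As a small illustration of the forward-slash form, here is a minimal Python sketch that converts a Windows path into the version that can be pasted into `use_python()` unescaped. The helper name `r_safe_path` is hypothetical and not part of cleanNLP or reticulate; it only relies on the standard library's `pathlib`.

```python
from pathlib import PureWindowsPath


def r_safe_path(windows_path: str) -> str:
    """Return the path with forward slashes, which need no escaping in R strings."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    return PureWindowsPath(windows_path).as_posix()


# A raw string literal keeps Python itself from interpreting the backslashes.
print(r_safe_path(r"C:\Work\Programs\Anaconda3\python.exe"))
```

The printed path matches the second `use_python()` call above.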

kaybenleroll commented 4 years ago

No problem at all. Here is the output:

> reticulate::py_discover_config(required_module="cleannlp")
python:         C:\Work\Programs\Anaconda3\python.exe
libpython:      C:/Work/Programs/Anaconda3/python37.dll
pythonhome:     C:\Work\Programs\Anaconda3
version:        3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:\Work\Programs\Anaconda3\lib\site-packages\numpy
numpy_version:  1.16.5
cleannlp:       C:\Work\Programs\Anaconda3\lib\site-packages\cleannlp\__init__.p

python versions found: 
 C:\Work\Programs\Anaconda3\envs\r-reticulate\python.exe
 C:\Work\Programs\Anaconda3\python.exe
 C:\Work\Programs\Anaconda3\envs\spacy_condaenv\python.exe

> reticulate::py_config()
python:         C:\Work\Programs\Anaconda3\python.exe
libpython:      C:/Work/Programs/Anaconda3/python37.dll
pythonhome:     C:\Work\Programs\Anaconda3
version:        3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:\Work\Programs\Anaconda3\lib\site-packages\numpy
numpy_version:  1.16.5

python versions found: 
 C:\Work\Programs\Anaconda3\envs\r-reticulate\python.exe
 C:\Work\Programs\Anaconda3\python.exe
 C:\Work\Programs\Anaconda3\envs\spacy_condaenv\python.exe

> reticulate::import("cleannlp")
Error in py_module_import(module, convert = convert) : 
  ModuleNotFoundError: No module named 'stanfordnlp'

I'm pretty sure I used the proper slashes, but I can certainly try again.

statsmaths commented 4 years ago

Thanks for the info, that was very helpful and I think I see the issue now. It isn't actually a Windows problem per se, but an error in my Python config script, which doesn't list 'stanfordnlp' (the Python module) as a dependency. You should be able to fix that by manually installing the module:

pip install stanfordnlp

Or manually re-installing the fixed cleannlp module:

pip install -U cleannlp

Please let me know if that solves the issue.
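
The diagnostics above show a module that can be found on disk but still fails to import because one of its dependencies is absent. A minimal Python sketch of how to tell those cases apart from inside a given interpreter follows; the function name `diagnose` is hypothetical, and the module names in the loop are simply the ones mentioned in this thread.

```python
import importlib
import importlib.util
import sys


def diagnose(module_name: str) -> str:
    """Classify a top-level module as 'missing', 'broken: ...', or 'ok'."""
    # find_spec only locates a top-level module; it does not execute it,
    # so a package with an uninstalled dependency is still "found".
    if importlib.util.find_spec(module_name) is None:
        return "missing"
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # exc.name is the module that could not be imported, often a dependency.
        return f"broken: cannot import {exc.name!r}"
    return "ok"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("cleannlp", "stanfordnlp", "spacy"):
        print(name, "->", diagnose(name))
```

Running this with the interpreter that reticulate selected should show which of the modules that interpreter can actually load.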

kaybenleroll commented 4 years ago

I think I tried that, but I'm not sure.

Which virtualenv should they be installed into? Or should I just try to install them globally?
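
The thread does not answer this question directly. One common approach, offered here as a sketch rather than as the maintainer's recommendation, is to install into whichever interpreter `reticulate::py_config()` reports, by invoking pip through that interpreter instead of relying on whatever `pip` is first on the PATH. The helper names below are hypothetical.

```python
import subprocess
import sys


def pip_install_command(package: str, python: str = sys.executable) -> list:
    """Build the command that installs `package` for one specific interpreter."""
    # "python -m pip" ties pip to that interpreter, unlike a bare "pip" on PATH.
    return [python, "-m", "pip", "install", "--upgrade", package]


def install(package: str, python: str = sys.executable) -> int:
    """Run the install and return pip's exit status (0 means success)."""
    return subprocess.call(pip_install_command(package, python))


if __name__ == "__main__":
    # Printed rather than executed, so nothing is installed by accident.
    print(" ".join(pip_install_command("stanfordnlp")))
```

Passing the interpreter path from the `py_config()` output as the `python` argument targets that installation rather than one of the conda environments listed under "python versions found".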

kaybenleroll commented 4 years ago

Okay, it looks like there were some issues getting PyTorch installed. It needed a version >= 1.0.0, but that was not available through conda, so I found a repo that gave me some commands to install it.

I will post those here for future reference in case it is useful for docs or whatever:

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Unfortunately, I am also having issues with spacy. When I run cnlp_download_spacy('en') I get the following error:

 Error in py_call_impl(callable, dots$args, dots$keywords) : SystemExit: 1 
3.
stop(structure(list(message = "SystemExit: 1", call = py_call_impl(callable, 
    dots$args, dots$keywords), cppstack = structure(list(file = "", 
    line = -1L, stack = "C++ stack not available on this system"), class = "Rcpp_stack_trace")), class = c("Rcpp::exception", 
"C++Error", "error", "condition"))) 
2.
spacy$cli$download(model_name) 
1.
cnlp_download_spacy("en") 

All of this worked fine on my linux machines BTW.

Finally, I wasn't sure if I was clear, but the Python I am using on my machine is the latest Anaconda, and since I do not use Python much I haven't changed the config much.

statsmaths commented 4 years ago

Seems like my fix at least triggered the correct dependencies (!). Thanks for the info regarding pytorch; I’ll list it in an FAQ on the Readme, and maybe find a workaround for people who only want spacy anyway.

Regarding the model download, that seems to be something amiss with spacy. Two ideas: try to explicitly download the small English model with `cnlp_download_spacy("en_core_web_sm")`, or follow the spacy instructions to install the model directly. See https://spacy.io/usage.

Thanks for your help debugging this! I would love to get this working on Windows but don’t have easy access to a machine to test it on.

kaybenleroll commented 4 years ago

No problem at all - I'll try that and get back to you.

Happy to help debug on Windows if you like - cleanNLP is insanely useful, so would like to contribute back.

I hear you regarding Windows - I do most of my work on linux machines in the cloud, but it is sometimes handy to run something in Windows too.

I'll try both and see how I get on.

kaybenleroll commented 4 years ago

I still haven't been able to get spacy to work on Windows for myself.

At this point it is not particularly important for me, as I do not use spacy much and can work with CoreNLP anyway, but I am happy to help you fix this issue if you think it is a bigger problem than just myself.

statsmaths commented 4 years ago

Thanks for the update and thanks for trying. No need to keep working on this on your end if you don't need spacy at the moment, though it would be helpful to know what happened when you installed the English model directly from Python. Did it give an error in Python, or did it just still refuse to load through R?

kaybenleroll commented 4 years ago

Doh! I just realised my last response contained no detail beyond 'it didn't work!' - sorry about that.

> cnlp_download_spacy("en_core_web_sm")
 Error in py_call_impl(callable, dots$args, dots$keywords) : SystemExit: 1 
3.
stop(structure(list(message = "SystemExit: 1", call = py_call_impl(callable, 
    dots$args, dots$keywords), cppstack = structure(list(file = "", 
    line = -1L, stack = "C++ stack not available on this system"), class = "Rcpp_stack_trace")), class = c("Rcpp::exception", 
"C++Error", "error", "condition"))) 
2.
spacy$cli$download(model_name) 
1.
cnlp_download_spacy("en_core_web_sm") 

Which is pretty much the same error as before.

I then tried to go via python on the command line, and that seems to be working. I tried:

python -c "import spacy; spacy.prefer_gpu(); nlp = spacy.load('en_core_web_sm')"

This returned without an error.

Does it seem to be an issue with R and python and reticulate perhaps?

statsmaths commented 4 years ago

Ah great, thanks for the additional details. It's really nice to know that at least you were able to install the library directly. Just to clarify though, I didn't think that installing the library directly would fix the cnlp_download_spacy error... I just thought it would let you run cnlp_init_spacy("en_core_web_sm") directly. If it's not too much to ask, could you try initialising the spacy backend now and let me know the error message (or if it works!)? That should give me enough to go on. Thanks again!

kaybenleroll commented 4 years ago

Not at all - I'm happy to help you as much as I can! Like I said, this is hugely useful so it is good to contribute. :)

> library(cleanNLP)
> cnlp_init_spacy()
Error: model en not found; use cnlp_download_spacy("en") to install
> cnlp_download_spacy("en")
Error in py_call_impl(callable, dots$args, dots$keywords) : SystemExit: 1

Detailed traceback: 
  File "C:\Work\Programs\Anaconda3\lib\site-packages\spacy\cli\download.py", line 53, in download
    sys.exit(dl)
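
The traceback above ends in `sys.exit(dl)`, which suggests that the `SystemExit: 1` reported in R is the download helper's exit status being passed along rather than a crash inside reticulate itself. A minimal Python sketch of that mechanism follows; `failing_download` is a hypothetical stand-in and not the real spacy function.

```python
import sys


def run_cli_step(step) -> int:
    """Run a callable that may call sys.exit(); return its status instead of exiting."""
    try:
        step()
    except SystemExit as exc:
        # exc.code is whatever was passed to sys.exit(); None means success.
        return 0 if exc.code is None else exc.code
    return 0


def failing_download() -> None:
    # Hypothetical stand-in for a CLI helper that ends with sys.exit(<status>).
    sys.exit(1)


if __name__ == "__main__":
    print("exit status:", run_cli_step(failing_download))
```

Code that embeds Python, as reticulate does, sees the `SystemExit` exception directly unless something catches it like this.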

statsmaths commented 4 years ago

Perfect, thanks again!