quanteda / spacyr

R wrapper to spaCy NLP
http://spacyr.quanteda.io

Error in spacy_initialize(): no spaCy environment found #246

Open gabrielparriaux opened 6 months ago

gabrielparriaux commented 6 months ago

On macOS, I’m using a conda environment in which spaCy is installed and running. I set it up through miniconda and installed the necessary packages into it: spaCy, rust. I also installed a language model for French: fr_dep_news_trf.

When I activate the environment in the macOS terminal, it works without problems. I can run a test like this one:

import spacy
nlp = spacy.load("fr_dep_news_trf")
doc = nlp("Je vous présente cette phrase intéressante en français.")
print([(w.text, w.pos_) for w in doc])

and everything works perfectly.

In my R code in RStudio, when I try to initialise my environment with this command:

spacy_initialize(model = "fr_dep_news_trf", condaenv = "condaenvforspacy")

I get the following error message:

Error in spacy_initialize(model = "fr_dep_news_trf", condaenv = "condaenvforspacy") : 
  No spaCy environment found. Use `spacy_install()` to get started.
In addition: Warning message:
In spacy_initialize(model = "fr_dep_news_trf", condaenv = "condaenvforspacy") :
  Note that we have deprecated a number of parameters to simplify this function

Apparently, it doesn't find my conda environment…

A month ago, the exact same code ran perfectly on my machine. I didn’t change my code, but some updates may have happened in the meantime (macOS, R packages, RStudio…).

Do you have any idea how I can make spacy_initialize() recognise my conda environment?

Thanks a lot for helping,

Gabriel


Here is the output of sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] webshot2_0.1.1 RColorBrewer_1.1-3 collapse_2.0.7 data.table_1.14.10 stopwords_2.3 spacyr_1.3.0 udpipe_0.8.11
[8] shiny_1.8.0 reticulate_1.34.0 readtext_0.90 quanteda.textplots_0.94.3 quanteda.textmodels_0.9.6 quanteda.textstats_0.96.4 rainette_0.3.1.1
[15] svglite_2.1.3 explor_0.3.10 questionr_0.7.8 ggrepel_0.9.4 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[22] dplyr_1.1.4 purrr_1.0.2 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 tidyverse_2.0.0 Hmisc_5.1-1
[29] gridExtra_2.3 gplots_3.1.3 corrplot_0.92 factoextra_1.0.7 ggplot2_3.4.4 FactoMineR_2.9 quanteda_3.3.1

loaded via a namespace (and not attached):
[1] jsonlite_1.8.8 rstudioapi_0.15.0 shape_1.4.6 magrittr_2.0.3 estimability_1.4.1 rmarkdown_2.25 vctrs_0.6.5 base64enc_0.1-3 rstatix_0.7.2
[10] htmltools_0.5.7 haven_2.5.4 broom_1.0.5 Formula_1.2-5 KernSmooth_2.23-22 htmlwidgets_1.6.4 emmeans_1.9.0 mime_0.12 lifecycle_1.0.4
[19] iterators_1.0.14 pkgconfig_2.0.3 Matrix_1.6-4 R6_2.5.1 fastmap_1.1.1 digest_0.6.33 colorspace_2.1-0 ps_1.7.5 ellipse_0.5.0
[28] ggpubr_0.6.0 progressr_0.14.0 fansi_1.0.6 timechange_0.2.0 httr_1.4.7 abind_1.4-5 compiler_4.3.2 scatterD3_1.0.1 withr_2.5.2
[37] htmlTable_2.4.2 backports_1.4.1 carData_3.0-5 viridis_0.6.4 dendextend_1.17.1 highr_0.10 ggsignif_0.6.4 LiblineaR_2.10-23 MASS_7.3-60
[46] scatterplot3d_0.3-44 gtools_3.9.5 caTools_1.18.2 flashClust_1.01-2 chromote_0.1.2 tools_4.3.2 foreign_0.8-86 httpuv_1.6.13 nnet_7.3-19
[55] glue_1.6.2 promises_1.2.1 gridtext_0.1.5 grid_4.3.2 checkmate_2.3.1 cluster_2.1.6 generics_0.1.3 gtable_0.3.4 labelled_2.12.0
[64] tzdb_0.4.0 websocket_1.4.1 hms_1.1.3 xml2_1.3.6 car_3.1-2 utf8_1.2.4 foreach_1.5.2 pillar_1.9.0 nsyllable_1.0.1
[73] later_1.3.2 splines_4.3.2 lattice_0.22-5 survival_3.5-7 SparseM_1.81 tidyselect_1.2.0 miniUI_0.1.1.1 knitr_1.45 xfun_0.41
[82] DT_0.31 stringi_1.8.3 evaluate_0.23 codetools_0.2-19 ggwordcloud_0.6.1 multcompView_0.1-9 cli_3.6.2 RcppParallel_5.1.7 rpart_4.1.23
[91] xtable_1.8-4 systemfonts_1.0.5 processx_3.8.3 munsell_0.5.0 Rcpp_1.0.11 coda_0.19-4 png_0.1-8 parallel_4.3.2 leaps_3.1
[100] ellipsis_0.3.2 bitops_1.0-7 glmnet_4.1-8 viridisLite_0.4.2 mvtnorm_1.2-4 scales_1.3.0 rlang_1.1.2 fastmatch_1.1-4 formatR_1.14

amatsuo commented 6 months ago

Hello,

The issue is that the method you are trying to use is based on an older version of spacyr. In version 1.3.0, we changed the installation method to the standard reticulate way of setting up a virtual/conda environment. In the current version of spacyr, spacy_install() creates a self-contained virtual/conda environment for calling spaCy from R. Because of that, spacy_initialize() no longer evaluates the spacy_condaenv option. If you do not have a strong reason to use the version of spaCy already installed on your system, we would recommend creating a new virtual environment.

Otherwise, you can try setting an environment variable with the path to your existing virtual environment, as described in the documentation of spacy_install(). Simply run Sys.setenv(SPACY_PYTHON = "path/to/directory") before calling spacy_initialize(). I haven't tried this method myself and am curious whether it works properly. If you try it, please report the result.

gabrielparriaux commented 6 months ago

Hello @amatsuo,

Thanks a lot for your help and explanations.

I tried to define the path to the conda environment, as you proposed, but it didn’t work.

Also, I’m not sure which folder the path should point to… is it the path to:

  1. The folder of the conda environment itself? Example: /Users/xxx/miniconda3/envs/my-condaenv
  2. The bin folder inside the conda environment? Example: /Users/xxx/miniconda3/envs/my-condaenv/bin
  3. The folder that contains all the potential conda environments? Example: /Users/xxx/miniconda3/envs
  4. Something else?

I tried all three possibilities and, each time, got the same error message as described above when running spacy_initialize() afterwards…
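
A quick way to tell these candidate folders apart is to check which of them actually contains a Python interpreter. The sketch below is only a diagnostic aid, written in Python: the helper name locate_python is hypothetical, the layout it assumes (<env>/bin/python) is the usual one for conda environments on macOS and Linux, and it says nothing about which form spacyr itself expects.

```python
import sys
from pathlib import Path
from typing import Optional


def locate_python(candidate: str) -> Optional[Path]:
    """Return the Python binary found under `candidate`, or None.

    Hypothetical helper; assumes a Unix-style layout (<env>/bin/python).
    """
    root = Path(candidate).expanduser()
    # Accept both the environment root and its bin folder.
    for relative in ("bin/python", "python"):
        binary = root / relative
        if binary.is_file():
            return binary
    return None


if __name__ == "__main__":
    # Usage: python locate_python.py <folder> [<folder> ...]
    for folder in sys.argv[1:]:
        print(folder, "->", locate_python(folder))
```

With the three examples above, the environment folder and its bin folder would both resolve to the same binary, while the parent envs folder would yield None.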

I then tried the alternative, using a Python virtual environment created with reticulate, and it works fine… so I can probably abandon the conda solution.

Best,

Gabriel

toni-cerise commented 5 months ago

I'm having the same issue; the Python environment itself works perfectly well:

version  R version 4.3.0 (2023-04-21)
os       macOS Monterey 12.7.3
system   aarch64, darwin20
ui       RStudio

spacyr 1.3.0 2023-12-08 [1] CRAN (R 4.3.1)

RETICULATE_PYTHON_FALLBACK /…/python3.11

python 3.11.7 hb885b13_0
spacy 3.7.2 py311h30ceab6_0

toni-cerise commented 4 months ago

Is someone maintaining the package and tackling the issues?

JBGruber commented 4 months ago

As said before, the best way to solve this issue is to use spacy_install() and install a fresh virtual environment.

In addition to what @amatsuo said: my main motivation for contributing to this package with a new installation method was to remove the conda dependency. I had issues with conda whenever I tried it on Windows and Linux and find that Python itself is much more stable nowadays. So I believe it is currently not possible to use a conda environment with the 1.3.0 version of the package, but I haven't gotten around to testing it yet.

I also still don't really see a reason why anyone would want to keep the old spaCy installation around, which at this point must be ancient. If you really insist on still using your old conda environment with the old version of spaCy, you can downgrade the R package with remotes::install_version("spacyr", "1.2.1") (the overhaul of the installation is really all that has changed since then, as far as I can see). Please let me know what your reasons are; I'm happy to be proven wrong and will add support for conda back.

I will try to set up an old environment with that version of spaCy and see if the new package version can still somehow connect to it.

EsoterikIR commented 2 months ago

I'm having the same problem. The main reason I need to use the conda environment rather than let spacyr install a fresh virtual environment is that I am working through a third-party cloud service that blocks package downloads, so I have to create the conda environment myself in the terminal and install it.

JBGruber commented 2 months ago

Turns out there is an easy workaround if you want to use (an old) conda install:

library(spacyr)
spacy_initialize()
#> Error in spacy_initialize(): No spaCy environment found. Use `spacy_install()` to get started.

I ran this only to show that I have no Python virtualenv installed in this scenario. By setting RETICULATE_PYTHON to the path of the binary (!), I can still use an install based on conda (which I made with spacyr 1.2.1):

Sys.setenv(RETICULATE_PYTHON = "/home/johannes/.conda/envs/spacy_condaenv/bin/python") # needs to point to python binary
spacy_initialize()
#> successfully initialized (spaCy Version: 3.7.3, language model: en_core_web_sm)

And just to show that it works as expected.

txt <- "And now for something completely different."
spacy_parse(txt)
#>   doc_id sentence_id token_id      token      lemma   pos entity
#> 1  text1           1        1        And        and CCONJ       
#> 2  text1           1        2        now        now   ADV       
#> 3  text1           1        3        for        for   ADP       
#> 4  text1           1        4  something  something  PRON       
#> 5  text1           1        5 completely completely   ADV       
#> 6  text1           1        6  different  different   ADJ       
#> 7  text1           1        7          .          . PUNCT

Created on 2024-05-17 with reprex v2.1.0
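
If the location of the binary is not known, one option is to ask the interpreter itself. A minimal sketch, assuming it is run with the conda environment activated in a terminal (the function name describe_interpreter is made up for illustration):

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Collect the facts needed to point reticulate at this Python."""
    return {
        # Full path to the running Python binary, not the environment folder
        "python_binary": sys.executable,
        # Root folder of the environment this interpreter belongs to
        "env_prefix": sys.prefix,
        # Python version, useful when comparing against what R reports
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

The python_binary value is the kind of path that the Sys.setenv(RETICULATE_PYTHON = ...) call above expects.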

The manual installation would look something like this:

Sys.setenv(CRYPTOGRAPHY_OPENSSL_NO_LEGACY = TRUE) # I had to set this on my machine
# 1. set up conda environment
reticulate::conda_create("spacy_condaenv")
# 2. install spacy
reticulate::conda_install("spacy_condaenv", "spacy")
# 3. set RETICULATE_PYTHON variable (at the beginning of every R session)
Sys.setenv(RETICULATE_PYTHON = "/home/johannes/.conda/envs/spacy_condaenv/bin/python")
# 4. download models
spacyr::spacy_download_langmodel()
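
For an environment assembled by hand, it may be worth confirming that spaCy and the language model are importable before calling spacy_initialize(). spaCy models are installed as ordinary Python packages, so a standard-library check is enough. This is a sketch to be run with the environment's own Python; missing_packages is a hypothetical helper, and the model name is the one from the reprex output above.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Swap in the model you actually use, e.g. "fr_dep_news_trf".
    missing = missing_packages(["spacy", "en_core_web_sm"])
    print("missing:", ", ".join(missing) if missing else "nothing")
```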

kbenoit commented 2 months ago

Might be worth adding to the documentation? Perhaps a Wiki?