rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0

weird issue with dynamic imports? #396

Closed randomgambit closed 8 months ago

randomgambit commented 5 years ago

Hello reticulate team. I am escalating this issue to you because I was unable to solve it otherwise.

Please see here https://github.com/explosion/spaCy/issues/2982

Here is the issue: I am perfectly able to use the well-known spaCy package in Python and, in particular, to load my custom model in Spyder:

Python 3.6.6 |Anaconda custom (64-bit)| (default, Jun 28 2018, 17:14:51)
Type "copyright", "credits" or "license" for more information.

IPython 6.5.0 -- An enhanced Interactive Python.

import spacy

nlp = spacy.load('/otherdrive/model/en_core_web_lg-2.0.0/en_core_web_lg/en_core_web_lg-2.0.0')

Now doing the exact same thing in R (using reticulate) triggers an error: spaCy apparently cannot load the en model (even though I pointed it to a specific folder).


library(reticulate)
use_python("/mydrive/anaconda/bin/python")

py_run_string("import spacy")
py_run_string("nlp = spacy.load('/otherdrive/model/en_core_web_lg-2.0.0/en_core_web_lg/en_core_web_lg-2.0.0')")
 ImportError: [E048] Can't import language en from spacy.lang.

After discussing this a bit with @ines, she replied:

Thanks for the report! There's not really an "en model" – that's just the shortcut for the en_core_web_sm package. What the error means here is that it can't load the spacy.lang.en module containing the language data to initialize the language class.

Because some languages now ship with a lot of language data, spaCy lazy-loads the modules only when needed. See here for how this is implemented (including when the error is raised):

spaCy/spacy/util.py

Lines 41 to 54 in 658f7e0

def get_lang_class(lang):
    """Import and load a Language class.

    lang (unicode): Two-letter language code, e.g. 'en'.
    RETURNS (Language): Language class.
    """
    global LANGUAGES
    if lang not in LANGUAGES:
        try:
            module = importlib.import_module('.lang.%s' % lang, 'spacy')
        except ImportError:
            raise ImportError(Errors.E048.format(lang=lang))
        LANGUAGES[lang] = getattr(module, module.__all__[0])
    return LANGUAGES[lang]

Is it possible that reticulate somehow handles the dynamic imports differently? One thing you could try is edit that file in your spaCy installation, remove the try/except block and have a look at the original error and traceback. Maybe this gives us better insight into why it fails to import the module?

What do you think? Thanks!!
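The suggestion quoted above (look at the original error and traceback instead of the E048 wrapper) can also be tried without editing the spaCy installation. The following is only a minimal sketch: `describe_import` is a hypothetical helper name, not part of spaCy or reticulate, and it simply imports the module that `get_lang_class('en')` would import lazily and keeps the full traceback.

```python
import importlib
import traceback


def describe_import(module_name):
    """Try to import module_name; return None on success, else the traceback text.

    Hypothetical helper for illustration only (not a spaCy or reticulate API).
    """
    try:
        importlib.import_module(module_name)
    except Exception:
        # Usually an ImportError, but other errors can surface at import time too.
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    # 'spacy.lang.en' is the module that get_lang_class('en') imports lazily.
    report = describe_import("spacy.lang.en")
    print(report or "spacy.lang.en imported cleanly")
```

Running the same code once in plain Python and once inside the R session (for example through `py_run_string()`) should show whether the underlying failure is a missing module or something lower-level, such as a compiled extension that cannot be loaded.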

kevinushey commented 5 years ago

Can you provide a reproducible example? (We do not have access to your model object.)

randomgambit commented 5 years ago

Hi @kevinushey! It's been a while! You can reproduce the error by downloading and unpacking the model from https://github.com/explosion/spacy-models/releases/tag/en_core_web_lg-2.0.0 . Then you just give the path as I did here:

nlp = spacy.load('/otherdrive/model/en_core_web_lg-2.0.0/en_core_web_lg/en_core_web_lg-2.0.0')

kevinushey commented 5 years ago

Thanks! This also reproduces the issue, it seems:

library(reticulate)
spacy <- import("spacy")
spacy$load('en')

EDIT: scratch that; that error went away after downloading the language file with the instructions at https://spacy.io/usage/models; that is, I ran:

python3 -m spacy download en

and then was able to load that.

randomgambit commented 5 years ago

@kevinushey note that a model must be either downloaded with conda or loaded from a path with spacy.load(), as I did. https://spacy.io/usage/models#download-manual

randomgambit commented 5 years ago

Yes, that is the issue here. I cannot use the spacy download because I am behind a firewall. Also, I plan to use this with sparklyr, so it makes total sense to load my model from a given network drive. This works in Spyder but doesn't in R with reticulate. There has to be some rational explanation here...

kevinushey commented 5 years ago

I was able to successfully load that model as well, so my only guess is that this:

python3 -m spacy download en

or some variant of that needs to be run in your environment. FWIW I saw this output on install:

kevin@zordon:~/r/pkg/reticulate [feature/python-virtualenv-absolute-path]
$ python3 -m spacy download en
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 7.7MB/s
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0

    Linking successful
    /usr/local/lib/python3.7/site-packages/en_core_web_sm -->
    /Users/kevin/Library/Python/3.7/lib/python/site-packages/spacy/data/en

    You can now load the model via spacy.load('en')

Is it possible that spaCy isn't seeing the symlink it needs to load this module?

randomgambit commented 5 years ago

I see, but I cannot use the spacy download, and nobody behind a firewall can. So the only option is to load the model manually with spacy.load(). I am not quite sure I understand how the linking is done, but were you actually able to run something like

py_run_string("import spacy")
py_run_string("nlp = spacy.load('/otherdrive/model/en_core_web_lg-2.0.0/en_core_web_lg/en_core_web_lg-2.0.0')")

perhaps @ines has an idea? Thanks!!!

kevinushey commented 5 years ago

Yes, that code runs fine for me in my environment.

randomgambit commented 5 years ago

damn... what do you see when you run reticulate::py_config()? I see multiple versions of Python listed under "python versions found:"

could that be the issue?

randomgambit commented 5 years ago

ha... actually, running py_run_string("import pandas as pd") gives me a chilling message: ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found. So maybe this is part of a broader issue of how to make R interact well with Python?

randomgambit commented 5 years ago

and why on earth is this looking at /usr/lib?? My Python executable is in /mydrive/anaconda/bin/python. Maybe THAT is the issue?
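One possible explanation, not confirmed anywhere in this thread, is that the R process already has the system copy of libstdc++ mapped before Python's compiled extensions ask for a newer one. A rough way to check is to list which libstdc++ files the current process has loaded. The sketch below is Linux-only and assumes `/proc/self/maps` is readable; `loaded_libstdcxx` is a hypothetical helper name.

```python
import os
import sys


def loaded_libstdcxx(maps_path="/proc/self/maps"):
    """Return sorted paths of libstdc++ copies mapped into this process (Linux only).

    Hypothetical diagnostic helper; returns an empty list where /proc is unavailable.
    """
    paths = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                # Split into at most six fields so a path containing spaces stays whole.
                fields = line.split(None, 5)
                if len(fields) == 6 and "libstdc++" in fields[5]:
                    paths.add(fields[5].strip())
    except OSError:
        # No /proc filesystem (e.g. macOS or Windows): nothing to report.
        pass
    return sorted(paths)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:", sys.prefix)
    print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    print("libstdc++ mapped:", loaded_libstdcxx() or "none yet")
```

If the output inside the R session lists /usr/lib/x86_64-linux-gnu/libstdc++.so.6 while the same code in Spyder lists a copy under the Anaconda directory, that would be consistent with the two environments resolving the library differently.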

ines commented 5 years ago

@randomgambit Sorry, only saw this discussion now! From your posts in explosion/spaCy#2982, it definitely sounds like this is unrelated to the models, so I'd recommend leaving them out of the test cases here, since they just introduce unnecessary complexity. Here's a simpler example, which only imports from the regular spaCy module:

from spacy.lang.en import English

In your environment, this resulted in an ImportError of DependencyParser. DependencyParser is a Cython module and errors like this usually indicate that there's something wrong with your compiler. The other error you shared also confirms that suspicion:

ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found

If you search for that error message, you'll find all kinds of threads on this problem with various solutions, but it looks like they all come down to upgrading libstdc++. I'm pretty confident that once you've resolved that problem, spaCy will also work as expected 🙂
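To see whether a given libstdc++ file actually provides `GLIBCXX_3.4.21`, one common trick is to scan the binary for its version tags (the same idea as piping `strings` into `grep GLIBCXX`). The sketch below is a minimal illustration of that; `glibcxx_versions` and `provides` are hypothetical helper names, and any Anaconda library path passed in is an assumption about the local layout.

```python
import re


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version tags found in the file at lib_path, sorted numerically.

    Hypothetical helper: scans raw bytes for 'GLIBCXX_<digits.digits...>' strings.
    """
    with open(lib_path, "rb") as handle:
        data = handle.read()
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(
        (tag.decode("ascii") for tag in tags),
        key=lambda version: tuple(int(part) for part in version.split(".")),
    )


def provides(lib_path, wanted="3.4.21"):
    """True if the library at lib_path advertises the wanted GLIBCXX version."""
    return wanted in glibcxx_versions(lib_path)
```

Calling `provides("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")` on the system copy named in the error, and again on whatever copy ships with the Anaconda installation, should show which of the two is too old.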