pdrm83 / sent2vec

How to encode sentences in a high-dimensional vector space, a.k.a., sentence embedding.
MIT License

It seems that some internal code is deprecated #8

Closed: tonicebrian closed this issue 2 years ago

tonicebrian commented 3 years ago

After installing en_core_web_sm==3.0.0, I get the exception below when I try to use the splitter; it calls for a change in the code. I guess the solution is either to update the code or to pin the en_core_web_sm dependency to a compatible version.

╰─ ipython3
Python 3.8.5 (default, Jan 27 2021, 15:41:15) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from sent2vec.splitter import Splitter
   ...: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-7bef766639af> in <module>
----> 1 from sent2vec.splitter import Splitter

~/.local/lib/python3.8/site-packages/sent2vec/splitter.py in <module>
      3 nlp = spacy.load("en_core_web_sm")
      4 sentencizer = nlp.create_pipe("sentencizer")
----> 5 nlp.add_pipe(sentencizer)
      6 
      7 

~/.local/lib/python3.8/site-packages/spacy/language.py in add_pipe(self, factory_name, name, before, after, first, last, source, config, raw_config, validate)
    746             bad_val = repr(factory_name)
    747             err = Errors.E966.format(component=bad_val, name=name)
--> 748             raise ValueError(err)
    749         name = name if name is not None else factory_name
    750         if name in self.component_names:

ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy.pipeline.sentencizer.Sentencizer object at 0x7f1d437025c0> (name: 'None').

- If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.

- If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.

- If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.
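
The first bullet in the error message matches what `splitter.py` does at lines 4-5 of the traceback. The actual fix that landed in sent2vec is not shown in this thread, so what follows is only a minimal sketch of the change the message suggests. The helper names `add_sentencizer` and `load_splitter_pipeline` are hypothetical, and checking the major version is just one assumed way to stay compatible with both spaCy 2.x and 3.x.

```python
def add_sentencizer(nlp, spacy_major):
    """Attach spaCy's built-in sentencizer using the call style of the given major version."""
    if spacy_major >= 3:
        # spaCy v3: pass the registered factory name as a string;
        # add_pipe builds the component and returns it.
        return nlp.add_pipe("sentencizer")
    # spaCy v2: build the component first, then pass the object itself.
    component = nlp.create_pipe("sentencizer")
    nlp.add_pipe(component)
    return component


def load_splitter_pipeline(model_name="en_core_web_sm"):
    """Hypothetical loader: load a model and make sure it has a sentencizer."""
    import spacy  # imported here so add_sentencizer can be used without spaCy installed

    nlp = spacy.load(model_name)
    spacy_major = int(spacy.__version__.split(".")[0])
    if "sentencizer" not in nlp.pipe_names:
        add_sentencizer(nlp, spacy_major)
    return nlp
```

Calling `load_splitter_pipeline()` requires spaCy and the `en_core_web_sm` model to be installed. If only spaCy 3.x needs to be supported, the two lines in the traceback would presumably collapse to a single `nlp.add_pipe("sentencizer")`.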

pdrm83 commented 2 years ago

Thanks @almarengo for resolving it.