projecte-aina / spacy

Pre-production releases for Spacy in Catalan
MIT License

Model in tar.gz #2

Open jacarrasco opened 3 years ago

jacarrasco commented 3 years ago

Hello,

Thank you for the amazing work you are doing. Do you have any plans to release the new models as tar.gz archives? I tried to convert the whl models to tar.gz, but I am getting some errors. A tar.gz is sometimes needed to run spaCy in a serverless environment.

cayorodriguez commented 3 years ago

We have just released the tar.gz versions. Treat them with caution, since the embedded code might not work when installed with pip install xxx.tar.gz: https://github.com/TeMU-BSC/spacy/releases/tag/3.2.4gz

jacarrasco commented 3 years ago

Many thanks, @cayorodriguez! I ran a quick test, loading it the same way I usually load other models:

    import spacy
    # model_full_path holds the filesystem path to the model directory
    nlp = spacy.load(model_full_path)
    text = "La Generalitat ha presentat aquest dilluns el seu pla per començar a mobilitzar el turisme internacional i atraure'l cap al Principat."
    my_doc = nlp(text)

But I get the exception below when loading the model, even though I have spacy-transformers installed.

"errorMessage": "[E002] Can't find factory for 'ca.nonberta.lemmatizer' for language Catalan (ca). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator @Language.component (for function components) or @Language.factory (for class components).\n\nAvailable factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel", "errorType": "ValueError",

Am I doing something wrong? Does it need to be uploaded in a specific way? Do you have any example code?

Thank you,
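The E002 message above says the pipeline's configuration names a factory that is not in the list of registered ones. A minimal diagnostic sketch, assuming the usual spaCy v3 `config.cfg` layout in which each `[components.<name>]` section carries a `factory` entry: it reads the config as text with the standard library only and reports which factories are missing. The names `required_factories`, `missing_factories` and `SAMPLE_CONFIG` are hypothetical, and the sample config is invented for illustration, not taken from the released model.

```python
import configparser
import json

# Factories reported as available in the E002 message above.
AVAILABLE = {
    "attribute_ruler", "tok2vec", "merge_noun_chunks", "merge_entities",
    "merge_subtokens", "token_splitter", "parser", "beam_parser",
    "entity_linker", "ner", "beam_ner", "entity_ruler", "lemmatizer",
    "tagger", "morphologizer", "senter", "sentencizer", "textcat",
    "textcat_multilabel",
}


def required_factories(config_text):
    """Map each pipeline component name to the factory its config asks for."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    factories = {}
    for section in parser.sections():
        parts = section.split(".")
        # Only top-level [components.<name>] sections declare a factory.
        if len(parts) == 2 and parts[0] == "components" and parser.has_option(section, "factory"):
            # Values are written as JSON strings, e.g. factory = "ner".
            factories[parts[1]] = json.loads(parser.get(section, "factory"))
    return factories


def missing_factories(config_text, available=AVAILABLE):
    """Return the sorted factory names that the config needs but are not available."""
    return sorted(set(required_factories(config_text).values()) - set(available))


# Hypothetical, heavily trimmed config.cfg used only for illustration.
SAMPLE_CONFIG = """
[nlp]
lang = "ca"
pipeline = ["ner","lemmatizer"]

[components.ner]
factory = "ner"

[components.lemmatizer]
factory = "ca.nonberta.lemmatizer"
"""

print(missing_factories(SAMPLE_CONFIG))  # prints ['ca.nonberta.lemmatizer']
```

Any name this reports has to be registered by Python code that runs before the pipeline is built; reading the model directory alone does not register it.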

cayorodriguez commented 3 years ago

Hi, since these models have not yet been integrated into the "official" spaCy modules, we embed our bespoke code in the package so that it is used when the model is installed from a wheel into a conda environment. Some of these modules have to be named differently from the customary form so that they don't interfere with each other in the same namespace. Until spaCy integrates our model into their framework, we have not found a practical way to make this work other than using separate Python environments. We don't recommend loading from a full path, but rather as a module. If you can, use the ca_base_web_trf model and see if it works for you. I am cc'ing my colleague Asier in case he has a better idea.
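Loading "as a module" could look roughly like the sketch below. It assumes the installed package follows the standard spaCy packaging convention of exposing a `load()` function in its `__init__.py`, and that importing the package is what runs the bespoke registration code; neither has been verified against this particular release. The helper name `load_packaged_pipeline` is hypothetical.

```python
import importlib


def load_packaged_pipeline(package_name):
    """Import an installed model package and call its load() entry point.

    Importing the package executes its __init__.py, which is where custom
    component code bundled with the package would be registered (assumption).
    """
    try:
        module = importlib.import_module(package_name)
    except ImportError as err:
        raise RuntimeError(
            f"Model package '{package_name}' is not installed in this environment"
        ) from err
    return module.load()
```

With the wheel installed, usage would be `nlp = load_packaged_pipeline("ca_base_web_trf")`, in contrast to `spacy.load(model_full_path)`, which reads the model directory without importing the package.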

Carlos Rodriguez

jacarrasco commented 3 years ago

Hi,

Thank you for the effort and help. To run spaCy in a serverless environment like AWS Lambda, the tar.gz is usually needed, but if there is no other way, I will continue using the Catalan model for spaCy 2. Please let me know if you find a solution.

Many thanks!
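For context, the tar.gz workflow mentioned here usually amounts to unpacking the archive into a writable directory at start-up and pointing `spacy.load` at the folder that contains `config.cfg`. A minimal sketch of the unpacking step, using only the standard library; the helper name `unpack_and_find_pipeline_dir` is hypothetical, and the approach only helps for pipelines whose components are all built in, so on its own it would not resolve the E002 error discussed in this thread.

```python
import os
import tarfile


def unpack_and_find_pipeline_dir(archive_path, target_dir):
    """Extract a model .tar.gz and return the directory that holds config.cfg.

    Returns None if no config.cfg is found after extraction.
    """
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # This sketch does no member filtering: only extract archives you trust.
        archive.extractall(target_dir)
    for root, _dirs, files in os.walk(target_dir):
        if "config.cfg" in files:
            return root
    return None
```

The returned path is what would be passed to `spacy.load`; in AWS Lambda, `target_dir` would typically sit under `/tmp`, the writable part of the filesystem.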

cayorodriguez commented 3 years ago

Good news! The spaCy team has just let us know that they are evaluating the Catalan model for the upcoming 3.1 release (very soon, maybe even next week). So we will have an "official" version from their site and/or a BSC version that doesn't need bespoke code, since our language and lookup data will be incorporated into their repositories, and we will be able to offer our own model that works in headless environments. Also, the two models, the official one and the BSC one, might perform a bit differently, since the official models won't have their parameters tuned as much as our own. We'll let you know soon.

jacarrasco commented 3 years ago

Hi!

I have seen that the recent commits to https://github.com/explosion/spacy-models already include some information about the Catalan models. But I am still confused about when and how to use them. Will spaCy release its own Catalan model for use with spaCy v3.1? Can it be used in its current state, or should we wait until it is fully released?

Thank you, and congrats on this good work.

cayorodriguez commented 3 years ago

The release of spaCy 3.1 seems to be imminent, and I guess they are waiting for that to announce the ca models. We haven't tested them, though.

Carlos Rodriguez