Closed ogencoglu closed 10 months ago
Are you interested in running a spaCy model in parallel to a Hugging Face model?
If yes, then the best way to do this is to use one of them (say spaCy) as the NlpEngine, and the other as an additional recognizer. There is no reason to have both as NLP engines, since the other artifacts the NLP engine provides (tokens, lemmas, keywords) are not needed twice.
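As a rough illustration of that "additional recognizer" idea: Presidio does expose `EntityRecognizer` and a recognizer registry for exactly this pattern, but the class below is a dependency-free stand-in (with a stubbed pipeline), not Presidio's actual base class, so treat it as a sketch only.

```python
class HfNerRecognizer:
    """Wraps a Hugging Face NER callable as an additional recognizer.

    With Presidio you would instead subclass
    presidio_analyzer.EntityRecognizer and register the instance via
    analyzer.registry.add_recognizer(...).
    """

    def __init__(self, ner_fn):
        # ner_fn could be transformers.pipeline("ner", aggregation_strategy="simple")
        self.ner_fn = ner_fn

    def analyze(self, text):
        # Convert pipeline output into (entity_type, start, end, score)
        # tuples -- the same information Presidio's RecognizerResult holds.
        return [(e["entity_group"], e["start"], e["end"], e["score"])
                for e in self.ner_fn(text)]


# Stub standing in for a real transformers pipeline:
def fake_pipeline(text):
    return [{"entity_group": "PER", "start": 0, "end": 4, "score": 0.99}]


recognizer = HfNerRecognizer(fake_pipeline)
results = recognizer.analyze("Anna lives in Tartu.")
```

The spaCy NlpEngine keeps supplying tokens and lemmas, while the extra recognizer only contributes entity spans, which is why both are never needed as full engines.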
Do I understand you correctly that I can do something like this?
configuration = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "fi", "model_name": "fi_core_news_sm"},
        {"lang_code": "en", "model_name": "en_core_web_sm"},
        {"lang_code": "ru", "model_name": "ru_core_news_sm"},
        {"lang_code": "et", "model_name": {"spacy": "en_core_web_sm", "transformers": "tartuNLP/EstBERT_NER_v2"}},
    ],
}
You can either use a SpacyNlpEngine or a TransformersNlpEngine, but not both, so the provided configuration would not work. For the configuration you have here, the simplest way would be to run the fi, en, and ru languages using a SpacyNlpEngine with the configuration you provided, without the line for et. This would allow you to have all models running in parallel.
Note that small spaCy models (everything that ends with _sm) are not very accurate at identifying named entities.
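Concretely, the spaCy-only configuration described above is just the original config minus the et entry. A sketch (the Presidio usage is in comments, since it requires the library installed):

```python
# spaCy-only configuration for fi/en/ru, as suggested above
# (the et line with the transformers model is dropped).
spacy_only_configuration = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "fi", "model_name": "fi_core_news_sm"},
        {"lang_code": "en", "model_name": "en_core_web_sm"},
        {"lang_code": "ru", "model_name": "ru_core_news_sm"},
    ],
}

# With Presidio, this dict would typically be passed to
# NlpEngineProvider(nlp_configuration=spacy_only_configuration), and the
# resulting engine handed to AnalyzerEngine together with the list of
# supported languages.
```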
Thanks for the swift reply!
Continuing the discussion, do I understand correctly that even if I have all models running in parallel (as you described above), I still need to tell AnalyzerEngine.analyze the specific language to work on, such as:
analyze(
text=text,
entities=analyzer.get_supported_entities(),
language="en",
return_decision_process=False,
)
and I cannot do something like language=["en", "ru", "et"]?
Meaning that I still need to detect the language of the text or conversation with some language-detection tool to be able to route the pipeline to the correct language.
Is my understanding correct?
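That is indeed how analyze behaves: one language per call. A minimal routing sketch, where detect_language is a hypothetical stand-in for whatever language-detection tool you choose, and the analyzer only needs to offer an analyze(text=..., language=...) method like Presidio's AnalyzerEngine:

```python
def analyze_any_language(analyzer, text, detect_language,
                         supported=("fi", "en", "ru")):
    """Detect the language first, then route to the matching model.

    `detect_language` is any callable returning an ISO language code;
    `supported` lists the languages configured in the analyzer.
    """
    lang = detect_language(text)
    if lang not in supported:
        raise ValueError(f"No model configured for language {lang!r}")
    return analyzer.analyze(text=text, language=lang)


# Tiny stub to show the flow without Presidio installed:
class FakeAnalyzer:
    def analyze(self, text, language):
        return f"analyzed {language}"


result = analyze_any_language(
    FakeAnalyzer(),
    "Tere!",
    lambda t: "et" if "Tere" in t else "en",
    supported=("et", "en"),
)
```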
Is it possible to combine the spaCy and transformers configs into a single one?
For example, both of the above in a single config?
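For reference, and as far as I understand Presidio's TransformersNlpEngine (worth verifying against the current docs), a single-engine transformers configuration nests both model names under model_name, per language; the exact schema below is an assumption:

```python
# Assumed TransformersNlpEngine-style configuration: model_name becomes a
# dict holding both a spaCy model (for tokens/lemmas) and a transformers
# model (for NER). Verify the exact schema against Presidio's docs.
transformers_configuration = {
    "nlp_engine_name": "transformers",
    "models": [
        {
            "lang_code": "et",
            "model_name": {
                "spacy": "en_core_web_sm",
                "transformers": "tartuNLP/EstBERT_NER_v2",
            },
        },
    ],
}
```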