Do I have to load EN model?

Hey @milsanore -

To answer your question, this module absolutely should work with any of the spaCy models (including the much-compressed en_core_web_sm English model), though I haven't explicitly tested those (a PR that added more unit tests would be awesome!).

But it sounds more like you're trying to avoid parsing your documents more than once. To be explicit: you don't need to load the models every time you parse a document. For example, the following code is totally valid, and only loads the en model into memory once:

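import spacy
# Loading the model is the expensive step, so do it once.
nlp = spacy.load('en')
# Reuse the same nlp object for every document.
doc1 = nlp('This is some text.')
doc2 = nlp('This is some more text.')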

Unfortunately, it's not possible with this module to do the language detection outside of a specific model's pipeline. So you're stuck with either parsing every document twice (once to detect the language and another time with the correct model to do whatever else you need) or detecting the language from a sample of each doc. Getting fancier with language detection would require changes to spaCy core (which was actually talked about in the issue that inspired this component).
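To make the "detect from a sample" option concrete, here's a rough sketch of how it could look. It deliberately doesn't touch spaCy directly: make_router, detect_language, load_model, sample_chars and default_lang are all hypothetical names I made up for illustration. The assumption is that detect_language runs a short piece of text through whichever pipeline carries the language detection component and returns a language code, and that load_model returns the right pipeline for that code (for example by wrapping spacy.load).

```python
def make_router(detect_language, load_model, sample_chars=200, default_lang="en"):
    """Build a parse(text) function that detects the language from a short
    sample and then parses the full text once with the matching model.

    detect_language and load_model are hypothetical callables supplied by
    the caller; nothing here is part of spaCy or of this module.
    """
    models = {}  # language code -> loaded model, so each model loads only once

    def parse(text):
        # Only the first sample_chars characters go through detection,
        # which keeps the "first pass" cheap for long documents.
        sample = text[:sample_chars]
        lang = detect_language(sample) or default_lang

        # Load the model for this language lazily and cache it.
        if lang not in models:
            models[lang] = load_model(lang)

        # The full document is parsed exactly once, with the chosen model.
        return lang, models[lang](text)

    return parse
```

The trade-off is that a document whose first few hundred characters aren't representative (a quoted passage in another language, say) will get routed to the wrong model, so sample_chars is worth tuning for your data.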

nickdavidhaynes / spacy-cld