openeventdata / mordecai

Full text geoparsing as a Python library
MIT License

try to load nlp on importing package #88

Open flashpixx opened 3 years ago

flashpixx commented 3 years ago

I have a problem: this code in utilities.py is executed every time the utilities module is imported:

try:
    nlp
except NameError:
    nlp = spacy.load('en_core_web_lg')

In my case, I don't have en_core_web_lg installed because I want to use my own NLP model. The class in geoparse.py uses utilities (on import), so the import breaks before I can set the nlp argument myself:

class Geoparser:
    def __init__(self, nlp=None, es_hosts=None, es_port=None, es_ssl=False, es_auth=None,
                 verbose=False, country_threshold=0.6, threads=True,
                 progress=True, training=None, models_path=None, **kwargs):
        DATA_PATH = pkg_resources.resource_filename('mordecai', 'data/')
        if not models_path:
            models_path = pkg_resources.resource_filename('mordecai', 'models/')
            print("Models path:", models_path)
        if nlp:
            self.nlp = nlp
        else:
            try:
                self.nlp = spacy.load('en_core_web_lg', disable=['parser', 'tagger']) 
            except OSError:
                print("""ERROR: No spaCy NLP model installed. Install with this command: 
                `python -m spacy download en_core_web_lg`.""") 

IMHO the loading of en_core_web_lg in utilities.py should be removed because it is redundant: the Geoparser constructor already loads it when no nlp argument is given.
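If utilities.py still needs a default model for its own helpers, one option is to defer loading until first use. This is only a minimal sketch, not mordecai's actual code: `get_nlp` and `set_nlp` are hypothetical names, and it assumes the helpers in utilities.py could call `get_nlp()` instead of using a module-level `nlp`.

```python
# Hypothetical lazy-loading pattern for utilities.py (not mordecai's real API).
# Nothing is loaded at import time, so importing the module cannot fail
# just because en_core_web_lg is missing.

_nlp = None  # module-level cache for the spaCy pipeline


def set_nlp(nlp):
    """Register a caller-supplied pipeline so the default is never loaded."""
    global _nlp
    _nlp = nlp


def get_nlp(model_name="en_core_web_lg"):
    """Return the cached pipeline, loading the default model on first use."""
    global _nlp
    if _nlp is None:
        import spacy  # deferred: only needed if no pipeline was supplied
        _nlp = spacy.load(model_name)
    return _nlp
```

With this, Geoparser could call `set_nlp(nlp)` when an nlp argument is passed, and the default model would only be loaded if nobody supplied one.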

Thanks

ahalterman commented 3 years ago

Thanks for that catch! I'm hoping to do a refresh of the whole codebase in February and I'll definitely remove that redundant import when I do.