openeventdata / mordecai

Full text geoparsing as a Python library
MIT License

What is that enumerated loop in infer_country supposed to do? #84

Open andreas-wolf opened 4 years ago

andreas-wolf commented 4 years ago

Sorry, I really do not understand this code: https://github.com/openeventdata/mordecai/blob/05fa31d6af0b4d57616bd1ad1250f9daaad8157b/mordecai/geoparse.py#L716-L738

You're looping over a collection, proced. For every entry you create a country matrix. OK. You then append that matrix to a list. Why? Then you loop over that list FOR EVERY loc in proced, but only the result of the last iteration of that inner loop is written back to loc.

Furthermore, when spaCy finds a location that is not in GeoNames, i['matrix'] is an empty list and self.country_model.predict(i['matrix']) throws a ValueError. It might be wiser to check whether feat contains any values before calling predict.
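To make the suggestion concrete, here is a minimal sketch of what I would have expected: one pass over proced, no intermediate list, and a guard before predict. This is not the code in geoparse.py. StubCountryModel, make_country_matrix, the "candidates" input field, the 'labels' key and the country_predicted / country_conf output fields are all assumptions made up for illustration; only proced, loc, feat, the 'matrix' key and the predict call come from the code linked above.

```python
class StubCountryModel:
    """Hypothetical stand-in for the real model; rejects empty input like the reported case."""

    def predict(self, matrix):
        if len(matrix) == 0:
            raise ValueError("empty feature matrix")
        # One score per candidate row: here simply the sum of that row's features.
        return [sum(row) for row in matrix]


def make_country_matrix(loc):
    """Hypothetical feature builder: labels plus one feature row per candidate country."""
    candidates = loc.get("candidates", [])
    return {
        "labels": [c["country"] for c in candidates],
        "matrix": [c["features"] for c in candidates],
    }


def infer_country(proced, model):
    """Annotate each loc in a single pass, skipping predict when there are no features."""
    for loc in proced:
        feat = make_country_matrix(loc)
        # Guard: no gazetteer candidates means nothing to score, so do not call predict.
        if len(feat["matrix"]) == 0:
            loc["country_predicted"] = ""
            loc["country_conf"] = 0.0
            continue
        scores = model.predict(feat["matrix"])
        # Pick the index of the highest-scoring candidate.
        best = max(range(len(scores)), key=scores.__getitem__)
        loc["country_predicted"] = feat["labels"][best]
        loc["country_conf"] = scores[best]
    return proced


# Made-up input: one location with candidates, one with none.
proced = [
    {"word": "Berlin", "candidates": [
        {"country": "DEU", "features": [0.9, 0.8]},
        {"country": "USA", "features": [0.2, 0.1]},
    ]},
    {"word": "Nowhereville", "candidates": []},
]
result = infer_country(proced, StubCountryModel())
```

The guard uses len(...) == 0 rather than a plain truthiness test so that it would also work if the matrix were a NumPy array, where "not array" is ambiguous for more than one element.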