openeventdata / mordecai

Full text geoparsing as a Python library
MIT License
742 stars, 97 forks

compatibility with spacy 2.1 #69

Closed: phaterpekar closed this issue 4 years ago

phaterpekar commented 5 years ago

Would it be possible to make this compatible with spaCy v2.1? I'm trying to use it for geoparsing, but I'm also using spaCy v2.1 and the newer "en_core_web_lg" model for other downstream tasks. Because Mordecai isn't compatible with spaCy 2.1 yet, installing it tries to downgrade spaCy to 2.0 and currently requires the older models.

nyejon commented 5 years ago

I agree. Is this project still going to be maintained?

ahalterman commented 5 years ago

I think moving to 2.1 makes sense. I originally kept it on 2.0 because I was using some custom models trained with Prodigy that weren't compatible with 2.1 yet. But I should be able to retrain those models with the new version of Prodigy so they're compatible with spaCy 2.1. I can't get to it this week, but I should be able to next week.

phaterpekar commented 5 years ago

That would be great! Does Mordecai give a city/state/country "focus" breakdown like CLIFF-CLAVIN does? I could only find the country focus, using infer_country, but no way to get a city or state "focus". When analyzing news or geopolitical events, an article can contain many toponyms, but a "focus" picked with naive heuristics such as frequency and position in the article usually seems to pin down the article's top geolocations better than most other approaches. Any thoughts on that?

ahalterman commented 5 years ago

It doesn't right now, for anything beyond the level of the article. I think a "focus" location sometimes makes sense for news articles and the heuristics you list are the ones I'd use too. But in many other cases, people use "focus" locations in a way that's not well defined. I have a paper describing a technique for linking specific events in text with their locations (link), which is what the Prodigy models I mentioned before were being used for. If I get some free time in the next couple months I'd like to incorporate that model into Mordecai.
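Since Mordecai doesn't offer a city- or state-level focus, the naive frequency-and-position heuristic discussed in this thread can be approximated outside the library. The sketch below is only an illustration under stated assumptions: `rank_focus` is a hypothetical helper, not part of Mordecai, and it assumes you've already reduced the geoparser's output to `(location_name, character_offset)` pairs, one per toponym mention. Each mention adds 1 for frequency plus a bonus that shrinks linearly the later it appears in the article.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_focus(mentions: List[Tuple[str, int]], doc_length: int,
               position_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank locations by a naive frequency + earliness score.

    mentions: (location_name, character_offset) pairs, one per toponym mention
              (an assumed input format, NOT Mordecai's output structure).
    doc_length: article length in characters.
    position_weight: how much an early mention counts relative to frequency.
    """
    scores: Dict[str, float] = defaultdict(float)
    length = max(doc_length, 1)  # avoid division by zero on empty text
    for name, offset in mentions:
        # Frequency component: every mention counts once.
        scores[name] += 1.0
        # Position component: 1.0 at the start of the article, 0.0 at the end.
        earliness = 1.0 - min(offset, length) / length
        scores[name] += position_weight * earliness
    # Highest score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    example = [("Aleppo", 10), ("Damascus", 300), ("Aleppo", 500), ("Geneva", 900)]
    for name, score in rank_focus(example, doc_length=1000):
        print(f"{name}: {score:.2f}")
```

With the example mentions, "Aleppo" ranks first because it is both the most frequent and the earliest. Raising `position_weight` favors locations in the lede over repeated mentions later in the text; how to balance the two is a judgment call, which is part of why a "focus" location can be hard to define well.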