gyorilab / mira

MIRA modeling framework
BSD 2-Clause "Simplified" License

[BUG] The EPI DKG seems to be missing a few `geonames` nodes #370

Open · liunelson opened this issue 1 day ago

liunelson commented 1 day ago

I used this endpoint to look up the CURIEs for locations in the United States: http://mira-epi-dkg-lb-c7b58edea41524e6.elb.us-east-1.amazonaws.com/docs#/grounding/ground_get_api_ground__text__get

However, no result was returned for these terms: Vermont, Maine, West Virginia, and Wyoming, even though they can be found on GeoNames, e.g. https://www.geonames.org/search.html?q=wyoming&country=US

Similarly, the term Delaware returns only a result with the `ncit` prefix, not one with the `geonames` prefix, even though a GeoNames entry for it exists.
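A small script can make this check repeatable. The sketch below is an illustration, not part of MIRA: the `/api/ground/{text}` path is inferred from the FastAPI operation id in the docs link above, and the helper names (`grounding_url`, `fetch_grounding`, `has_prefix`) are hypothetical. The issue does not show the response schema, so extracting CURIEs from the JSON body is left to the reader; `has_prefix` only checks a list of CURIE strings once you have them.

```python
import json
from typing import Iterable
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the EPI DKG deployment linked in the issue.
BASE_URL = "http://mira-epi-dkg-lb-c7b58edea41524e6.elb.us-east-1.amazonaws.com"


def grounding_url(text: str, base_url: str = BASE_URL) -> str:
    """Build the grounding URL (path /api/ground/{text} is an assumption inferred from the docs anchor)."""
    # Percent-encode the term so multi-word names like "West Virginia" form a valid path segment.
    return f"{base_url}/api/ground/{quote(text, safe='')}"


def fetch_grounding(text: str, timeout: float = 10.0):
    """GET the grounding endpoint and return the decoded JSON body (schema not documented in the issue)."""
    with urlopen(grounding_url(text), timeout=timeout) as resp:
        return json.load(resp)


def has_prefix(curies: Iterable[str], prefix: str = "geonames") -> bool:
    """Return True if any CURIE uses the given prefix, e.g. 'geonames:...'."""
    wanted = prefix.lower() + ":"
    return any(c.lower().startswith(wanted) for c in curies)


if __name__ == "__main__":
    # Print the URLs for the terms reported in the issue; no network call is made here.
    for term in ["Vermont", "Maine", "West Virginia", "Wyoming", "Delaware"]:
        print(term, grounding_url(term))
```

For Delaware, the reported behaviour corresponds to `has_prefix` returning `False` for the CURIEs in the response, since only an `ncit` CURIE comes back.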

nanglo123 commented 15 hours ago

This seems to happen because we filter out all cities with a population under 100,000, and none of the cities in those states in the dataset we use reaches that threshold. Because of the way we currently add geographic terms, states with no qualifying cities end up not being added as `geonames` nodes in the EPI DKG. This logic isn't ideal: we should add all countries and states and apply the population filter only at the city level. I'll work on this.
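The difference between the current and proposed behaviour can be sketched as follows. This is a toy model, not MIRA's actual GeoNames loading code (which the issue does not show): the `Place` record, its `kind` and `parent_id` fields, and the example IDs and populations are all hypothetical. It assumes, per the comment above, that a state or country currently enters the graph only as an ancestor of a city that passes the 100,000 population filter.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Population threshold for cities, taken from the comment above.
MIN_CITY_POPULATION = 100_000


@dataclass(frozen=True)
class Place:
    """Hypothetical minimal place record; real GeoNames rows carry many more columns."""
    geonames_id: str
    name: str
    kind: str                        # hypothetical: "country", "state" or "city"
    population: int
    parent_id: Optional[str] = None  # hypothetical link to the enclosing state/country


def current_selection(places: Iterable[Place]) -> Set[str]:
    """Described current behaviour: regions are only added as ancestors of large cities."""
    places = list(places)
    by_id = {p.geonames_id: p for p in places}
    big_cities = [p for p in places
                  if p.kind == "city" and p.population >= MIN_CITY_POPULATION]
    selected = {c.geonames_id for c in big_cities}
    for city in big_cities:
        # Walk up the hierarchy, adding each enclosing state/country.
        parent = by_id.get(city.parent_id)
        while parent is not None:
            selected.add(parent.geonames_id)
            parent = by_id.get(parent.parent_id)
    return selected


def proposed_selection(places: Iterable[Place]) -> Set[str]:
    """Proposed behaviour: keep every country and state; filter only cities by population."""
    selected = set()
    for p in places:
        if p.kind in ("country", "state"):
            selected.add(p.geonames_id)
        elif p.kind == "city" and p.population >= MIN_CITY_POPULATION:
            selected.add(p.geonames_id)
    return selected


# Toy data: illustrative IDs and populations only (region populations are unused).
places = [
    Place("us", "United States", "country", 0),
    Place("vt", "Vermont", "state", 0, "us"),
    Place("burlington", "Burlington", "city", 40_000, "vt"),
    Place("co", "Colorado", "state", 0, "us"),
    Place("denver", "Denver", "city", 700_000, "co"),
]

if __name__ == "__main__":
    print(sorted(current_selection(places)))   # Vermont is missing
    print(sorted(proposed_selection(places)))  # Vermont is kept
```

Decoupling region selection from city selection means a state's presence in the graph no longer depends on the size of its largest city, which is the failure mode reported for Vermont, Maine, West Virginia, Wyoming and Delaware.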