mediacloud / cliff-annotator

A lightweight server to allow HTTP requests to the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser.
https://cliff.mediacloud.org
Apache License 2.0

Large Areas resolving to wrong countries #25

kanarinka closed 9 years ago

kanarinka commented 10 years ago

Western Europe resolves to Germany (the country) and Eastern Europe resolves to Belarus.

-- see below query --

http://civicdev.media.mit.edu:8080/CLIFF/parse/text?q=Next%20Map%20%3E%20This%20map%20shows%20books%20borrowed%20from%20public%20libraries%20-%20which%20lend%20books%20to%20members%20for%20free%20or%20for%20a%20nominal%20charge.%20Libraries%20share%20books,%20making%20it%20unnecessary%20for%20us%20to%20buy%20books%20that%20we%20will%20read%20only%20once%20or%20twice.%20The%20most%20books%20borrowed%20were%20in%20the%20Russian%20Federation.%20There%20were%20high%20rates%20of%20borrowing%20in%20Western%20Europe,%20Japan%20and%20Eastern%20Europe.%20In%20these%20regions%20most%20territories%20reported%20some%20book%20borrowing.%20In%20other%20regions%20reported%20book%20borrowing%20was%20lower,%20and%20many%20territories%20reported%20very%20little%20borrowing.%20Where%20many%20people%20cannot%20afford%20books,%20it%20appears%20they%20often%20cannot%20borrow%20them%20either.%20%22In%20vain%20have%20you%20acquired%20knowledge,%20if%20you%20have%20not%20imparted%20it%20to%20others.%22%20Deuteronomy%20Rabbah,%20undated%20Territory%20size%20shows%20the%20proportion%20of%20all%20library%20books%20borrowed%20that%20were%20borrowed%20there.
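For anyone reproducing this, a query like the one above can be rebuilt from plain text. This is only a sketch: the endpoint is the one from the URL above (it may no longer be reachable), and `build_parse_url` is a hypothetical helper name, not part of CLIFF.

```python
from urllib.parse import quote

# Endpoint copied from the query above; it may no longer be online.
BASE_URL = "http://civicdev.media.mit.edu:8080/CLIFF/parse/text"


def build_parse_url(text: str, base_url: str = BASE_URL) -> str:
    """Percent-encode `text` and attach it as the `q` parameter.

    Hypothetical helper for illustration; it only builds the URL
    and does not send a request.
    """
    # safe="" encodes every reserved character, so spaces become %20
    # and commas become %2C.
    return f"{base_url}?q={quote(text, safe='')}"


url = build_parse_url("Western Europe, Japan and Eastern Europe")
print(url)
```

The short phrase is enough to trigger the problem, since those are the three place names that matter in the longer text.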

rahulbot commented 9 years ago

The fix for this is probably related to #12.

rahulbot commented 9 years ago

This is now doing roughly the right thing (as of v1.4.1). For instance, Western Europe resolves to the right GeoNames id (7729881), but its countryGeoNameId is Germany and its state is in Germany. Our focus logic therefore uses those German places, because it can't handle areas (only cities, states, and countries).

Perhaps a better behaviour is to have the focus logic ignore large areas?
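A rough sketch of what "ignore large areas" could look like, in Python rather than CLIFF's Java and not taken from the CLIFF codebase. It assumes each resolved place carries a GeoNames feature class and that regions such as Western Europe fall under class `L` (parks and areas). `ResolvedPlace` and `country_focus` are hypothetical names, and the place records are illustrative values based on the behaviour described in this issue.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Assumption: large areas (regions, continents) use GeoNames feature class "L".
LARGE_AREA_FEATURE_CLASSES = {"L"}


@dataclass
class ResolvedPlace:
    """Hypothetical stand-in for a resolved place mention."""
    name: str
    feature_class: str            # GeoNames feature class, e.g. "A", "P", "L"
    country_code: Optional[str]   # country the gazetteer record points at
    geoname_id: Optional[int] = None


def country_focus(places: List[ResolvedPlace]) -> List[str]:
    """Return the most-mentioned countries, skipping large areas."""
    counts = Counter(
        p.country_code
        for p in places
        if p.country_code and p.feature_class not in LARGE_AREA_FEATURE_CLASSES
    )
    if not counts:
        return []
    top = max(counts.values())
    return sorted(code for code, n in counts.items() if n == top)


# Illustrative records: the two regions point at Germany and Belarus,
# which is what skewed the focus in this issue.
places = [
    ResolvedPlace("Western Europe", "L", "DE", geoname_id=7729881),
    ResolvedPlace("Japan", "A", "JP"),
    ResolvedPlace("Eastern Europe", "L", "BY"),
]
focus = country_focus(places)
print(focus)  # ['JP']
```

The regions still show up as mentions; they are just left out of the country tally, so Germany and Belarus no longer compete with Japan for focus.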

rahulbot commented 9 years ago

This works in v2.1.0: it returns mentions for the regions of Western and Eastern Europe, but the only country of focus is Japan.