mediacloud / cliff-annotator

A lightweight server to allow HTTP requests to the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser.
Apache License 2.0
119 stars, 34 forks

News articles about Washington always resolve to Washington state instead of Washington, DC. #65

Open phaterpekar opened 5 years ago

phaterpekar commented 5 years ago

Articles about Washington, DC almost always resolve to Washington State.

Example text:

LAST week the US Secretary of State, Mike Pompeo, said Washington is not seeking a permanent military presence in Afghanistan, after the Taliban said it was close to finalising a peace agreement with the United States.

Yesterday Washington’s top negotiator Zalmay Khalilzad confirmed that the US will withdraw 5,400 troops from Afghanistan within 20 weeks as part of a deal reached, in principle, with the Taliban.

The deal awaits final approval from US President Donald Trump who is said to be determined to get out of Afghanistan. After his approval, the deal will be shown to the current Afghan puppet government, whose fate has been sealed by it.

rahulbot commented 4 years ago

This is a tough one to solve, because one of our heuristics prefers larger populated geographic areas. This set of heuristics is encoded in our DisambiguationStrategy, but it takes some digging with a debugger to see which one is firing on that sample text. You can see which one it is by setting logging to DEBUG level. There are always going to be failures, though this is a particularly glaring one.
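
To make the failure mode concrete, here is a toy sketch of a "prefer the larger place" heuristic, assuming "larger" is measured by population. This is not CLIFF's actual DisambiguationStrategy code: the `Candidate` class, the `pick_most_populous` function, and the rounded population figures are all hypothetical and only illustrate why the bare mention "Washington" can land on the state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical gazetteer candidate for an ambiguous place name."""
    name: str
    feature: str     # e.g. "state" or "city"
    population: int  # rough, rounded figure, for illustration only


def pick_most_populous(candidates):
    """Toy heuristic: return the candidate with the largest population."""
    return max(candidates, key=lambda c: c.population)


# Two plausible candidates for the bare mention "Washington".
candidates = [
    Candidate("Washington (state)", "state", 7_600_000),
    Candidate("Washington, D.C.", "city", 700_000),
]

best = pick_most_populous(candidates)
print(best.name)  # Washington (state)
```

With nothing but population to go on, the state wins every time. Getting the capital would take extra context, for example noticing that the text uses "Washington" as a stand-in for the US government.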