mediacloud / cliff-annotator

A lightweight server to allow HTTP requests to the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser.
https://cliff.mediacloud.org
Apache License 2.0

Unable to geoparse German text #76

Open jseebacher opened 4 years ago

jseebacher commented 4 years ago

According to https://cliff.mediacloud.org/, CLIFF can also geoparse German text. Judging from the form parameters in the calls made on https://cliff.mediacloud.org/, I assume I can select the language by passing ?language=DE to the API.

However, this doesn't seem to work: with language=DE, CLIFF no longer recognizes even one-word requests (such as q=Krems, a small town in Austria). The same requests are resolved with language=EN.

I am running CLIFF from cliff-docker on a local machine with 16 GB RAM.
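
For reference, the two requests being compared could be built roughly as follows. This is only a sketch: the `q` and `language` parameters come from the description above, but the host, port and path in `BASE_URL` are assumptions about a local cliff-docker setup, and `build_parse_url` is a hypothetical helper, not part of CLIFF.

```python
from urllib.parse import urlencode

# Assumption: a local cliff-docker instance. Host, port and path are
# placeholders and must be adjusted to match the actual deployment.
BASE_URL = "http://localhost:8080/cliff-2.6.1/parse/text"


def build_parse_url(text, language, base_url=BASE_URL):
    """Return a GET URL that passes the text as `q` and the language code as `language`."""
    return base_url + "?" + urlencode({"q": text, "language": language})


# The same one-word query, once per language setting.
url_de = build_parse_url("Krems", "DE")
url_en = build_parse_url("Krems", "EN")
print(url_de)
print(url_en)
```

Fetching both URLs (for example with `urllib.request.urlopen`) and comparing the `places` section of each response would show whether only the `language=DE` call comes back empty.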

rahulbot commented 4 years ago

I've done only initial testing in German, which I don't speak, so I can't characterize how well it works. That said, I just tried a short sentence from a Der Spiegel article on our homepage and found that it pulled out a place and person pretty well - results below. You are correct that the one-word request for Krems doesn't work. I expect that is because the models are built to process sentences, but I'm not sure 🤷🏽‍♂️.

Input (from this article) "Alesja sieht noch, wie Beamte einer Omon-Sondereinheit mit Schlagstöcken auf ihren Mann einprügeln. Iwan liegt mit anderen auf der Straße an einer Brücke in Minsk, hält die Hände über den Kopf verschränkt. "

Results

{
  "milliseconds": 5,
  "language": "DE",
  "version": "2.6.1",
  "results": {
    "places": {
      "mentions": [
        {
          "featureCode": "PPLC",
          "featureClass": "P",
          "confidence": 1,
          "lon": 27.56667,
          "countryGeoNameId": "630336",
          "source": {
            "charIndex": 159,
            "string": "Minsk"
          },
          "population": 1742124,
          "stateGeoNameId": "625143",
          "countryCode": "BY",
          "name": "Minsk",
          "stateCode": "04",
          "id": 625144,
          "lat": 53.9
        }
      ],
      "focus": {
        "cities": [
          {
            "score": 1,
            "featureCode": "PPLC",
            "stateGeoNameId": "625143",
            "featureClass": "P",
            "countryCode": "BY",
            "name": "Minsk",
            "lon": 27.56667,
            "countryGeoNameId": "630336",
            "stateCode": "04",
            "id": 625144,
            "lat": 53.9,
            "population": 1742124
          }
        ],
        "countries": [
          {
            "score": 1,
            "featureCode": "PCLI",
            "stateGeoNameId": "",
            "featureClass": "A",
            "countryCode": "BY",
            "name": "Republic of Belarus",
            "lon": 28,
            "countryGeoNameId": "630336",
            "stateCode": "00",
            "id": 630336,
            "lat": 53,
            "population": 9685000
          }
        ],
        "states": [
          {
            "score": 1,
            "featureCode": "ADM1",
            "stateGeoNameId": "625143",
            "featureClass": "A",
            "countryCode": "BY",
            "name": "Horad Minsk",
            "lon": 27.56667,
            "countryGeoNameId": "630336",
            "stateCode": "04",
            "id": 625143,
            "lat": 53.9,
            "population": 2002600
          }
        ]
      }
    },
    "organizations": [],
    "people": [
      {
        "name": "Iwan",
        "count": 1
      }
    ]
  },
  "status": "ok"
}
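
When checking many inputs, a response shaped like the one above can be reduced to its place mentions with a few lines of Python. This is a minimal sketch: the field names are taken from the JSON shown here, `place_mentions` is a hypothetical helper, and the `sample` dictionary is a trimmed copy of that response rather than new data.

```python
def place_mentions(response):
    """Return (matched text, resolved name, country code, lat, lon) for each place mention."""
    places = response.get("results", {}).get("places", {})
    return [
        (m["source"]["string"], m["name"], m["countryCode"], m["lat"], m["lon"])
        for m in places.get("mentions", [])
    ]


# Trimmed copy of the response above, keeping only the fields used here.
sample = {
    "language": "DE",
    "results": {
        "places": {
            "mentions": [
                {
                    "source": {"charIndex": 159, "string": "Minsk"},
                    "name": "Minsk",
                    "countryCode": "BY",
                    "lat": 53.9,
                    "lon": 27.56667,
                }
            ]
        }
    },
    "status": "ok",
}

print(place_mentions(sample))  # [('Minsk', 'Minsk', 'BY', 53.9, 27.56667)]
```

A response with no recognized places would yield an empty list, assuming it keeps the same overall structure with an empty `mentions` array.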