Open jseebacher opened 4 years ago
I've done only initial testing in German, which I don't speak, so I can't characterize how well it works. That said, I just tried a short sentence from a Der Spiegel article on our homepage, and it pulled out a place and a person pretty well; results are below. You are correct that the one-word request for Krems doesn't work. I expect that is because the models are built to process sentences, but I'm not sure 🤷🏽‍♂️.
Input (from this article): "Alesja sieht noch, wie Beamte einer Omon-Sondereinheit mit Schlagstöcken auf ihren Mann einprügeln. Iwan liegt mit anderen auf der Straße an einer Brücke in Minsk, hält die Hände über den Kopf verschränkt."
Results
{
  "milliseconds": 5,
  "language": "DE",
  "version": "2.6.1",
  "results": {
    "places": {
      "mentions": [
        {
          "featureCode": "PPLC",
          "featureClass": "P",
          "confidence": 1,
          "lon": 27.56667,
          "countryGeoNameId": "630336",
          "source": {
            "charIndex": 159,
            "string": "Minsk"
          },
          "population": 1742124,
          "stateGeoNameId": "625143",
          "countryCode": "BY",
          "name": "Minsk",
          "stateCode": "04",
          "id": 625144,
          "lat": 53.9
        }
      ],
      "focus": {
        "cities": [
          {
            "score": 1,
            "featureCode": "PPLC",
            "stateGeoNameId": "625143",
            "featureClass": "P",
            "countryCode": "BY",
            "name": "Minsk",
            "lon": 27.56667,
            "countryGeoNameId": "630336",
            "stateCode": "04",
            "id": 625144,
            "lat": 53.9,
            "population": 1742124
          }
        ],
        "countries": [
          {
            "score": 1,
            "featureCode": "PCLI",
            "stateGeoNameId": "",
            "featureClass": "A",
            "countryCode": "BY",
            "name": "Republic of Belarus",
            "lon": 28,
            "countryGeoNameId": "630336",
            "stateCode": "00",
            "id": 630336,
            "lat": 53,
            "population": 9685000
          }
        ],
        "states": [
          {
            "score": 1,
            "featureCode": "ADM1",
            "stateGeoNameId": "625143",
            "featureClass": "A",
            "countryCode": "BY",
            "name": "Horad Minsk",
            "lon": 27.56667,
            "countryGeoNameId": "630336",
            "stateCode": "04",
            "id": 625143,
            "lat": 53.9,
            "population": 2002600
          }
        ]
      }
    },
    "organizations": [],
    "people": [
      {
        "name": "Iwan",
        "count": 1
      }
    ]
  },
  "status": "ok"
}
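For anyone comparing EN and DE output side by side, a response like the one above can be boiled down to the matched places and people with a few lines of Python. This is only a sketch: the function name summarize_cliff_response is made up for illustration, and it assumes the response has the same shape as the JSON shown here (results.places.mentions and results.people). The sample is a trimmed copy of that response.

```python
def summarize_cliff_response(response):
    """Collect place mentions and people from a CLIFF-style response dict.

    Assumes the layout shown in the response above; missing keys are
    treated as empty so a failed parse yields empty lists, not an error.
    """
    results = response.get("results", {})
    places = results.get("places", {})
    mentions = [
        {
            "name": m.get("name"),
            "countryCode": m.get("countryCode"),
            "lat": m.get("lat"),
            "lon": m.get("lon"),
            # The exact substring of the input that triggered the match
            "matched": m.get("source", {}).get("string"),
        }
        for m in places.get("mentions", [])
    ]
    people = [p.get("name") for p in results.get("people", [])]
    return {"places": mentions, "people": people}


# Trimmed copy of the response above (only the fields the helper reads)
sample = {
    "language": "DE",
    "results": {
        "places": {
            "mentions": [
                {
                    "name": "Minsk",
                    "countryCode": "BY",
                    "lat": 53.9,
                    "lon": 27.56667,
                    "source": {"charIndex": 159, "string": "Minsk"},
                }
            ]
        },
        "organizations": [],
        "people": [{"name": "Iwan", "count": 1}],
    },
    "status": "ok",
}

print(summarize_cliff_response(sample))
```

An empty "places" list for the same text under one language setting but not the other would make the difference easy to spot.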
According to https://cliff.mediacloud.org/, cliff can also geoparse German text. Judging from the form params in the calls made on https://cliff.mediacloud.org/, I assume that I can select the language in the API via ?language=DE.
However, it doesn't seem to work: with language=DE, cliff no longer recognizes even one-word requests (like q=Krems, a small town in Austria). The same requests are resolved with language=EN. I am running cliff from the cliff-docker on a local machine with 16GB RAM.
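To make the comparison reproducible, the two requests can be built like this. It is a minimal sketch, not a confirmed description of the API: the host, port and the /cliff-2.6.1/parse/text path are assumptions about a local cliff-docker setup (the version number is taken from the response in this thread), and build_parse_url is a hypothetical helper. Only the q and language parameters come from the description above.

```python
from urllib.parse import urlencode

# Assumption: where a local cliff-docker instance listens; adjust host,
# port and version path to match your own container.
BASE_URL = "http://localhost:8080/cliff-2.6.1/parse/text"


def build_parse_url(text, language="EN", base_url=BASE_URL):
    """Return a parse URL with the text and language as query parameters."""
    # urlencode percent-encodes spaces and non-ASCII characters (e.g. umlauts)
    query = urlencode({"q": text, "language": language})
    return f"{base_url}?{query}"


# The same one-word request under both language settings
for lang in ("EN", "DE"):
    print(build_parse_url("Krems", lang))
```

Fetching both URLs (for example with curl or a browser) and diffing the responses shows whether the language setting alone changes the result for the same input.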