oterrier opened 9 months ago
Some more recent tests in French
In French, country names are resolved to the wrong Wikidata entities:
Allemagne: disambiguated as Empire allemand or Équipe d'Allemagne de football
Grèce: disambiguated as Grèce antique
Roumanie: disambiguated as Royaume de Roumanie
This happens regardless of the value of maxTermFrequency.
Request:
{
"text": "Fabrication d'un violoncelle dans un atelier de lutherie à Reghin, en Roumanie, le 22 janvier 2021.",
"shortText": "",
"termVector": [],
"language": {
"lang": "fr"
},
"entities": [],
"mentions": [
"wikipedia"
],
"nbest": false,
"sentence": false,
"minSelectorScore": 0.2,
"maxTermFrequency": 5
}
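To check the claim that maxTermFrequency makes no difference, the request above can be replayed with only that field varied. The sketch below is a minimal reproduction under assumptions: `ENDPOINT` is a hypothetical local entity-fishing instance, the query is assumed to be sent as a multipart form field named `query`, and it uses the third-party `requests` package. The helper names (`build_query`, `summarize`, `compare_frequencies`) are illustrative, not part of entity-fishing.

```python
import json

# Hypothetical endpoint: a local entity-fishing instance; adjust host/port to your deployment.
ENDPOINT = "http://localhost:8090/service/disambiguate"


def build_query(text: str, max_term_frequency: int, lang: str = "fr") -> dict:
    """Rebuild the request from the report, varying only maxTermFrequency."""
    return {
        "text": text,
        "shortText": "",
        "termVector": [],
        "language": {"lang": lang},
        "entities": [],
        "mentions": ["wikipedia"],
        "nbest": False,
        "sentence": False,
        "minSelectorScore": 0.2,
        "maxTermFrequency": max_term_frequency,
    }


def summarize(response: dict) -> list:
    """Reduce a response to (rawName, wikidataId) pairs so runs are easy to diff."""
    return [(e["rawName"], e.get("wikidataId")) for e in response.get("entities", [])]


def disambiguate(query: dict) -> dict:
    # Lazy import: the helpers above work even if `requests` is not installed.
    import requests

    # Assumption: the service reads the JSON query from a multipart form field
    # named "query"; the (None, value) tuple sends it as a plain field, not a file.
    resp = requests.post(ENDPOINT, files={"query": (None, json.dumps(query))}, timeout=60)
    resp.raise_for_status()
    return resp.json()


def compare_frequencies(text: str, values: list) -> dict:
    """Map each maxTermFrequency value to the entities returned for it."""
    return {v: summarize(disambiguate(build_query(text, v))) for v in values}
```

Calling `compare_frequencies(text, [1, 5, 50])` against a running instance and diffing the results shows directly whether "Roumanie" ever resolves to a different item as the parameter changes.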
Response:
{
"software": "entity-fishing",
"version": "0.0.6",
"date": "2024-05-23T14:31:45.359208132Z",
"runtime": 31,
"nbest": false,
"text": "Fabrication d'un violoncelle dans un atelier de lutherie à Reghin, en Roumanie, le 22 janvier 2021.",
"language": {
"lang": "fr",
"conf": 1
},
"global_categories": [
{
"weight": 0.14285714285714288,
"source": "wikipedia-fr",
"category": "Instrument de musique classique",
"page_id": 199859
},
{
"weight": 0.14285714285714288,
"source": "wikipedia-fr",
"category": "Violoncelle",
"page_id": 986894
},
{
"weight": 0.14285714285714288,
"source": "wikipedia-fr",
"category": "Municipalité dans le județ de Mureș",
"page_id": 11926951
},
{
"weight": 0.14285714285714288,
"source": "wikipedia-fr",
"category": "Instrument à cordes frottées",
"page_id": 317874
},
{
"weight": 0.14285714285714288,
"source": "wikipedia-fr",
"category": "Royaume de Roumanie",
"page_id": 8183397
},
{
"weight": 0.14285714285714288,
"source": "wikipedia-fr",
"category": "Page contenant une partition",
"page_id": 13964105
},
{
"weight": 0.14285714285714288,
"source": "wikipedia-fr",
"category": "Lutherie",
"page_id": 1310062
}
],
"entities": [
{
"rawName": "violoncelle",
"offsetStart": 17,
"offsetEnd": 28,
"confidence_score": 0.551,
"wikipediaExternalRef": 10822,
"wikidataId": "Q8371",
"domains": [
"Acoustics",
"Artisanship"
]
},
{
"rawName": "atelier de lutherie",
"offsetStart": 37,
"offsetEnd": 56,
"confidence_score": 0.4053,
"wikipediaExternalRef": 167295,
"wikidataId": "Q3267878"
},
{
"rawName": "Reghin",
"offsetStart": 59,
"offsetEnd": 65,
"confidence_score": 0.8624,
"wikipediaExternalRef": 3813284,
"wikidataId": "Q572478",
"domains": [
"Geography",
"Architecture"
]
},
{
"rawName": "Roumanie",
"offsetStart": 70,
"offsetEnd": 78,
"confidence_score": 0.6214,
"wikipediaExternalRef": 1387867,
"wikidataId": "Q203493",
"domains": [
"Military"
]
},
{
"rawName": "22 janvier",
"offsetStart": 83,
"offsetEnd": 93,
"confidence_score": 0.8398,
"wikipediaExternalRef": 3688,
"wikidataId": "Q2275",
"domains": [
"Geology",
"Oceanography",
"Earth"
]
}
]
}
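In the response above, "Roumanie" comes back as Q203493, which the global category "Royaume de Roumanie" suggests is the Kingdom of Romania rather than the present-day country. A small checker like the one below can flag this automatically across many test sentences. It is a sketch: `EXPECTED_COUNTRY` is a hand-written lookup (Q1028 is taken from this thread; Q183, Q41 and Q218 are the usual Wikidata items for Germany, Greece and Romania and should be double-checked), and `check_response` is a hypothetical helper that only reads the `text`, `rawName`, `offsetStart`, `offsetEnd` and `wikidataId` fields shown in the response.

```python
# Hand-written lookup of the country items the reporter expects.
EXPECTED_COUNTRY = {
    "Allemagne": "Q183",
    "Grèce": "Q41",
    "Roumanie": "Q218",
    "Maroc": "Q1028",
}


def check_response(response: dict) -> list:
    """Return human-readable problems found in an entity-fishing response."""
    text = response["text"]
    problems = []
    for ent in response.get("entities", []):
        name = ent["rawName"]
        start, end = ent["offsetStart"], ent["offsetEnd"]
        # The offsets should point at the mention's surface form in the input text.
        if text[start:end] != name:
            problems.append(f"offset mismatch for {name!r}: {text[start:end]!r}")
        # Country mentions should resolve to the country item, not a related concept.
        expected = EXPECTED_COUNTRY.get(name)
        if expected and ent.get("wikidataId") != expected:
            problems.append(f"{name} -> {ent.get('wikidataId')}, expected {expected}")
    return problems
```

Run on the response above, the offsets all line up and the only problem reported is `Roumanie -> Q203493, expected Q218`, which isolates the issue to candidate selection rather than mention detection.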
Sorry for the late reply. This is indeed weird; I'll try to see what is happening in the disambiguation process for these countries.
I cannot find any example where "Maroc" is disambiguated as the country (Q1028).
For example, with this query it is sometimes disambiguated as French protectorate in Morocco (Q907234) and at other times as Morocco national football team (Q207337),
but never as Morocco (Q1028), even though Q1028 is the candidate with the highest conditional probability (0.903404988057546).
I can't explain why; any clue?
Thx Olivier