kermitt2 / entity-fishing

A machine learning tool for fishing entities
http://nerd.readthedocs.io/
Apache License 2.0
249 stars, 24 forks

Case and term selection for French #139

Closed: kermitt2 closed this issue 2 years ago

kermitt2 commented 2 years ago

There is currently a problem with candidate selection for French: the case condition does not appear to be applied as it should be.

For example, in the following query, common lower-case words raise candidates corresponding to all-upper-case terms:

- sait -> SAIT -> Southern Alberta Institute of Technology
- ou -> OU -> tagged as an institution by the French NER
- aise -> AISE -> Agenzia Informazioni e Sicurezza Esterna

This does not happen for English, and it seems to be related to the French NER.

{
    "text": "On ne sait pas si l'autre est à l'aise ou pas, jusqu'où on peut aller. ",
    "shortText": "",
    "termVector": [],
    "language": {
        "lang": "en"
    },
    "entities": [],
    "mentions": [
        "ner",
        "wikipedia"
    ],
    "nbest": false,
    "sentence": false
}
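The issue does not show how entity-fishing implements its case condition. As a minimal hypothetical sketch in Python, the check it describes could reject a candidate term written entirely in upper case (typically an acronym) when the mention in the text is written entirely in lower case. The names `is_case_compatible` and `filter_candidates` are illustrative assumptions, not part of the entity-fishing API.

```python
def is_case_compatible(surface: str, term: str) -> bool:
    """Hypothetical case condition (not entity-fishing's actual code).

    Returns False when the mention is entirely lower case and the
    candidate term is entirely upper case, e.g. "sait" vs "SAIT".
    """
    # str.islower()/str.isupper() require at least one cased character,
    # so punctuation-only strings never trigger the rejection.
    if surface.islower() and term.isupper():
        return False
    return True


def filter_candidates(surface: str, terms: list[str]) -> list[str]:
    """Keep only the candidate terms that pass the case condition."""
    return [term for term in terms if is_case_compatible(surface, term)]


if __name__ == "__main__":
    # Lower-case French words from the query, paired with the
    # all-caps terms the issue reports they were matched to.
    reported = {"sait": "SAIT", "ou": "OU", "aise": "AISE"}
    for word, term in reported.items():
        print(word, "->", term, is_case_compatible(word, term))
```

Under this assumed rule, all three reported matches would be rejected, while capitalized mentions such as "Paris" would still be allowed to match their terms.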