snipsco / snips-nlu

Snips Python library to extract meaning from text
https://snips-nlu.readthedocs.io
Apache License 2.0

Umlauts within phrase are causing odd intent matches #886

Open Corasonn opened 4 years ago

Corasonn commented 4 years ago

Some of my entity values contain umlauts. When I try to recognize them with a specific intent, Snips instead matches any other intent that also contains this entity, even though the right intent would fit 100%. For any value without an umlaut, Snips matches the right intent with a score of 1.0.

Expected: Intents with entities with umlauts are matched correctly.

Environment:

Corasonn commented 4 years ago

I found the problem. When there are more than 10000 entity values, Snips skips building some entity variations to improve build performance. The PR that introduced this was: https://github.com/snipsco/snips-nlu/pull/804

Unfortunately, this seems to break umlauts when the "case" variation is missing. I forked the project and hardcoded the change (https://github.com/Corasonn/snips-nlu). I'm not a Python developer, so if someone knows how to set this via a flag, that would be great!
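
A possible workaround that avoids forking, untested against Snips itself, is to add the case variants to the dataset by hand before training. The sketch below is plain Python and does not call any Snips API. `add_case_synonyms` is a hypothetical helper, and it assumes each entity entry is a dict with a `value` and a `synonyms` list, as in the JSON dataset format. Whether explicit synonyms fully replace the skipped "case" variation is an assumption.

```python
def add_case_synonyms(entity_data):
    """Return a copy of the entity entries with case variants added as synonyms.

    Hypothetical helper: assumes entries shaped like
    {"value": "...", "synonyms": ["...", ...]}.
    """
    result = []
    for entry in entity_data:
        value = entry["value"]
        original_synonyms = list(entry.get("synonyms", []))
        synonyms = list(original_synonyms)
        seen = {value, *original_synonyms}
        # Generate lower/upper/title variants of the value and each synonym.
        for text in [value, *original_synonyms]:
            for variant in (text.lower(), text.upper(), text.title()):
                if variant not in seen:
                    seen.add(variant)
                    synonyms.append(variant)
        result.append({"value": value, "synonyms": synonyms})
    return result


data = [
    {"value": "Düsseldorf", "synonyms": []},
    {"value": "Berlin", "synonyms": ["berlin"]},
]
expanded = add_case_synonyms(data)
```

Note that this grows the dataset by up to three synonyms per string, so with more than 10000 values it may bring back some of the build cost that the PR was meant to avoid.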