snipsco / snips-nlu

Snips Python library to extract meaning from text
https://snips-nlu.readthedocs.io
Apache License 2.0

difference in the confidence values after stop word removal #831

Open cahuja1992 opened 5 years ago

cahuja1992 commented 5 years ago

Even though the stop word removal is enabled, the confidence values for an utterance with and without stop words are different.

Example: "turn on the wifi" vs. "turn on wifi", where "the" is the stop word. After looking into the code, I suspect that the confidence calculation also takes the number of tokens into account.

adrienball commented 5 years ago

@cahuja1992

> Even though the stop word removal is enabled

Are you referring to the ignore_stop_words parameter of LookupIntentParserConfig?

This parser tries to find an exact match with one of the training samples, and it can ignore stop words while doing so. If it does not find a match, the probabilistic intent parser is used instead, and that parser does not ignore stop words.

In your case, is one of the two formulations ("turn on the wifi" or "turn on wifi") in your training data?
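To make the two-stage behaviour concrete, here is a toy sketch of "exact lookup that ignores stop words, then fall back to a parser that sees the raw text". It is not the snips-nlu implementation: the stop word list, the whitespace tokenization, and every function name below are assumptions made for illustration.

```python
# Toy illustration only; none of these names come from snips-nlu.
STOP_WORDS = {"the", "a", "an"}  # assumed stop word list


def normalize(text, ignore_stop_words=True):
    """Lowercase, split on whitespace, optionally drop stop words."""
    tokens = text.lower().split()
    if ignore_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)


def build_lookup(training_samples, ignore_stop_words=True):
    """Map each normalized training utterance to its intent."""
    return {
        normalize(utterance, ignore_stop_words): intent
        for utterance, intent in training_samples
    }


def parse(text, lookup, fallback, ignore_stop_words=True):
    """Try an exact match first; otherwise hand the RAW text to the fallback."""
    key = normalize(text, ignore_stop_words)
    if key in lookup:
        return ("lookup", lookup[key])
    # The fallback receives the original text, stop words included.
    return ("fallback", fallback(text))


def toy_fallback(text):
    """Stand-in for a probabilistic parser: just reports what it received."""
    return text


lookup = build_lookup([("turn on the wifi", "turnOn")])
print(parse("turn on wifi", lookup, toy_fallback))
print(parse("please turn on the wifi", lookup, toy_fallback))
```

In this sketch, "turn on wifi" and "turn on the wifi" hit the same lookup key, while any utterance that misses the lookup reaches the fallback with its stop words intact.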

cahuja1992 commented 5 years ago

@adrienball Okay, got it. Then what I need is for stop words to be removed at prediction time as well. Can we remove the stop words from the input utterance before it is parsed?
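A minimal sketch of what is being asked for here, done on the caller's side rather than inside the library: strip stop words from the utterance, then pass the result to whatever parse function is in use. The stop word list and both helper names are hypothetical, and the whitespace tokenization is a simplification.

```python
# Hypothetical caller-side preprocessing; not a snips-nlu feature.
STOP_WORDS = frozenset({"the", "a", "an"})  # assumed stop word list


def strip_stop_words(text, stop_words=STOP_WORDS):
    """Drop whitespace-separated tokens whose lowercase form is a stop word."""
    kept = [tok for tok in text.split() if tok.lower() not in stop_words]
    return " ".join(kept)


def parse_without_stop_words(parse_fn, text, stop_words=STOP_WORDS):
    """Call parse_fn on the utterance after stop words have been removed."""
    return parse_fn(strip_stop_words(text, stop_words))


print(strip_stop_words("turn on the wifi"))
```

With a fitted SnipsNLUEngine named `engine` (assumed here), this could be called as `parse_without_stop_words(engine.parse, "turn on the wifi")`. If the training utterances still contain stop words, the probabilistic parser would then see inputs that differ from what it was trained on, so the same stripping would probably need to be applied to the training data as well.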