snipsco / snips-nlu

Snips Python library to extract meaning from text
https://snips-nlu.readthedocs.io
Apache License 2.0

Out of scope strategy documentation #599

Closed elyase closed 6 years ago

elyase commented 6 years ago

Is the out of scope strategy documented somewhere? I seem to have read something about randomly generated utterances for the None class but it would be great to have some details.

adrienball commented 6 years ago

Hi @elyase, you are right, it is not well documented at the moment. The IntentClassifierDataAugmentationConfig is responsible for configuring how data is augmented before training the intent classifier. More specifically, it contains this attribute:

        noise_factor (int, optional): Defines the size of the noise to
            generate to train the implicit *None* intent, as a multiplier of
            the average size of the other intents. Default is 5.

We then generate None utterances by randomly sampling words from text files such as this one. The words in these files have been sampled based on their global frequency in the corresponding language, so they approximate the word distribution of that language.
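To make the idea concrete, here is a minimal sketch of the approach described above. It is not the actual snips-nlu implementation: the function name `generate_none_utterances`, the `min_len`/`max_len` bounds on utterance length, and the toy word list are all assumptions made for illustration. Only `noise_factor` and its default of 5 come from the docstring quoted above.

```python
import random


def generate_none_utterances(intent_utterances, noise_words, noise_factor=5,
                             min_len=1, max_len=8, seed=None):
    """Build random utterances for the implicit None intent.

    intent_utterances: dict mapping intent name -> list of utterances.
    noise_words: words pre-sampled by language frequency (hypothetical list).
    min_len / max_len: assumed bounds on words per generated utterance.
    """
    if not intent_utterances:
        raise ValueError("at least one intent is required")
    rng = random.Random(seed)
    # Noise size is a multiple of the average number of utterances per intent.
    total = sum(len(utts) for utts in intent_utterances.values())
    avg_size = total / len(intent_utterances)
    noise_size = int(noise_factor * avg_size)
    utterances = []
    for _ in range(noise_size):
        length = rng.randint(min_len, max_len)
        # Uniform sampling is enough here because the word list itself
        # is assumed to already reflect language-level frequencies.
        words = [rng.choice(noise_words) for _ in range(length)]
        utterances.append(" ".join(words))
    return utterances


# Toy data, purely illustrative.
intents = {
    "turnLightOn": ["turn on the light", "lights on please"],
    "getWeather": ["what is the weather", "will it rain today",
                   "weather in paris", "is it sunny"],
}
noise_words = ["the", "of", "and", "house", "because", "yesterday", "green"]
none_utterances = generate_none_utterances(intents, noise_words, seed=0)
```

With two intents averaging 3 utterances each and `noise_factor=5`, this produces 15 None utterances. Raising `noise_factor` makes the classifier see more noise relative to real intents, which in this sketch simply means more random word sequences.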

This solution is probably not the best one overall, but it has been good enough so far. Generating utterances for the None intent is inherently tricky, as it means modelling a very large and ill-defined corpus: everything that does not correspond to one of the declared intents.

I'm currently working on improving the documentation, and will include this in it. I hope this helps. Cheers

elyase commented 6 years ago

yes, that is very helpful. thanks!