elyase closed this issue 6 years ago.
Hi @elyase, you are right, it is not well documented at the moment.
The `IntentClassifierDataAugmentationConfig` is responsible for configuring how data is augmented before training the intent classifier. More specifically, it contains this attribute:
> `noise_factor` (int, optional): Defines the size of the noise to generate to train the implicit *None* intent, as a multiplier of the average size of the other intents. Default is 5.
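To make the multiplier concrete, here is a minimal sketch of how `noise_factor` could translate into a number of *None* utterances. It assumes that the "size" of an intent means its number of training utterances and that the result is truncated to an integer. `none_intent_size` is a hypothetical helper for illustration, not a Snips NLU function.

```python
from statistics import mean
from typing import Dict, List


def none_intent_size(intent_utterances: Dict[str, List[str]], noise_factor: int = 5) -> int:
    """Hypothetical helper: how many noise utterances to generate for the None intent.

    Assumption: an intent's "size" is its number of utterances, and the
    product noise_factor * average_size is truncated toward zero.
    """
    if not intent_utterances:
        return 0
    # Average number of utterances over the declared intents
    avg_size = mean(len(utts) for utts in intent_utterances.values())
    return int(noise_factor * avg_size)


if __name__ == "__main__":
    dataset = {
        "turnLightOn": ["turn on the lights", "lights on please", "switch on the lamp"],
        "getWeather": ["what's the weather", "will it rain today"],
        "setAlarm": ["wake me up at 7", "set an alarm", "alarm for 6 am", "set alarm tomorrow"],
    }
    # Average size is (3 + 2 + 4) / 3 = 3, so 5 * 3 = 15 noise utterances
    print(none_intent_size(dataset))
```

With the default factor of 5, the noise set is several times larger than a typical intent, which is the point: the *None* intent has to cover a much broader space than any declared intent.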
We then generate the corresponding number of `None` utterances by randomly sampling words from text files such as this one. The words in these files have been sampled according to their global frequency in the corresponding language, so the files approximate the word distribution of that language.
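The sampling step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual implementation: `generate_noise_utterances` is a hypothetical function, `noise_words` stands in for the contents of one of the noise files, and the uniform utterance length between `min_len` and `max_len` is an assumed choice. Because the file's words were themselves drawn according to their frequency, frequent words appear many times in it, so drawing uniformly from the list roughly reproduces the language's word distribution. `n_utterances` could be obtained from `noise_factor` times the average intent size, as described in the config attribute above.

```python
import random
from typing import List, Sequence


def generate_noise_utterances(
    noise_words: Sequence[str],
    n_utterances: int,
    min_len: int = 2,
    max_len: int = 8,
    seed: int = 0,
) -> List[str]:
    """Hypothetical sketch: build None-intent utterances from a noise word list.

    Assumptions: noise_words already reflects word frequencies (frequent
    words are repeated), and utterance lengths are uniform in [min_len, max_len].
    """
    if not noise_words:
        raise ValueError("noise_words must not be empty")
    rng = random.Random(seed)  # seeded for reproducible noise
    utterances = []
    for _ in range(n_utterances):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        # Uniform sampling with replacement over a frequency-weighted list
        words = rng.choices(noise_words, k=length)
        utterances.append(" ".join(words))
    return utterances


if __name__ == "__main__":
    # In practice these words would be read from a noise file, e.g. f.read().split()
    words = ["the", "the", "the", "of", "of", "and", "to", "a", "in", "is"]
    for utt in generate_noise_utterances(words, n_utterances=3):
        print(utt)
```

Since the words are drawn independently, the resulting utterances are mostly ungrammatical word salad; they only teach the classifier what "generic language that matches no intent" looks like at the word level.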
This solution is probably not the best one overall, but it has been good enough so far.
Generating utterances for the `None` intent is inherently tricky, as it amounts to modelling a very large and ill-defined corpus: everything that does not correspond to one of the declared intents.
I'm currently working on improving the documentation and will include this in it. I hope this helps. Cheers
Yes, that is very helpful. Thanks!
Is the out-of-scope strategy documented somewhere? I seem to have read something about randomly generated utterances for the `None` class, but it would be great to have some details.