snipsco / snips-nlu

Snips Python library to extract meaning from text
https://snips-nlu.readthedocs.io
Apache License 2.0
3.9k stars 513 forks

I am trying to learn a new intent #660

Open samarth12 opened 6 years ago

samarth12 commented 6 years ago

Is there a way to iteratively update the Snips dataset so that the user can add new intents to the system when an input is classified as None?

adrienball commented 6 years ago

Hey @samarth12, can you elaborate a bit? The dataset is just a JSON file, so you can add new intents by adding new key/value pairs inside the "intents" dictionary.
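
As a rough illustration of that point, the sketch below adds new intents to an in-memory dataset using plain Python dictionaries. The layout (an `intents` mapping whose values hold an `utterances` list of `data` chunks, next to `entities` and `language` keys) follows the Snips NLU JSON dataset format as I understand it. The `add_intent` helper and the intent names are made up for this example.

```python
import json


def add_intent(dataset, intent_name, utterances=None):
    """Add an intent entry (hypothetical helper) if it does not exist yet."""
    intents = dataset.setdefault("intents", {})
    # setdefault leaves an already existing intent untouched.
    intents.setdefault(intent_name, {"utterances": list(utterances or [])})
    return dataset


# Assumed minimal dataset skeleton; a real dataset would already hold intents.
dataset = {"language": "en", "intents": {}, "entities": {}}

add_intent(dataset, "BookTicket")
add_intent(dataset, "CancelTicket", [{"data": [{"text": "cancel my ticket"}]}])

# The dataset is plain JSON, so it can be written back to disk as-is.
serialized = json.dumps(dataset, indent=2)
```

An intent with no utterances is only a placeholder; it needs labeled examples before retraining is useful.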

samarth12 commented 6 years ago

Currently, based on the training dataset, the classifier either returns a classified intent or returns a None intent. What I want to do, whenever the output is "None" for a new user input sentence, is:

  1. First, prompt the user to tag the input with one of the existing intents.
  2. Otherwise, if the user thinks it doesn't match any existing intent, the user gets to define a new intent for that particular input sentence.

For both steps 1 and 2, I need to update the dataset with this new input sentence and its corresponding intent, in exactly the same format as the input dataset, so that I can retrain the model with the updated dataset over time.
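
A minimal sketch of that loop, under some assumptions: `parse` stands in for whatever callable returns the NLU result, the result is assumed to be a dict whose `intent` is either null or carries an `intentName` of `None` when nothing matched, and `ask_user` is a hypothetical callback that returns the intent name chosen or invented by the user. Slot labels are left out here, so the sentence is stored as a single unlabeled text chunk.

```python
def handle_parse(dataset, text, parse, ask_user):
    """Route one sentence; return True when the dataset was changed."""
    result = parse(text)
    # Assumption: an unrecognized input has a null intent or a None intentName.
    intent_name = (result.get("intent") or {}).get("intentName")
    if intent_name is not None:
        return False  # recognized, nothing to learn

    # Steps 1 and 2: the user picks an existing intent or names a new one.
    chosen = ask_user(text, sorted(dataset["intents"]))
    entry = dataset["intents"].setdefault(chosen, {"utterances": []})
    entry["utterances"].append({"data": [{"text": text}]})
    return True


# Stand-ins for the real engine and the real user prompt.
def fake_parse(text):
    return {"input": text, "intent": {"intentName": None}, "slots": []}


def pick_new_intent(text, known_intents):
    return "CheckWeather"


dataset = {
    "language": "en",
    "intents": {"BookTicket": {"utterances": []}},
    "entities": {},
}
changed = handle_parse(dataset, "will it rain tomorrow", fake_parse, pick_new_intent)
```

When `changed` is true, retraining would mean fitting a fresh engine on the updated dictionary.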

I would love to know your thoughts on that @adrienball, please let me know if you still have questions about it.

adrienball commented 6 years ago

Hey @samarth12, that is a reasonable use case indeed. The Snips NLU library will not handle this logic for you though, so, as you said, you will have to write a script to convert the user input into the format used in Snips NLU's dataset.

Cheers, Adrien

samarth12 commented 6 years ago

So my next question is: if I try adding new intents to the dataset using key-value pairs, how exactly do I break a single input and its corresponding intent into the exact format? That is, how do I get the accurate slot name and its corresponding entity, the way it is done in the dataset? I have tried extracting entities from my new inputs using external APIs and then pushing them into the dataset, but that only results in very bad accuracy!

Again I am not sure if I was able to explain it as well as I wanted to, but thanks for your feedback @adrienball

adrienball commented 6 years ago

From my understanding of what you are trying to achieve, it is the user who should provide the labelling (both intent and slots) of new inputs that are not recognized. Only the user knows what their input corresponds to, so only they can specify the intent and the slots. Or maybe I'm missing something?

samarth12 commented 6 years ago

Okay so the user just provides the right intent (which they think is right) for the new input sentence, the entity and slot should be automatically identified and added to the dataset in the correct format for the corresponding input. How is SNIPS doing that while generating the dataset initially?

Eg: Book me a ticket from LA to NYC.

Intent: BookTicket (user defines this)
Entity: location
Slot name: departure (LA) and destination (NYC)

Now is there a model that just extracts and defines the entity and slot name (which seems kind of hard), or does that need to be done manually by the user?

Thank you so much for being responsive @adrienball !

adrienball commented 6 years ago

@samarth12

> Okay so the user just provides the right intent (which they think is right) for the new input sentence, the entity and slot should be automatically identified and added to the dataset in the correct format for the corresponding input. How is SNIPS doing that while generating the dataset initially?

The entity and slot should not be automatically identified in that case. If the user input is not recognized by the NLU, it's likely that it contains atypical data which needs to be completely labeled, i.e. the user must provide both the intent and the slots. This is quite clear in the case where the input corresponds to a new intent: how could you possibly know which slots you are looking for in this input?

In your example, "Book me a ticket from LA to NYC.", that means the user should provide everything below, and not only the first line about the intent:

Intent: BookTicket
Slot 1: 
    - slot name: departure
    - entity: location
    - value: LA
Slot 2:
    - slot name: destination
    - entity: location
    - value: NYC
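
To make the target format concrete, here is a minimal sketch that turns a full labelling like the one above into an utterance made of text chunks, where labeled chunks carry `entity` and `slot_name` keys as in the dataset format. The `to_utterance` helper is hypothetical, and it assumes the slot values are listed in the order they appear in the sentence.

```python
def to_utterance(text, slots):
    """Split `text` into dataset-style chunks using user-provided slot labels.

    `slots` is a list of dicts with "value", "slot_name" and "entity" keys,
    given in the order the values occur in the sentence (an assumption).
    """
    chunks, cursor = [], 0
    for slot in slots:
        start = text.find(slot["value"], cursor)
        if start == -1:
            raise ValueError(f"slot value not found: {slot['value']!r}")
        if start > cursor:
            # Unlabeled text between the previous slot and this one.
            chunks.append({"text": text[cursor:start]})
        chunks.append({
            "text": slot["value"],
            "entity": slot["entity"],
            "slot_name": slot["slot_name"],
        })
        cursor = start + len(slot["value"])
    if cursor < len(text):
        chunks.append({"text": text[cursor:]})
    return {"data": chunks}


utterance = to_utterance(
    "Book me a ticket from LA to NYC.",
    [
        {"value": "LA", "slot_name": "departure", "entity": "location"},
        {"value": "NYC", "slot_name": "destination", "entity": "location"},
    ],
)
```

The resulting utterance would be appended to the `utterances` list of the `BookTicket` intent, and the `location` entity would presumably also need an entry under the dataset's `entities` key.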

samarth12 commented 6 years ago

Yes, thank you! That is something I have been wondering about for a while. But do you think there is any API out there, based on a pre-trained engine, that can detect entities as well as their slots and might be helpful here? I know it doesn't make too much sense, but the idea would be to pick the entity and the slots from that API, take the intent from the user, and append all of that to the Snips NLU dataset.

All of it sounds pretty vague, but I think you do get the idea of what I am trying to achieve with this. What would you suggest? Asking the users for the slots and entities might not work on a large scale (too cumbersome), and people might just choose not to go through with the process. Do you have any other ideas or thoughts on this @adrienball?

Thanks again for helping me out!