snipsco / snips-nlu

Snips Python library to extract meaning from text
https://snips-nlu.readthedocs.io
Apache License 2.0

Entity extraction without necessarily knowing the intent and/or slot name #873

Open hiddentn opened 4 years ago

hiddentn commented 4 years ago

Hello, I am trying to use snips-nlu in an unconventional way: as an entity extraction tool. My reason is that I have an existing unlabeled dataset, and I am trying to extract entities regardless of their type (slot name). Can this be done effectively with snips-nlu? If so, some pointers on how to do it would be a great help.

An example of my current dataset looks something like this:

{
    "language": "en",
    "entities": {
        "Meal": {
            "data": [
                { "value": "Breakfast", "synonyms": ["breakfast", "break fast", "breakfasts", "pdj", "brekky", "breky", "morning meal"] },
                { "value": "Brunch", "synonyms": ["brunch", "brunchs", "brunches"] },
                { "value": "Lunch", "synonyms": ["lunch", "lunchs", "meal"] },
                { "value": "Dinner", "synonyms": ["dinner", "afterwork", "after work", "evening meal", "supper", "dinners", "dessert", "dining", "afternoon tea", "a la carte menu", "evening meals"] },
                { "value": "Table", "synonyms": ["table", "tables", "menu", "buffet", "buffets", "carvery", "food"] }
            ],
            "use_synonyms": true,
            "automatically_extensible": true,
            "matching_strictness": 0.7
        },
        "RentalItems": {
            "data": [
                { "value": "Bike", "synonyms": ["bike", "bikes", "bicycle", "bicycles", "vtt"] },
                { "value": "Electric Bike", "synonyms": ["electric bike", "electric bikes", "scooter"] },
                { "value": "Portable Wifi", "synonyms": ["portable wifi", "portable connexion", "portable wi-fi", "pocket wifi", "wifi pocket"] },
                { "value": "Umbrella", "synonyms": ["brolly", "sunshade", "umbrella"] },
                { "value": "Stroller", "synonyms": ["stroller", "strollers", "pushchairs"] },
                { "value": "Car", "synonyms": ["car", "cars", "van", "limo"] },
                { "value": "Motor Scooter", "synonyms": ["motor scooter", "buggies"] },
                { "value": "Golf", "synonyms": ["golf", "golf equipement", "green fee"] },
                { "value": "Ski", "synonyms": ["ski", "ski equipement", "the slopes", "skis"] },
                { "value": "Boat", "synonyms": ["boat", "boats", "catamaran", "dinghy", "sailboat", "yacht", "ship", "star", "flagship", "kayaks"] }
            ],
            "use_synonyms": true,
            "automatically_extensible": true,
            "matching_strictness": 0.7
        },
        "snips/amountOfMoney": {},
        "snips/number": {},
        "snips/ordinal": {}
    },
    "intents": {
        "BreakfastMenuRequest": {
            "utterances": [
                { "data": [{ "text": "and it's a smorgasbord pdj?" }] },
                { "data": [{ "text": "breakfast composition" }] },
                { "data": [{ "text": "breakfast menu" }] },
                { "data": [{ "text": "can you explain the morning meal options ?" }] },
                { "data": [{ "text": "can you please tell me more about the breakfast menu" }] },
                { "data": [{ "text": "could you let me know what you serve at the continental breky" }] },
                { "data": [{ "text": "do you do a buffet breakfast in the morning?" }] },
                { "data": [{ "text": "do you have a buffet breky?" }] },
                { "data": [{ "text": "do you have hot drink for breakfast?" }] },
                { "data": [{ "text": "do you make breakfasts?" }] },
                { "data": [{ "text": "do you serve pancakes at brekky" }] },
                { "data": [{ "text": "does the hotel have a brekky menu?" }] },
                { "data": [{ "text": "i would just like to know what type of early meal your hotel provides?" }] },
                { "data": [{ "text": "i'm intrested in the american breakfast" }] },
                { "data": [{ "text": "is it a full breakfast?" }] },
                { "data": [{ "text": "is there hot drink in breakfast?" }] },
                { "data": [{ "text": "menu of foods served in breakfast?" }] },
                { "data": [{ "text": "what does the early meal consist off?" }] },
                { "data": [{ "text": "what does the early meal includes?" }] },
                { "data": [{ "text": "what does your breakfast consist of?" }] },
                { "data": [{ "text": "what foods are served in breakfast?" }] },
                { "data": [{ "text": "what is the breakfast" }] },
                { "data": [{ "text": "what is the breakfast menu?" }] },
                { "data": [{ "text": "what kind of breakfast do you offer?" }] },
                { "data": [{ "text": "what options do you have at menu breakfast?" }] },
                { "data": [{ "text": "what the menu if i want to serve breky on my room ?" }] },
                { "data": [{ "text": "what type of pdj if i booked together with the hotels" }] },
                { "data": [{ "text": "what's at the buffet?" }] },
                { "data": [{ "text": "what's in the breakfast ?" }] }
            ]
        },
        "RentalsRequest": {
            "utterances": [
                { "data": [{ "text": "am i able to rent a car at reception ?" }] },
                { "data": [{ "text": "are we able to rent bikes at the hotel" }] },
                { "data": [{ "text": "can the hotel book me a rental car for part of my stay?" }] },
                { "data": [{ "text": "can they recommend a place where i can rent a car?" }] },
                { "data": [{ "text": "can we borrow stroller for child age 4" }] },
                { "data": [{ "text": "can you provide me with pocket wifi" }] },
                { "data": [{ "text": "can you provide me with ski equipment" }] },
                { "data": [{ "text": "can you provide me with umbrella if it rains" }] },
                { "data": [{ "text": "can you tell me where is the closest place to rent a car?" }] },
                { "data": [{ "text": "do the hotels have pushchairs we can hire for our child" }] },
                { "data": [{ "text": "do you have a stroller" }] },
                { "data": [{ "text": "do you have bikes?" }] },
                { "data": [{ "text": "do you have rental service at the hotel ?" }] },
                { "data": [{ "text": "does the hotel help guest arrange rental cars?" }] },
                { "data": [{ "text": "does the hotel provide umbrellas" }] },
                { "data": [{ "text": "have you ski equipment" }] },
                { "data": [{ "text": "hire a car" }] },
                { "data": [{ "text": "how much are the bikes to rent?" }] },
                { "data": [{ "text": "how much to hire a buggy" }] },
                { "data": [{ "text": "i m looking for limo" }] },
                { "data": [{ "text": "i need a bike" }] },
                { "data": [{ "text": "i need to kmow if there is a rental car near the hotel" }] },
                { "data": [{ "text": "i need to rent an umbrella" }] },
                { "data": [{ "text": "i would like to rent skis" }] },
                { "data": [{ "text": "nedd to hire a car" }] },
                { "data": [{ "text": "rent at reception ?" }] },
                { "data": [{ "text": "we are interested in kayaks to explore the cricks around" }] },
                { "data": [{ "text": "we would like to rent a car for one day what rental office do you recommed?" }] },
                { "data": [{ "text": "we're looking for information about your rentals." }] },
                { "data": [{ "text": "what is the nearest car rental agency to the hotel" }] },
                { "data": [{ "text": "where i can hire a mobility scooter for my stay" }] }
            ]
        }
    }
}
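One way to get type-agnostic entity matches out of data like this, without a trained slot filler, is plain dictionary (gazetteer) matching over the custom entity values and synonyms. The sketch below is not snips-nlu's API: it uses only the Python standard library, copies a small excerpt of the `Meal` and `RentalItems` entities, skips the `snips/*` builtin entities, and assumes that reusing the entity name as the slot name is acceptable. It reports every match with its entity type and resolved value, and it can turn an unlabeled utterance into snips-nlu-style data chunks (`text`, `entity`, `slot_name`) that could then be added to the training utterances.

```python
import re

# Excerpt of the custom entities above (only a few synonyms, to keep it short).
ENTITIES = {
    "Meal": {
        "data": [
            {"value": "Breakfast", "synonyms": ["breakfast", "brekky", "breky", "pdj", "morning meal"]},
            {"value": "Lunch", "synonyms": ["lunch", "meal"]},
        ]
    },
    "RentalItems": {
        "data": [
            {"value": "Bike", "synonyms": ["bike", "bikes", "bicycle"]},
            {"value": "Car", "synonyms": ["car", "cars", "limo"]},
            {"value": "Ski", "synonyms": ["ski", "skis", "ski equipement"]},
        ]
    },
    "snips/number": {},
}


def build_gazetteer(entities):
    """Map every lower-cased value/synonym to (entity_name, resolved_value)."""
    gazetteer = {}
    for entity_name, entity in entities.items():
        if entity_name.startswith("snips/"):
            continue  # builtin entities carry no value list to match against
        for item in entity.get("data", []):
            for surface in [item["value"], *item.get("synonyms", [])]:
                gazetteer[surface.lower()] = (entity_name, item["value"])
    return gazetteer


def extract_entities(text, gazetteer):
    """Find non-overlapping, case-insensitive matches, preferring longer surfaces."""
    # Longest surfaces first, so "morning meal" is tried before "meal".
    surfaces = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in surfaces) + r")\b",
        re.IGNORECASE,
    )
    matches = []
    for m in pattern.finditer(text):
        entity_name, value = gazetteer[m.group(0).lower()]
        matches.append({
            "text": m.group(0),
            "entity": entity_name,
            "value": value,
            "range": {"start": m.start(), "end": m.end()},
        })
    return matches


def to_snips_utterance(text, matches):
    """Split text into snips-nlu style chunks; assumption: slot_name == entity name."""
    chunks, cursor = [], 0
    for m in matches:
        start, end = m["range"]["start"], m["range"]["end"]
        if start > cursor:
            chunks.append({"text": text[cursor:start]})
        chunks.append({"text": text[start:end], "entity": m["entity"], "slot_name": m["entity"]})
        cursor = end
    if cursor < len(text):
        chunks.append({"text": text[cursor:]})
    return {"data": chunks}


gazetteer = build_gazetteer(ENTITIES)
for m in extract_entities("I need a bike and a car for breakfast", gazetteer):
    print(m["text"], m["entity"], m["value"])
```

This only finds surface forms that are listed as values or synonyms (no partial or fuzzy matching), so it is best seen as a way to bootstrap labels for the unlabeled data rather than a replacement for a trained model.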
ushmau5 commented 4 years ago

If you trained an intent using all of those slots, you could try running slot detection in a loop: pass each of the slot names to parse_slots and concatenate the results.
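A minimal sketch of the loop suggested above. `parse_slots` is the name used in the comment; here it is treated as a hypothetical callable `parse_slots(text, slot_name)` supplied by the caller, so the sketch does not depend on any particular snips-nlu method. Check the snips-nlu API reference for the actual slot-extraction method and its arguments (it may be keyed by intent rather than by slot name). Each returned slot is assumed to be a dict with a `range` field, like the entries of the `slots` list in snips-nlu's parse output.

```python
from typing import Any, Callable, Dict, Iterable, List

# Assumed slot shape, e.g.
# {"range": {"start": 9, "end": 13}, "rawValue": "bike", "entity": "RentalItems", "slotName": "RentalItems"}
Slot = Dict[str, Any]


def extract_all_slots(
    text: str,
    slot_names: Iterable[str],
    parse_slots: Callable[[str, str], List[Slot]],
) -> List[Slot]:
    """Run slot detection once per slot name and concatenate the results."""
    results: List[Slot] = []
    for slot_name in slot_names:
        # parse_slots is a hypothetical per-slot extractor passed in by the caller.
        results.extend(parse_slots(text, slot_name))
    # Sort by start offset so the combined list reads left to right.
    results.sort(key=lambda slot: slot["range"]["start"])
    return results
```

Overlapping slots returned for different slot names are kept as-is; deduplicating or resolving overlaps is left to the caller.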