jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

Interaction history in ATIS dataset #16

Closed kl2806 closed 6 years ago

kl2806 commented 6 years ago

Many of the utterances in ATIS depend heavily on the history of the interaction, and one of the core challenges is figuring out how to reason about this context. For example, the utterance "which ones arrive at 7pm" requires the model to resolve references to previous utterances (What does "ones" refer to?).

{
        "comments": [],
        "old-name": "",
        "query-split": "train",
        "sentences": [
            {
                "text": "list all the flights that arrive at airport_code0 from various cities",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "what flights from any city land at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "show me the flights into airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "show me the flights arriving at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "list all the flights that arrive at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "list all the arriving flights at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "what flights land at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "show me the flights to airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "list all the landings at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "show me the flights into airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "list all the landings at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "list all flights arriving at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "show me the flights arriving at airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "list all the flights that fly into airport_code0",
                "question-split": "dev",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "list all the flights that arrive at airport_code0 airport",
                "question-split": "dev",
                "variables": {
                    "airport_code0": "MKE"
                }
            },
            {
                "text": "show me all flights arriving at airport_code0 from other airports",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "show me the flights from all airports to airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "show me the flights arriving at airport_code0 from all other airports",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "show me flights from all airports to airport_code0",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "show me the flights arriving at airport_code0 from other airports",
                "question-split": "train",
                "variables": {
                    "airport_code0": "DAL"
                }
            },
            {
                "text": "show me the flights to airport_code0 from all other airports",
                "question-split": "dev",
                "variables": {
                    "airport_code0": "DAL"
                }
            }
        ],
        "sql": [
            "SELECT DISTINCT FLIGHTalias0.FLIGHT_ID FROM AIRPORT AS AIRPORTalias0 , AIRPORT_SERVICE AS AIRPORT_SERVICEalias0 , CITY AS CITYalias0 , FLIGHT AS FLIGHTalias0 WHERE AIRPORTalias0.AIRPORT_CODE = \"airport_code0\" AND CITYalias0.CITY_CODE = AIRPORT_SERVICEalias0.CITY_CODE AND FLIGHTalias0.FROM_AIRPORT = AIRPORT_SERVICEalias0.AIRPORT_CODE AND FLIGHTalias0.TO_AIRPORT = AIRPORTalias0.AIRPORT_CODE ;",
            "SELECT DISTINCT FLIGHTalias0.FLIGHT_ID FROM AIRPORT AS AIRPORTalias0 , FLIGHT AS FLIGHTalias0 WHERE AIRPORTalias0.AIRPORT_CODE = \"airport_code0\" AND FLIGHTalias0.TO_AIRPORT = AIRPORTalias0.AIRPORT_CODE ;",
            "SELECT DISTINCT FLIGHTalias0.FLIGHT_ID FROM AIRPORT AS AIRPORTalias0 , AIRPORT AS AIRPORTalias1 , FLIGHT AS FLIGHTalias0 WHERE AIRPORTalias1.AIRPORT_CODE = \"airport_code0\" AND FLIGHTalias0.FROM_AIRPORT = AIRPORTalias0.AIRPORT_CODE AND FLIGHTalias0.TO_AIRPORT = AIRPORTalias1.AIRPORT_CODE ;"
        ],
        "variables": [
            {
                "example": "MKE",
                "location": "unk",
                "name": "airport_code0",
                "type": "airport_code"
            }
        ]
    }

Looking at the dataset, it seems like 1) the utterances do not have these kinds of references, e.g. "ones" does not appear in the dataset, and 2) there isn't any information on which interaction the utterances come from, since they are grouped by SQL query. Were there any modifications to the original utterances, and is there any way to reconstitute the structure of the original interactions from the data? Thanks!

jkkummerfeld commented 6 years ago

The original dataset from the 90s marked each query as requiring context or not. Our data is derived from work at UW, which only used the examples that should not have required context. Unfortunately, there are some mistakes in the data, which means that some cases that do require context are in our dataset (finding them and removing or editing them is on the list of known issues in this repository).

Getting from our data back to the original should be possible by using approximate string matching (exact match won't work because we made some fixes to sentences, and because our variable extraction removed some variability, e.g. instead of sometimes having 'american airlines' and sometimes having 'aa', we always have 'aa').
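
As a rough illustration of the approximate matching idea, here is a minimal sketch using Python's standard `difflib`. It assumes entries shaped like the JSON above; the helper names (`fill_variables`, `best_match`), the `originals` list, and the 0.8 cutoff are hypothetical choices for the example, not part of this repository. Cases like 'aa' versus 'american airlines' would score lower and may need a looser cutoff or an alias table.

```python
import difflib

def fill_variables(sentence):
    """Substitute placeholder names (e.g. airport_code0) with their values."""
    text = sentence["text"]
    # Replace longer names first so a name that is a prefix of another is not clobbered.
    for name in sorted(sentence["variables"], key=len, reverse=True):
        text = text.replace(name, sentence["variables"][name])
    return text

def best_match(text, originals, cutoff=0.8):
    """Return the closest original utterance, or None if nothing clears the cutoff.

    `originals` is assumed to be lowercased already; `cutoff` is an arbitrary
    starting point that would need tuning on real data.
    """
    matches = difflib.get_close_matches(text.lower(), originals, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical stand-ins for utterances from the original ATIS release.
originals = [
    "show me the flights into dal",
    "which ones arrive at 7pm",
]

entry = {
    "text": "show me the flights into airport_code0",
    "variables": {"airport_code0": "DAL"},
}
filled = fill_variables(entry)
match = best_match(filled, originals)
```

Once each sentence is matched to an original utterance, the interaction it belongs to could be looked up from whatever session metadata the original release provides.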

If you are specifically interested in contextual SQL, check out Alane Suhr's work: https://github.com/clic-lab/atis and http://alanesuhr.com/atis.pdf