jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

This is more of a question than an issue. #47

Closed: aerinkim closed this issue 4 years ago

aerinkim commented 4 years ago

Hi, thank you for releasing the dataset.

What do 'query-split' and 'question-split' mean in the dataset? For example, in the last entry of the restaurant dataset, the questions

where can i find a name0 in city_name0 ?
where is name0 in city_name0 ?
where is a name0 in city_name0 ?

share the same SQL query, but their question-split values are different. Any pointers would be appreciated. (I couldn't find this in the paper.)

jkkummerfeld commented 4 years ago

Hi, thanks for your interest! The question split was formed by taking the set of (question, SQL query) pairs and randomly dividing it. The query split was formed by taking the set of queries and randomly dividing them, which means all questions for a given query are in the same set. See Figure 1 in the paper for an example that gives some intuition, and the start of Section 5 for full details.

Jonathan
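A minimal sketch of the two splitting schemes described above, assuming each example is a (question, SQL) pair and that "randomly dividing" means shuffling and dealing items into a fixed number of folds. The fold count, the round-robin dealing, the seed, and the placeholder query names `Q_RESTAURANT` and `Q_OTHER` (plus the last two questions) are hypothetical choices for illustration, not the procedure or data actually used to build the dataset. The first three questions are the ones from the example above.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (question, sql) pairs. The SQL strings are placeholders,
# and the last two questions are invented for illustration only.
PAIRS = [
    ("where can i find a name0 in city_name0 ?", "Q_RESTAURANT"),
    ("where is name0 in city_name0 ?", "Q_RESTAURANT"),
    ("where is a name0 in city_name0 ?", "Q_RESTAURANT"),
    ("give me a good restaurant in city_name0 ?", "Q_OTHER"),
    ("what is a good restaurant in city_name0 ?", "Q_OTHER"),
]


def question_split(pairs, n_folds, seed=0):
    """Question split: shuffle individual (question, SQL) pairs and deal them
    into folds, so questions for the same query may land in different folds."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    folds = [[] for _ in range(n_folds)]
    for i, pair in enumerate(shuffled):
        folds[i % n_folds].append(pair)
    return folds


def query_split(pairs, n_folds, seed=0):
    """Query split: group pairs by SQL query, shuffle the distinct queries,
    and deal whole groups into folds, so a query's questions stay together."""
    by_query = defaultdict(list)
    for question, sql in pairs:
        by_query[sql].append((question, sql))
    queries = sorted(by_query)  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(queries)
    folds = [[] for _ in range(n_folds)]
    for i, sql in enumerate(queries):
        folds[i % n_folds].extend(by_query[sql])
    return folds


def folds_per_query(folds):
    """Map each SQL query to the set of fold indices its questions ended up in."""
    seen = defaultdict(set)
    for idx, fold in enumerate(folds):
        for _, sql in fold:
            seen[sql].add(idx)
    return dict(seen)


if __name__ == "__main__":
    print("question split:", folds_per_query(question_split(PAIRS, 3)))
    print("query split:   ", folds_per_query(query_split(PAIRS, 3)))
```

With three folds and five pairs, no fold in the question split holds more than two pairs, so the three `Q_RESTAURANT` questions are always spread over at least two folds. That is the behaviour in the question above. In the query split every query maps to exactly one fold, so a model is never tested on a query whose paraphrases it saw during training.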

aerinkim commented 4 years ago

Thank you so much, now it makes sense.