jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

Correct way of handling the data split #34

Closed todpole3 closed 5 years ago

todpole3 commented 5 years ago

I would like to make sure I'm using correct dataset handling to obtain the results in Table 3 of the paper. Would you please help clarify the following questions?

Based on my understanding of the paper, Table 3 reports test set results over all datasets.

For Advising, Geo, Scholar, and ATIS, the datasets are split into train/dev/test. When evaluating on the test set, should we train the models on the train set only, or on the train and dev sets combined?

For Academic, IMDB, Restaurants, and Yelp, the data release contains only a cross-validation split. Are the Table 3 numbers for these datasets therefore cross-validation results? If not, how is the test set defined for these datasets?

Thanks for your attention.

jkkummerfeld commented 5 years ago

For the first set (Advising, Geo, Scholar, ATIS), we trained on train+dev. For the second set (Academic, IMDB, Restaurants, Yelp), we used cross-validation.
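
For anyone reproducing this, here is a minimal sketch of the two protocols described in the answer above. It is not the authors' code. It assumes each example is a dict with a hypothetical `split` field that holds either "train"/"dev"/"test" or a fold identifier; the actual field names in the released files may differ, so check the Data page. How the per-fold results are combined into a single number is not stated in this thread, so the sketch stops at producing the folds.

```python
from collections import defaultdict


def fixed_split(examples, split_key="split"):
    """Train on train+dev, evaluate on test (Advising, Geo, Scholar, ATIS).

    `split_key` is a hypothetical field name, not taken from the data release.
    """
    train = [ex for ex in examples if ex[split_key] in ("train", "dev")]
    test = [ex for ex in examples if ex[split_key] == "test"]
    return train, test


def cross_validation_folds(examples, split_key="split"):
    """Yield (fold_id, train, test), holding out one fold at a time.

    Intended for datasets that ship only fold labels
    (Academic, IMDB, Restaurants, Yelp).
    """
    by_fold = defaultdict(list)
    for ex in examples:
        by_fold[ex[split_key]].append(ex)
    for held_out in sorted(by_fold):
        # Training data is everything outside the held-out fold.
        train = [
            ex
            for fold, items in by_fold.items()
            if fold != held_out
            for ex in items
        ]
        yield held_out, train, by_fold[held_out]
```

Usage would look like `train, test = fixed_split(examples)` for the first group, and `for fold, train, test in cross_validation_folds(examples): ...` for the second, training a fresh model inside each iteration.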

jkkummerfeld commented 5 years ago

I've added this to the Data page (http://jkk.name/text2sql-data/data/), so I'll close this issue.

todpole3 commented 5 years ago

Thanks for the clarification!