jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

Sqlite3 could not execute *-db.sql #48

Closed songhe17 closed 3 years ago

songhe17 commented 4 years ago

I wanted to run atis-db.sql in Python with sqlite3 to create the db file, but it failed due to syntax errors.
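For reference, this kind of failure can be reproduced without the real file. The sketch below feeds a short MySQL-flavoured fragment to Python's `sqlite3` module and reports the first error. The fragment and the helper name `first_sqlite_error` are illustrative assumptions; the fragment is not copied from atis-db.sql.

```python
import sqlite3
from typing import Optional


def first_sqlite_error(sql_text: str) -> Optional[str]:
    """Run sql_text in an in-memory SQLite database; return the error message, if any."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(sql_text)
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()
    return None


# Hypothetical MySQL-style fragment (assumption, not taken from the repository).
mysql_style = """
CREATE TABLE `airline` (
  `airline_code` varchar(2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
LOCK TABLES `airline` WRITE;
"""

if __name__ == "__main__":
    print(first_sqlite_error(mysql_style))
```

Table options such as `ENGINE=...` and statements such as `LOCK TABLES` are MySQL-specific, so SQLite rejects the script at the first one it meets.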

jkkummerfeld commented 4 years ago

All of the databases were constructed in MySQL and exported to text files. I don't know the differences in syntax between MySQL and sqlite3, so I'm not sure how to make it work there. We'd be happy to accept a pull request with sqlite3 versions of the databases!

Here is the information about how we exported the databases (from the atis sql file):

-- MySQL dump 10.13  Distrib 5.7.17, for Linux (x86_64)
--
-- Host: localhost    Database: atis
-- ------------------------------------------------------
-- Server version       5.7.17

rizar commented 3 years ago

There is a script for converting MySQL dumps to SQLite dumps, and it kind of works:

https://gist.githubusercontent.com/esperlu/943776/raw/be469f0a0ab8962350f3c5ebe8459218b915f817/mysql2sqlite.sh
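To give a rough idea of what such a conversion involves, here is a minimal Python sketch that rewrites a small subset of mysqldump output into SQL that SQLite accepts. It is not a port of the linked script and covers far fewer cases. The function name `mysql_dump_to_sqlite` and the sample table are assumptions made for illustration, and the sketch assumes each statement is laid out the way mysqldump usually writes it (one `INSERT` per line, table options on the closing-parenthesis line).

```python
import re
import sqlite3


def mysql_dump_to_sqlite(dump_text: str) -> str:
    """Rewrite a small subset of MySQL dump syntax into SQLite-friendly SQL."""
    kept = []
    for line in dump_text.splitlines():
        stripped = line.strip()
        # Skip MySQL-only statements and versioned /*! ... */ comments.
        if stripped.startswith(("LOCK TABLES", "UNLOCK TABLES", "/*!", "SET ")):
            continue
        # Skip secondary index definitions inside CREATE TABLE (PRIMARY KEY is kept).
        if re.match(r"(UNIQUE\s+)?KEY\s", stripped):
            continue
        # Drop table options such as ENGINE=... after the closing parenthesis.
        if stripped.startswith(")") and stripped.endswith(";"):
            line = ");"
        line = line.replace(" AUTO_INCREMENT", "")
        # MySQL dumps escape quotes as \' while SQLite expects ''.
        line = line.replace("\\'", "''")
        kept.append(line)
    sql = "\n".join(kept)
    # Removing KEY lines can leave a trailing comma before the closing ")".
    return re.sub(r",\s*\n\s*\)", "\n)", sql)


# Hypothetical dump fragment (assumption, not taken from the repository).
example_dump = """\
/*!40101 SET NAMES utf8 */;
DROP TABLE IF EXISTS `airline`;
CREATE TABLE `airline` (
  `airline_code` varchar(2) NOT NULL,
  `airline_name` text,
  KEY `idx_code` (`airline_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
LOCK TABLES `airline` WRITE;
INSERT INTO `airline` VALUES ('AA','AMERICAN AIRLINES'),('XX','O\\'HARE AIR');
UNLOCK TABLES;
"""


def load_example() -> list:
    """Convert the fragment, load it into an in-memory database, and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(mysql_dump_to_sqlite(example_dump))
    rows = conn.execute(
        "SELECT airline_code, airline_name FROM airline ORDER BY airline_code"
    ).fetchall()
    conn.close()
    return rows
```

Line-based rewriting like this is fragile: triggers, multi-line statements, and string data that happens to contain `\\'` would all need more careful handling.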

jkkummerfeld commented 3 years ago

Using that script I've added sqlite versions of the data.

rizar commented 3 years ago

Thanks! (And thanks also for the great initiative to collect all lang2sql datasets in one place, by the way!) For the record, I should note that there are also big databases (Yelp, Scholar, etc.) that are not included in your repository and need to be converted manually. The script that I mentioned above gives reasonable results.

jkkummerfeld commented 3 years ago

Good to know. There is also one (restaurants) that is not in any SQL format.

I should also note, a few new datasets have been released in this space since we put this together:

And some additional work on evaluation:

rizar commented 3 years ago

Thank you, Jonathan. I was aware of these links, except for the last one (which is very interesting, by the way).

In fact, what I'm trying to do now is evaluate a SPIDER-trained RAT-SQL-style model using the methodology proposed by Suhr et al. (which is also the penultimate link in your list). A key challenge is figuring out the proper primary and foreign key relations to feed into the model. We will do an open-source release once all is done.
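As a starting point for the key relations, here is a minimal sketch of reading whatever primary and foreign keys are declared in a SQLite database, using SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. The helper name `declared_keys` and the demo tables are hypothetical. It only recovers keys that are actually declared in the schema, so databases whose dumps carry no key declarations would still need manual annotation.

```python
import sqlite3


def declared_keys(conn: sqlite3.Connection) -> dict:
    """Collect the primary and foreign keys declared in a SQLite schema."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    keys = {}
    for table in tables:
        # PRAGMA arguments cannot be bound as parameters, so quote the name by hand.
        quoted = '"' + table.replace('"', '""') + '"'
        # table_info rows: (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        primary = [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5] > 0]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match).
        foreign = [
            (fk[3], fk[2], fk[4])
            for fk in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        ]
        keys[table] = {"primary": primary, "foreign": foreign}
    return keys


def demo_keys() -> dict:
    """Build a tiny hypothetical schema in memory and return its declared keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE airport (airport_code TEXT PRIMARY KEY);
        CREATE TABLE flight (
          flight_id INTEGER PRIMARY KEY,
          from_airport TEXT REFERENCES airport(airport_code)
        );
        """
    )
    result = declared_keys(conn)
    conn.close()
    return result
```

Each foreign key is returned as a `(column, referenced_table, referenced_column)` tuple, which can then be mapped to whatever schema format the model expects.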

jkkummerfeld commented 3 years ago

Cool - looking forward to it!