microsoft / IRNet

An algorithm for cross-domain NL2SQL
MIT License
264 stars 81 forks source link

Error occured on parseing SQL to semQL,like "SELECT COUNT(*) FROM ( SELECT T1.CountryId , ....) " #11

Open xiaer1 opened 4 years ago

xiaer1 commented 4 years ago

Hi. When I parse SQL to semQL(sql2semQL.py), the SQL like"SELECT COUNT(*) FROM ( SELECT ... )",the rule_label is:

Root1(3) Root(5) Sel(0) N(0) A(3) C(0) T({'except': None, 'from': {'conds': [[False, 2, [0, [0, 3, False], None], [0, 9, False], None]], 'table_units': [['table_unit', 1], ['table_unit', 2]]}, 'groupBy': [[0, 3, False]], 'having': [[False, 3, [0, [3, 0, False], None], 2.0, None]], 'intersect': None, 'limit': None, 'orderBy': [], 'select': [False, [[0, [0, [0, 3, False], None]], [3, [0, [0, 0, False], None]]]], 'union': None, 'where': []})

So, I downloaded your preprocessed data, it seemed as above.

fafa92 commented 4 years ago

Are you getting this error "column * table error" in the console too?

xiaer1 commented 4 years ago

No, it excutes the following code.

        if len(sql['sql']['from']['table_units']) == 1:
             return T(sql['sql']['from']['table_units'][0][1])

I found the same for your preprocessed data.

fafa92 commented 4 years ago

Yeah I backtracked that problem to that piece of code, but not sure why those queries are causing that problem. How did you add your own data to the dataset? did you add it to your validation file?

xiaer1 commented 4 years ago

I don't add my own data to the dataset,these examples are in the dev.json like select from (select ...),you can also find these error rule_label with tips of searching FROM ( string on your preprocessed dev.json. These examples have something in common on from (select ...) that it belongs to Root1 type instead of clear or inferred table. Its corresponding structs of from -> table_units as following,but the code doesn't dealt with this condition.

"from": {"conds": [], "table_units": [["sql", {"except": null, "from": {"conds": [[false, 2, [0, [0, 3, false], null], [0, 9, false], null]], "table_units": [["table_unit", 1], ["table_unit", 2]]}, "groupBy": [[0, 3, false]], "having": [[false, 3, [0, [3, 0, false], null], 2.0, null]], "intersect": null, "limit": null, "orderBy": [], "select": [false, [[0, [0, [0, 3, false], null]], [3, [0, [0, 0, false], null]]]], "union": null, "where": []}]]}
fafa92 commented 4 years ago

Got it, thanks for your explanation.

Another question that I had was more about how we can extend this project for our project, so for example if I have a database that I want to do query on it, should I add my dataset to tables.json and add my sql questions to dev.json?

jaydeepb-inexture commented 4 years ago

Got it, thanks for your explanation.

Another question that I had was more about how we can extend this project for our project, so for example if I have a database that I want to do query on it, should I add my dataset to tables.json and add my sql questions to dev.json?

have you got any clue on this? do we need to convert our database schema as of train.json and question as in dev.json

madcpt commented 4 years ago

No, it excutes the following code.

        if len(sql['sql']['from']['table_units']) == 1:
             return T(sql['sql']['from']['table_units'][0][1])

I found the same for your preprocessed data.

Hey I got the same problem and this is exactly where the error occurs! Any clue on how to solve this problem?

anshudaur commented 4 years ago

HI All,

I am currently trying to preprocess the provided basic spider data. After executing the data_preprocess.py and sql2SemQL.py files, the file size is 20,819KB which is less than the actual processed file (39,030KB) provided by the authors. I also get the same error message ("column * table error" ) on the column, for 72 queries.

Do you have any pointers on why do we get this error for 72 queries?

Thanks, Anshu

isjwtang commented 2 years ago

I also get the same result. The reason for the error is that the original query is to get a new table first, and then the outermost SQL statement: SELECT COUNT(*) FROM queries the new table.


question: How many countries speak both English and Dutch?

sql: SELECT COUNT(*) FROM (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "Dutch"

semql: Root1(3) Root(5) Sel(0) N(0) A(3) C(0) T({'from': {'table_units': [['table_unit', 2], ['table_unit', 3]], 'conds': [[False, 2, [0, [0, 8, False], None], [0, 23, False], None]]}, 'select': [False, [[0, [0, [0, 9, False], None]]]], 'where': [[False, 2, [0, [0, 24, False], None], '"English"', None]], 'groupBy': [], 'having': [], 'orderBy': [], 'limit': None, 'intersect': {'from': {'table_units': [['table_unit', 2], ['table_unit', 3]], 'conds': [[False, 2, [0, [0, 8, False], None], [0, 23, False], None]]}, 'select': [False, [[0, [0, [0, 9, False], None]]]], 'where': [[False, 2, [0, [0, 24, False], None], '"Dutch"', None]], 'groupBy': [], 'having': [], 'orderBy': [], 'limit': None, 'intersect': None, 'union': None, 'except': None}, 'union': None, 'except': None})