taoyds / spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
https://yale-lily.github.io/spider
Apache License 2.0

wrongly parsed samples in released data from spider website #6

Closed BladeSun closed 5 years ago

BladeSun commented 5 years ago

Hi Tao,

In the released dataset, queries with more than two "table_units" are not parsed correctly. For example, in dev.json line 7431:

"query": "SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T2.petid = T3.petid WHERE T1.sex = 'F' AND T3.pettype = 'dog'"

The 'from' part of the above query is parsed to:

"from": {"conds": [ [ false, 2, [ 0, [ 0, 1, false ], null ], [ 0, 9, false ], null ] ], "table_units": [ [ "table_unit", 0 ], [ "table_unit", 1 ] ] }

Meanwhile, the script named parse_sql_one.py gives the correct parse:

'from': {'table_units': [('table_unit', 0), ('table_unit', 1), ('table_unit', 2)], 'conds': [(False, 2, (0, (0, 1, False), None), (0, 9, False), None), 'and', (False, 2, (0, (0, 10, False), None), (0, 11, False), None)]}
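For anyone who wants to check their copy of the data for this mismatch, here is a minimal sketch. It assumes each entry keeps its parse under a "sql" key next to "query"; that key name and the helper names `count_from_tables` and `is_truncated` are my own assumptions, not part of the repo's scripts. The table count is a rough regex heuristic that only makes sense for flat queries without nested subqueries.

```python
import re

def count_from_tables(query: str) -> int:
    """Heuristically count tables introduced by FROM or JOIN in a flat query."""
    return len(re.findall(r"\b(?:from|join)\s+[A-Za-z_]\w*", query, flags=re.IGNORECASE))

def is_truncated(example: dict) -> bool:
    """True when the parse lists fewer table_units than the query text names."""
    parsed_from = example["sql"]["from"]  # "sql" key is an assumption
    return len(parsed_from["table_units"]) < count_from_tables(example["query"])

# The sample reported above, with the 'from' parse as found in the released data.
example = {
    "query": (
        "SELECT count(*) FROM student AS T1 "
        "JOIN has_pet AS T2 ON T1.stuid = T2.stuid "
        "JOIN pets AS T3 ON T2.petid = T3.petid "
        "WHERE T1.sex = 'F' AND T3.pettype = 'dog'"
    ),
    "sql": {
        "from": {
            "conds": [[False, 2, [0, [0, 1, False], None], [0, 9, False], None]],
            "table_units": [["table_unit", 0], ["table_unit", 1]],
        }
    },
}

if __name__ == "__main__":
    print(count_from_tables(example["query"]), is_truncated(example))
```

Running the same check over every entry of dev.json should give a rough list of affected samples, though queries with subqueries would need a real parser rather than this regex.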

I wonder if you have done the baseline experiments using the released dataset.

Thanks, Yibo

BladeSun commented 5 years ago

Sorry for reposting this issue. I've seen the answers in #3.