Closed jkkummerfeld closed 5 years ago
Any updates on this thread?
Not yet. We won't be able to get to it until after the NAACL deadline (December 10th) at least - too many other projects keeping us busy!
Note also that we will not be able to release their test set (since it is kept private), so it will remain necessary to run using their evaluation system.
This is now in the data development branch:
https://github.com/jkkummerfeld/text2sql-data/tree/data-development
Please let us know if you spot any issues with the conversion!
do you mind uploading the modified canonicaliser for spider?
Done, it's in the tools directory. The changes are somewhat hacky, but get the job done.
looks like a lot of desc limit 1
s got anonymized to desc limit var0
. Is this by design to account for other limit numbers? It only seems to have happened for the spider dataset.
Thanks! That was an oversight in variable identification. For spider I did it automatically, making all numbers variables, whereas in the others I did it partially manually (since there were a small enough number of queries to make checking feasible).
I've pushed a commit that fixes this.
Another quick question-- in your canonicalisation, if a column name matches its table name, should it also receive an alias?
e.g.
SELECT institution , LOCATION FROM institution WHERE founded > 1990 AND TYPE = 'Private'
became
SELECT INSTITUTIONalias1.INSTITUTION AS INSTITUTIONalias0 , INSTITUTIONalias1.LOCATION FROM INSTITUTION AS INSTITUTIONalias1 WHERE INSTITUTIONalias1.FOUNDED > var0 AND INSTITUTIONalias1.TYPE = \"var1\" ;
No, that's also a bug, now fixed.
Looking at the previous datasets, it was extremely rare to have a table and field with the same name, and they were often in rarely used parts of the schema.
also looks like the wrong aliases are chosen at times-- table names are identified as other tables' columns with the same name.
SELECT name FROM physician EXCEPT SELECT T2.name FROM appointment AS T1 JOIN physician AS T2 ON T1.Physician = T2.EmployeeID
SELECT PHYSICIANalias0.NAME FROM APPOINTMENTalias0.PHYSICIAN AS PHYSICIANalias0 SELECT PHYSICIANalias1.NAME FROM APPOINTMENT AS APPOINTMENTalias0 JOIN PHYSICIAN AS PHYSICIANalias1 ON PHYSICIANalias1.EMPLOYEEID = APPOINTMENTalias0.PHYSICIAN ;
Hm, that's going to be trickier to resolve. I'll try to investigate it sometime this week.
@nweir127 it took a little longer to get to than expected, but I think I've resolved this (committed to data-development in e76b98b1a79b7a28ec5df0e895e4c9e6ffe9cb84 ).
The problem was that we didn't handle EXCEPT SELECT and INTERSECT SELECT correctly (not something that has come up in other datasets).
Closing as we have now merged in the branch that contains Spider.
Modifying data from https://yale-lily.github.io/spider to be in our format.