taoyds / spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
https://yale-lily.github.io/spider
Apache License 2.0
812 stars 193 forks source link

Confuse about Group By Labeling #21

Closed zhanzecheng closed 5 years ago

zhanzecheng commented 5 years ago

Hi Tao,

Confuse about some group by label, show as below

1.Show the names of singers that have more than one song.
SELECT T1.Name FROM singer AS T1 JOIN song AS T2 ON T1.Singer_ID  =  T2.Singer_ID GROUP BY T1.Name HAVING COUNT(*)  >  1

2.What are the names of the singers that have more than one songs?
SELECT T1.Name FROM singer AS T1 JOIN song AS T2 ON T1.Singer_ID  =  T2.Singer_ID GROUP BY T1.Name HAVING COUNT(*)  >  1

3.find the name of employee who was awarded the most times in the evaluation.
SELECT t1.name FROM employee AS t1 JOIN evaluation AS t2 ON t1.Employee_ID  =  t2.Employee_ID GROUP BY t2.Employee_ID ORDER BY count(*) DESC LIMIT 1

4.Which employee received the most awards in evaluations? Give me the employee name.
SELECT t1.name FROM employee AS t1 JOIN evaluation AS t2 ON t1.Employee_ID  =  t2.Employee_ID GROUP BY t2.Employee_ID ORDER BY count(*) DESC LIMIT 1
​

Could you explain why 1 and 2 are group by name but 3 and 4 and group by primary key?

Thanks a lot, Zecheng

taoyds commented 5 years ago

Hi Zecheng,

These questions are a little bit confusing if there are some singers with the exact same name. In most of the cases, we group the result by primary key. We require our annotators to select the column name which the SQL groups by. They might group by the name instead of the primary key if the question asking for the names.

Best, Tao

GaoyanCheerup commented 5 years ago

Hi, @taoyds , may I ask when will group the result by primary key, and when will group the result by the selected column name? From the training data, it seems hard to find pattern. That's maybe the reason why Group By clause (no having) has a low accuracy.