taoyds / spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
https://yale-lily.github.io/spider
Apache License 2.0
801 stars 193 forks source link

Some answers in evaluation_examples/dev.sql are incorrect #95

Open zhuyuqing opened 11 months ago

zhuyuqing commented 11 months ago

There are 4 of them.

The correct SQL for Question 17 should be "select avg(capacity) , max(capacity) from stadium", as obviously the average should refer to the average capacity. And, the average column of the stadium table should refer to the average attendance, as indicated by the answer to Question 19.

Question 17: What is the maximum capacity and the average of all stadiums ? ||| concert_singer SQL: select max(capacity), average from stadium

Question 18: What is the average and maximum capacities for all stadiums ? ||| concert_singer SQL: select avg(capacity) , max(capacity) from stadium

Question 19: What is the name and capacity for the stadium with highest average attendance ? ||| concert_singer SQL: select name , capacity from stadium order by average desc limit 1

========

The correct SQL for Question 66 should be all distinct students : "select distinct t1.fname, t1.age from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'dog' and t1.stuid not in (select t1.stuid from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'cat')"

Question 66: Find the first name and age of students who have a dog but do not have a cat as a pet . ||| pets_1 SQL: select t1.fname, t1.age from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'dog' and t1.stuid not in (select t1.stuid from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'cat')

========

The correct SQL for Question 67 should only return one column: "select t1.fname from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'dog' and t1.stuid not in (select t1.stuid from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'cat')"

Question 67: What is the first name of every student who has a dog but does not have a cat ? ||| pets_1 SQL: select t1.fname , t1.age from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'dog' and t1.stuid not in (select t1.stuid from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t3.petid = t2.petid where t3.pettype = 'cat')

========

The correct SQL for Question 101 should only return the FullName: "select distinct t1.fullname from car_makers as t1 join model_list as t2 on t1.id = t2.maker join car_names as t3 on t2.model = t3.model join cars_data as t4 on t3.makeid = t4.id where t4.year = '1970';" Otherwise, Question 151 should be changed accordingly.

Question 101: What is the name of the different car makers who produced a car in 1970 ? ||| car_1 SQL: select distinct t1.maker from car_makers as t1 join model_list as t2 on t1.id = t2.maker join car_names as t3 on t2.model = t3.model join cars_data as t4 on t3.makeid = t4.id where t4.year = '1970';

Question 151: What are the names and ids of all makers with more than 3 models ? ||| car_1 SQL: select t1.fullname , t1.id from car_makers as t1 join model_list as t2 on t1.id = t2.maker group by t1.id having count(*) > 3;

BugMaker-Boyan commented 11 months ago

yes, I meet the same issue in dev data.

jitingxu1 commented 6 months ago
#pip install ibis-framework

import ibis
pets = ibis.connect("./spider/database/pets_1/pets_1.sqlite")
pets.list_tables()
['Has_Pet', 'Pets', 'Student']
p = pets.table('Pets')
p.head()
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ PetID ┃ PetType ┃ pet_age ┃ weight  ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int32 │ string  │ int32   │ float64 │
├───────┼─────────┼─────────┼─────────┤
│  2001 │ cat     │       3 │    12.0 │
│  2002 │ dog     │       2 │    13.4 │
│  2003 │ dog     │       1 │     9.3 │
└───────┴─────────┴─────────┴─────────┘

This query is wrong for this question:

{
        "db_id": "pets_1",
        "query": "SELECT weight FROM pets ORDER BY pet_age LIMIT 1",
        "question": "How much does the youngest dog weigh?",
}