taoyds / spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
https://yale-lily.github.io/spider
Apache License 2.0
848 stars 193 forks source link

Inconsistent evaluations of FROM clause in subquery #22

Closed JasperGuo closed 5 years ago

JasperGuo commented 5 years ago

Hi,

Again, thanks for your great job in building such a dataset for nl2sql. During evaluation, I found an inconsistent evaluations on FROM clause.

1. SELECT T2.company_name FROM movie AS T1 JOIN culture_company AS T2 ON T1.movie_id  =  T2.movie_id WHERE T1.year  =  1999
2. SELECT T1.company_name FROM culture_company AS T1 JOIN movie AS T2 ON T1.movie_id  =  T2.movie_id WHERE T2.year  =  1999
3. SELECT T2.company_name FROM movie AS T1 JOIN culture_company AS T2 WHERE T1.year = 1999
4. SELECT DISTINCT T2.company_name FROM movie AS T1 JOIN culture_company AS T2 WHERE T1.year = 1999

The four sqls above are equivalent based on published evalution.py.

However, when it comes to FROM Clause in subquery (query in WHERE Clause), inconsistency occurs.

1. SELECT mID ,  avg(stars) FROM Rating WHERE mID NOT IN (SELECT T1.mID FROM Rating AS T1 JOIN Reviewer AS T2 ON T1.rID  =  T2.rID WHERE T2.name  =  "Brittany Harris") GROUP BY mID
2. SELECT mID ,  avg(stars) FROM Rating WHERE mID NOT IN (SELECT T2.mID FROM Reviewer AS T1 JOIN Rating AS T2 ON T1.rID  =  T2.rID WHERE T1.name  =  "Brittany Harris") GROUP BY mID
3. SELECT mID ,  avg(stars) FROM Rating WHERE mID NOT IN (SELECT T1.mID FROM Rating AS T1 JOIN Reviewer AS T2  WHERE T2.name  =  "Brittany Harris") GROUP BY mID
4. SELECT mID ,  avg(stars) FROM Rating WHERE mID NOT IN (SELECT DISTINCT T1.mID FROM Rating AS T1 JOIN Reviewer AS T2 ON T1.rID  =  T2.rID WHERE T2.name  =  "Brittany Harris") GROUP BY mID

These four sqls are reported as inequivalence.

I am quite confused about this inconsistency. Is there any concern about the FROM Clause in subquery ?

Followings are details for inconsistency reproduction.

db_id: culture_company
Question: What are all company names that have a corresponding movie directed in the year 1999?
Query: SELECT T1.Company_name FROM culture_company AS T1 JOIN movie AS T2 WHERE T2.Year = 1

db_id: movie_1
Question: Find the average rating star for each movie that are not reviewed by Brittany Harris.
Query: SELECT mID ,  avg(stars) FROM Rating WHERE mID NOT IN (SELECT T1.mID FROM Rating AS T1 JOIN Reviewer AS T2 ON T1.rID  =  T2.rID WHERE T2.name  =  "Brittany Harris") GROUP BY mID

Best, Jasper

taoyds commented 5 years ago

Hi, Jasper,

Thanks for your report. The first four examples are supposed to be the same (reminder: our evaluation script doesn't evaluate DISTINCT keyword). For the FROM Clause, it is going to take too much time to fix this bug. Also, considering this case doesn't happen often, we decided not to update the evaluation script for now.

Best, Tao