salesforce / WikiSQL

A large annotated semantic parsing corpus for developing natural language interfaces.
BSD 3-Clause "New" or "Revised" License

[None] gold questions #28

Open cristipp opened 5 years ago

cristipp commented 5 years ago

I've instrumented evaluate.py to count the gold queries whose gold answer is [None]. In the dev set, evaluated via the Docker instructions on the front page, 657 of 8421 questions (about 8%) have a [None] result. I've manually looked at a few of these questions, and they seem to have various annotation errors, most notably condition values that lack the space before a comma, and > used where >= is needed.

Is this a known issue?
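
For reference, the counting described above can be sketched roughly as follows. This is not the actual patch to evaluate.py: `count_none_results` is a hypothetical helper, and the sample results are made up for illustration rather than taken from the dataset.

```python
from typing import Any, Iterable, List


def count_none_results(gold_results: Iterable[List[Any]]) -> int:
    """Count gold results that are exactly [None] (hypothetical helper)."""
    return sum(1 for result in gold_results if result == [None])


# Made-up gold results for illustration, not taken from the dev set.
sample = [[1990.0], [None], [42], [None]]
none_count = count_none_results(sample)
print(f"{none_count} / {len(sample)} = {none_count / len(sample):.0%}")
```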

{'phase': 2, 'table_id': '2-12955969-1', 'question': 'What is the year of the tournament played at Melbourne, Australia?', 'sql': {'sel': 0, 'conds': [[2, 0, 'melbourne, australia']], 'agg': 5}}
SELECT AVG(col0) AS result FROM table_2_12955969_1 WHERE col2 = :col2
{'col2': 'melbourne, australia'}
Manual fix: SELECT AVG(col0) AS result FROM table_2_12955969_1 WHERE col2 = 'melbourne , australia';
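
A minimal sketch of why the missing space produces [None], assuming a SQLite backend. The `demo` table and its single row are hypothetical stand-ins for the real table (the year value is invented); only the two condition strings come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col0 REAL, col2 TEXT)")
# Hypothetical row: the cell text has a space before the comma.
conn.execute("INSERT INTO demo VALUES (2000, 'melbourne , australia')")

query = "SELECT AVG(col0) AS result FROM demo WHERE col2 = :col2"

# Condition value as annotated: no row matches, so AVG runs over zero rows.
no_space = conn.execute(query, {"col2": "melbourne, australia"}).fetchone()
# Condition value from the manual fix: the row matches.
with_space = conn.execute(query, {"col2": "melbourne , australia"}).fetchone()

print(list(no_space))    # an aggregate over zero rows is NULL
print(list(with_space))
```

String equality in the WHERE clause is exact, so a single missing space is enough to match no rows at all.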

{'phase': 2, 'table_id': '2-12312050-1', 'question': "What's the sum of points for the 1963 season when there are more than 30 games?", 'sql': {'sel': 4, 'conds': [[0, 0, '1963'], [2, 1, 30]], 'agg': 4}}
SELECT SUM(col4) AS result FROM table_2_12312050_1 WHERE col0 = :col0 AND col2 > :col2
{'col0': '1963', 'col2': 30}
Manual fix: SELECT SUM(col4) AS result FROM table_2_12312050_1 WHERE col0 = '1963' AND col2 >= 30;
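
The effect of > versus >= can be reproduced with a similar sketch, again assuming a SQLite backend. The `demo` table and its row are hypothetical: the games value is set to exactly 30 so that it sits on the boundary, and the points value is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col0 TEXT, col2 REAL, col4 REAL)")
# Hypothetical row: season '1963', exactly 30 games, invented points value.
conn.execute("INSERT INTO demo VALUES ('1963', 30, 25)")

params = {"col0": "1963", "col2": 30}
template = (
    "SELECT SUM(col4) AS result FROM demo "
    "WHERE col0 = :col0 AND col2 {op} :col2"
)

# Strict comparison excludes the boundary row, so SUM runs over zero rows.
strict = conn.execute(template.format(op=">"), params).fetchone()
# Inclusive comparison keeps the boundary row.
inclusive = conn.execute(template.format(op=">="), params).fetchone()

print(list(strict))
print(list(inclusive))
```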