jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/
Other
534 stars 105 forks source link

Fix bug canonicalizing queries not ending in ';' #31

Closed todpole3 closed 5 years ago

todpole3 commented 5 years ago

The current loop missed the last chunk

jkkummerfeld commented 5 years ago

This shouldn't be required as the first stage of canonicalisation adds a semicolon if one is not present.

Do you have an example test case that was failing? There may be a deeper issue.

Thanks!

todpole3 commented 5 years ago

This is the problem -- the order_sequence function assumes that the first stage of canonicalization ensures the query ends in a semicolon, but in line 618, add_semicolon was implemented as an optional stage.

The bug can be reproduced by changing line 1009 to

canonical = make_canonical(query, schema, variables, skip={"add_semicolon"})

The error message looks something like this

Traceback (most recent call last):
  File "moz_sql_parser/canonicaliser.py", line 1009 in <module>
    canonical = make_canonical(query, schema, variables, skip={'add_semicolon'})
  File "moz_sql_parser/canonicaliser.py", line 627, in make_canonical
    query = order_query(query, variables)
  File "moz_sql_parser/canonicaliser.py", line 614, in order_query
    order_sequence(tokens, 0, len(tokens)-1, variables)
  File "moz_sql_parser/canonicaliser.py", line 526, in order_sequence
    right = tokens_for_chunk(tokens, chunks[next_equals + 1])
IndexError: list index out of range

It suggests something remotely related to missing semicolon and it took me a while to trace to line 500.

If you prefer not changing the order_sequence function, I suggest making add_semicolon a mandatory procedure; otherwise I think this pull request solves the problem. Thanks!

jkkummerfeld commented 5 years ago

Thanks for the detailed follow-up. I've edited the code (in master) to make the add_semicolon function required since it is really assumed throughout.