jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

Any canonicalization for quotes? #32

Closed todpole3 closed 5 years ago

todpole3 commented 5 years ago

It looks like canonicaliser.py does not contain a procedure for normalizing the quotes around values in SQL, such as "EMNLP". For example, some quotes can be removed, and in some cases single and double quotes are interchangeable (though I'm not entirely sure that this is true).

So I wonder whether the dataset applies any canonicalization for this?

jkkummerfeld commented 5 years ago

The standardise_blank_spaces function converts single quotes to double quotes (see https://github.com/jkkummerfeld/text2sql-data/blob/master/tools/canonicaliser.py#L192), which I realise is a bit counter-intuitive given the function name...
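To make the single-to-double conversion concrete, here is a minimal sketch of that kind of normalization. It is not the code from canonicaliser.py: the function name single_to_double_quotes, the regular expression, and the venue table in the usage example are all assumptions made for illustration. The sketch leaves a literal alone whenever swapping its quotes could change its meaning (an embedded double quote, or a doubled '' escape).

```python
import re

# Hypothetical helper, not taken from canonicaliser.py.
_LITERAL = re.compile(
    r'"[^"]*"'              # an existing double-quoted span: left untouched
    r"|'((?:[^']|'')*)'"    # a single-quoted literal; '' is an escaped quote
)


def single_to_double_quotes(sql):
    """Rewrite 'value' as "value" where the swap is unambiguous."""

    def swap(match):
        body = match.group(1)
        # group(1) is None when the double-quoted alternative matched.
        # Literals holding a double quote or a '' escape are kept as-is,
        # because a plain swap would alter the value they denote.
        if body is None or '"' in body or "''" in body:
            return match.group(0)
        return '"' + body + '"'

    return _LITERAL.sub(swap, sql)


if __name__ == "__main__":
    query = "SELECT * FROM venue WHERE name = 'EMNLP'"
    print(single_to_double_quotes(query))
    # prints: SELECT * FROM venue WHERE name = "EMNLP"
```

Matching double-quoted spans first means an apostrophe inside a value such as "it's" is not mistaken for the start of a single-quoted literal. Note that this only makes the queries textually uniform; whether a double-quoted token is read as a string or as an identifier still depends on the SQL dialect.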

Knowing when quotes are removable seems like a harder problem. I'm going to close this issue for now, but please reopen it if you have a suggestion on that!

todpole3 commented 5 years ago

That makes sense. Thanks.