jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

Scholar contains a query which has fields not present in the schema #23

Closed (DeNeutoy closed this issue 5 years ago)

DeNeutoy commented 5 years ago

https://github.com/jkkummerfeld/text2sql-data/blob/master/data/scholar.json#L8264

Neither FIELD nor PAPERFIELD is in the schema. I confirmed that this is also a bug in the original dataset:

https://github.com/sriniiyer/nl2sql/blob/master/data/scholar/scholar_train.sql#L313

DeNeutoy commented 5 years ago

Hmm, actually it is present in the original schema:

https://github.com/sriniiyer/nl2sql/blob/master/data/scholar/scholar.schema#L16

How did you generate the *-schema.csv files?

DeNeutoy commented 5 years ago

I found this while looking for global values in the datasets that are not extracted as variables (I need to add them to the grammar so that it can produce them). Here is what I found:

from typing import Dict, List

GLOBAL_DATASET_VALUES: Dict[str, List[str]] = {
    # These are used to check values are present, or numbers of authors.
    "scholar": ["0", "1", "2"],
    # 0 is used for "sea level", 750 is a "major" lake, and 150000 is a "major" city.
    "geography": ["0", "750", "150000"],
    # This defines what an "above average" restaurant is.
    "restaurants": ["2.5"],
}

jkkummerfeld commented 5 years ago

I can't remember exactly how we made the schema files now (I think we constructed them from the SQL itself, rather than trusting the metadata provided with previous data). The canonicalisation depends on the schema file though, so I'm not surprised that query is missing the standard alias structure.
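
A mismatch like this could be caught with a rough check along the following lines. This is only a sketch, not how the schema files were actually built: the function names `referenced_tables` and `missing_tables` are hypothetical, the regex only handles comma-separated FROM clauses (no JOIN syntax), and the schema is assumed to be available as a plain list of table names.

```python
import re
from typing import Iterable, List, Set

# Capture the text after FROM up to the next clause keyword, a nested FROM,
# a closing parenthesis, a semicolon, or the end of the string.
FROM_CLAUSE = re.compile(
    r"\bFROM\b(.*?)"
    r"(?=\bFROM\b|\bWHERE\b|\bGROUP\s+BY\b|\bORDER\s+BY\b|\bHAVING\b|\bLIMIT\b|;|\)|$)",
    re.IGNORECASE | re.DOTALL,
)


def referenced_tables(sql: str) -> Set[str]:
    """Return upper-cased table names listed in comma-separated FROM clauses."""
    tables: Set[str] = set()
    for clause in FROM_CLAUSE.findall(sql):
        for item in clause.split(","):
            tokens = item.split()
            # Skip empty items and the opening parenthesis of a subquery.
            if tokens and not tokens[0].startswith("("):
                tables.add(tokens[0].upper())
    return tables


def missing_tables(sql: str, schema_tables: Iterable[str]) -> List[str]:
    """Return tables used in the query that the schema does not list."""
    known = {name.upper() for name in schema_tables}
    return sorted(referenced_tables(sql) - known)
```

Running `missing_tables` over every query in a dataset, with the table names read from the matching schema file, would list queries such as the one reported here. A proper SQL parser would be more robust than this regex.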

Those global values are consistent with my observations.
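
For anyone wanting to reproduce that kind of search, a minimal sketch follows. It assumes the queries are available as plain SQL strings and that variables appear as names or quoted placeholders rather than bare numbers. The name `literal_numbers` is hypothetical and not part of this repository.

```python
import re
from collections import Counter
from typing import Iterable

# A bare number: digits with an optional decimal part, not attached to an
# identifier (such as the 0 in "alias0"), a dot, or a quote character.
NUMBER = re.compile(r"(?<![\w.\"'])\d+(?:\.\d+)?(?![\w.\"'])")


def literal_numbers(queries: Iterable[str]) -> Counter:
    """Count the bare numeric literals that appear across the SQL strings."""
    counts: Counter = Counter()
    for sql in queries:
        counts.update(NUMBER.findall(sql))
    return counts
```

Values that turn up in many queries of one dataset are candidates for the global list. String constants would need a separate pass, since quoted text is skipped here.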

#24 should address this. Thanks for pointing it out!

DeNeutoy commented 5 years ago

Thanks!