jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

Setting variable names to be type-based #27

Closed jkkummerfeld closed 5 years ago

jkkummerfeld commented 5 years ago

GeoQuery variables were previously named just 'varN'. This change renames them to the form 'typeN', so that each name specifies the variable's type.
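As an illustration of the naming scheme, here is a minimal sketch. The function name `type_based_names` and the example types are hypothetical, not taken from the dataset, and the index is assumed to count from 0 separately for each type.

```python
def type_based_names(types):
    """Return names of the form '<type><N>', counting N from 0 per type."""
    counts = {}
    names = []
    for vtype in types:
        index = counts.get(vtype, 0)
        counts[vtype] = index + 1
        names.append(vtype + str(index))
    return names


# Hypothetical types, for illustration only.
example = type_based_names(["city_name", "state_name", "city_name"])
```

Under these assumptions, the second variable of a given type gets index 1 while the first variable of each type gets index 0.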

jkkummerfeld commented 5 years ago

This involved:

(1) Manually changing the types to match database fields.
(2) Running this code to propagate those changes:


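import json
import re

# Load the GeoQuery data, closing the file once it has been read.
with open('geography.json') as data_file:
    data = json.load(data_file)

for query in data:
    done = {}  # Last index used for each type within this query
    for variable in query['variables']:
        cname = variable['name']
        vtype = variable['type']
        num = done.get(vtype, -1) + 1
        done[vtype] = num
        nname = vtype + str(num)
        if nname == cname:
            # Already renamed; popping cname below would delete the entry.
            continue

        # Match cname only when no digit follows, so 'var1' skips 'var10'.
        pattern = re.compile(re.escape(cname) + r'(?!\d)')

        variable['name'] = nname
        for i, sql in enumerate(query['sql']):
            query['sql'][i] = pattern.sub(nname, sql)
        for sentence in query['sentences']:
            sentence['variables'][nname] = sentence['variables'].pop(cname)
            sentence['text'] = pattern.sub(nname, sentence['text'])

print(json.dumps(data, indent=4, sort_keys=True))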
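A quick consistency check on the output could look something like the sketch below. The helper `check_entry` and both sample entries are hypothetical. The sketch assumes the same entry layout the script above relies on ('variables', 'sql', 'sentences') and that old names always looked like 'varN'.

```python
import re


def check_entry(query):
    """Return a list of problems found in one entry (empty if consistent)."""
    problems = []
    seen = {}
    for variable in query["variables"]:
        vtype = variable["type"]
        expected = vtype + str(seen.get(vtype, 0))
        seen[vtype] = seen.get(vtype, 0) + 1
        if variable["name"] != expected:
            problems.append("expected %s, found %s" % (expected, variable["name"]))

    # Every sentence should carry a value for every variable of the query.
    names = {variable["name"] for variable in query["variables"]}
    for sentence in query["sentences"]:
        missing = names - set(sentence["variables"])
        if missing:
            problems.append("sentence lacks values for %s" % sorted(missing))

    # No old-style 'varN' placeholder should remain in the SQL or the text.
    leftover = re.compile(r"\bvar\d+\b")
    texts = list(query["sql"]) + [s["text"] for s in query["sentences"]]
    for text in texts:
        if leftover.search(text):
            problems.append("old-style name left in: %s" % text)
    return problems


# Hypothetical entries, for illustration only.
good = {
    "variables": [{"name": "state_name0", "type": "state_name"}],
    "sql": ['SELECT CITYalias0.CITY_NAME FROM CITY AS CITYalias0 '
            'WHERE CITYalias0.STATE_NAME = "state_name0" ;'],
    "sentences": [{"text": "what cities are in state_name0",
                   "variables": {"state_name0": "texas"}}],
}
bad = {
    "variables": [{"name": "var0", "type": "state_name"}],
    "sql": ['SELECT CITYalias0.CITY_NAME FROM CITY AS CITYalias0 '
            'WHERE CITYalias0.STATE_NAME = "var0" ;'],
    "sentences": [{"text": "what cities are in var0",
                   "variables": {"var0": "texas"}}],
}
good_problems = check_entry(good)
bad_problems = check_entry(bad)
```

Running `check_entry` over every entry of the converted file and printing any non-empty result would flag entries where the manual type edits and the propagated names disagree.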