taoyds / spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
https://yale-lily.github.io/spider
Apache License 2.0

Tokenize new data #81

Status: Open (opened by eliotwalt 2 years ago)

eliotwalt commented 2 years ago

Hi, I am trying to use the model on new data and am struggling to reproduce the tokenization that produced the query_toks and query_toks_no_value fields. I tried process_sql.tokenize, but it does not give the same tokens as the dataset.
Is the code for this provided somewhere? Thanks.

tonyzhao6 commented 1 year ago

I ran into the same issue. I believe removing the lowercasing step from process_sql.tokenize should do the trick.
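A minimal, dependency-free sketch of that suggestion, assuming you do not want to pull in NLTK: it follows the structure of process_sql.tokenize as I understand it (single quotes normalized to double quotes, quoted literals masked, then "!", ">", "<" merged with a following "=") but skips the lowercasing. The regex `_TOKEN_RE` is a hypothetical approximation of nltk.word_tokenize, which the real function calls, so unusual queries may tokenize differently. The simplest route is still to copy process_sql.tokenize and delete the `.lower()` call.

```python
import re

# Hypothetical stand-in for nltk.word_tokenize (an assumption, not Spider code):
# keeps alias.column together and splits every other symbol on its own.
_TOKEN_RE = re.compile(
    r"__val_\d+_\d+__"                                       # masked string literal
    r"|[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)?"  # identifier or T1.name
    r"|\d+(?:\.\d+)?"                                        # integer or decimal
    r"|\S"                                                   # any other single symbol
)


def tokenize_keep_case(sql):
    """Sketch of process_sql.tokenize with the lowercasing step removed."""
    sql = str(sql).replace("'", '"')  # treat 'x' and "x" literals the same way
    quote_idxs = [i for i, ch in enumerate(sql) if ch == '"']
    assert len(quote_idxs) % 2 == 0, "Unexpected quote"

    # Mask each quoted literal, right to left so earlier indices stay valid.
    vals = {}
    for i in range(len(quote_idxs) - 1, 0, -2):
        start, end = quote_idxs[i - 1], quote_idxs[i]
        key = "__val_{}_{}__".format(start, end)
        vals[key] = sql[start:end + 1]
        sql = sql[:start] + key + sql[end + 1:]

    # Split WITHOUT lowercasing, then put the original literals back.
    toks = [vals.get(tok, tok) for tok in _TOKEN_RE.findall(sql)]

    # Merge "!", ">", "<" with a following "=" into one operator token.
    merged = []
    for tok in toks:
        if tok == "=" and merged and merged[-1] in ("!", ">", "<"):
            merged[-1] += "="
        else:
            merged.append(tok)
    return merged
```

For example, `tokenize_keep_case("SELECT count(*) FROM head WHERE age  >  56")` returns `["SELECT", "count", "(", "*", ")", "FROM", "head", "WHERE", "age", ">", "56"]`.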

tonyzhao6 commented 1 year ago

However, I have not figured out how to get the query_toks_no_value field.

listentomi commented 8 months ago

> however, i have not figured out how to get the query_toks_no_value field

Neither have I. Did you ever solve this? Can you tell me how to generate the query_toks_no_value field?
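One possible heuristic, sketched under assumptions and not taken from the Spider repository: start from case-preserved query_toks, lowercase every token, and replace quoted string literals and numeric literals with the placeholder "value". The function name `toks_no_value` and the `keep_limit_number` flag are hypothetical. Keeping the number after LIMIT is an assumption about how the dataset was built, so compare the output against a few existing entries before relying on it.

```python
import re

# Matches a whole token that is an integer or decimal literal.
_NUMBER_RE = re.compile(r"^-?\d+(?:\.\d+)?$")


def toks_no_value(query_toks, keep_limit_number=True):
    """Heuristic sketch: lowercase tokens and mask literal values as "value"."""
    out = []
    for i, tok in enumerate(query_toks):
        # A quoted literal such as "Paris" or 'Paris' kept as a single token.
        is_string = len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "\"'"
        is_number = bool(_NUMBER_RE.match(tok))
        # Assumption: the row count after LIMIT is not treated as a value.
        after_limit = i > 0 and query_toks[i - 1].lower() == "limit"
        if is_string or (is_number and not (keep_limit_number and after_limit)):
            out.append("value")
        else:
            out.append(tok.lower())
    return out
```

Because it works on a token list, it can be fed the output of process_sql.tokenize with the lowercasing removed, or any tokenizer that keeps quoted literals as single tokens.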