Open leon2milan opened 3 years ago
Try replacing the BERT model we used with a multilingual LM such as mBERT or XLM-R. Both can be loaded the same way through the Hugging Face transformers library.
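A minimal sketch of what that swap might look like, assuming the public checkpoint names bert-base-multilingual-cased (mBERT) and xlm-roberta-base (XLM-R). The helper load_multilingual_encoder and the dictionary are hypothetical names for illustration, not part of this repository, and the repository's own model-loading code may need changes elsewhere (for example, special tokens or hidden size).

```python
# Assumed Hugging Face checkpoint names; verify them on the model hub.
MULTILINGUAL_CHECKPOINTS = {
    "mBERT": "bert-base-multilingual-cased",
    "XLM-R": "xlm-roberta-base",
}


def load_multilingual_encoder(name="mBERT"):
    """Hypothetical helper: load a tokenizer and encoder for a multilingual LM."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    checkpoint = MULTILINGUAL_CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model
```

Calling load_multilingual_encoder("XLM-R") downloads the checkpoint on first use. The tokenizer and the encoder must come from the same checkpoint, so changing only the tokenizer is unlikely to work.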
@todpole3 Thanks.
In my dataset, some tables have duplicated header names. I have already fixed this.
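For anyone hitting the same problem, one possible way to make duplicated headers unique is to suffix the repeats. This is only a sketch; dedupe_headers is a hypothetical helper and not necessarily the fix used here.

```python
from collections import Counter


def dedupe_headers(headers):
    """Append _2, _3, ... to repeated header names so that each name is unique."""
    seen = Counter()
    result = []
    for name in headers:
        seen[name] += 1
        # Keep the first occurrence unchanged; suffix later ones with their count.
        result.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return result


headers = dedupe_headers(["name", "area", "name"])
```

This does not guard against a table that already contains a header such as name_2, so a real fix may need an extra collision check.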
There are also two differences.
First, the sel and agg items are lists.
{
"table_id": "a1b2c3d4", # related table id
"question": "世茂茂悦府的套均面积是多少?", # QUESTION
"sql":{ # SQL
"sel": [7, 8], # SQL selected columns
"agg": [0, 1], # aggregate function
"cond_conn_op": 0, # the relation of condition
"conds": [
[1,2,"世茂茂悦府"] # conditional columns, conditional type, conditional values,col_1 == "世茂茂悦府"
]
}
}
Second, agg and op are encoded differently.
op_sql_dict = {0:">", 1:"<", 2:"==", 3:"!="}
agg_sql_dict = {0:"", 1:"AVG", 2:"MAX", 3:"MIN", 4:"COUNT", 5:"SUM"}
conn_sql_dict = {0:"", 1:"and", 2:"or"}
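To make the index mismatch concrete, here is a rough sketch that remaps the agg and op indices above to WikiSQL-style indices. It assumes the WikiSQL conventions are agg_ops = ['', 'MAX', 'MIN', 'COUNT', 'SUM', 'AVG'] and condition operators '=', '>', '<'. The function remap_sql is a hypothetical helper; it keeps sel and agg as lists, and whether this repository's data loader accepts lists is not confirmed.

```python
# Index conventions copied from the TableQA dictionaries above.
TABLEQA_AGG = {0: "", 1: "AVG", 2: "MAX", 3: "MIN", 4: "COUNT", 5: "SUM"}
TABLEQA_OP = {0: ">", 1: "<", 2: "==", 3: "!="}

# Assumed WikiSQL index conventions (position in the list is the index).
WIKISQL_AGG = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
WIKISQL_OP = ["=", ">", "<"]


def remap_sql(sql):
    """Return a copy of a TableQA "sql" dict with WikiSQL-style agg/op indices."""
    aggs = [WIKISQL_AGG.index(TABLEQA_AGG[a]) for a in sql["agg"]]
    conds = []
    for col, op, value in sql["conds"]:
        # TableQA writes equality as "==", WikiSQL as "=".
        symbol = "=" if TABLEQA_OP[op] == "==" else TABLEQA_OP[op]
        if symbol not in WIKISQL_OP:
            # "!=" has no counterpart in the assumed WikiSQL operator list.
            raise ValueError(f"operator {symbol!r} has no WikiSQL equivalent")
        conds.append([col, WIKISQL_OP.index(symbol), value])
    return {
        "sel": list(sql["sel"]),
        "agg": aggs,
        "cond_conn_op": sql["cond_conn_op"],
        "conds": conds,
    }


example = {"sel": [7, 8], "agg": [0, 1], "cond_conn_op": 0,
           "conds": [[1, 2, "世茂茂悦府"]]}
converted = remap_sql(example)
```

Multi-column selections, the != operator, and cond_conn_op have no direct WikiSQL counterpart, so a model built for WikiSQL would likely need extra output heads or a filtered dataset to handle them.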
After I run the code, example.matched_values is an empty OrderedDict(). How should I deal with this?
This is a Chinese NL2SQL dataset: https://github.com/ZhuiyiTechnology/TableQA. It has the same format as WikiSQL, with a few differences.
For example, "sql": [2] uses a list to wrap the value. When I load this dataset, I get an error. I changed the tokenizer to bert_base_chinese, but it still does not work. So, what can I do to fine-tune your model on a Chinese NL2SQL dataset? Thank you very much!