Closed kalleknast closed 3 years ago
In the method format_batch, gt_tables is not defined for Spider, so some essential tables are dropped. In other words, the schema_graph is not complete. That is why you got None for schema_pos.
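A minimal sketch of why the lookup fails (all names here are hypothetical stand-ins, not the repo's actual code): schema positions are assigned only to tables that survive pruning, so if the ground-truth tables are not protected from dropping, looking up a dropped table's position returns None.

```python
# Hypothetical illustration of the failure mode: positions exist only
# for tables kept after pruning, so a dropped ground-truth table -> None.
def build_schema_pos(kept_tables):
    """Map each kept table name to its position in the pruned schema."""
    return {name: i for i, name in enumerate(kept_tables)}

all_tables = ["singer", "concert", "stadium"]
gt_tables = []  # empty for Spider, so no table is protected from dropping
# Simulate pruning that happens to drop "concert":
kept = [t for t in all_tables if t in gt_tables or t != "concert"]

schema_pos = build_schema_pos(kept)
print(schema_pos.get("concert"))  # dropped ground-truth table -> None
```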
I am also trying to solve this problem.
Hi, I also encountered this problem. Is there any solution now? Thx.
Same to me.
On line 360 in src/semantic_parser/learn_framework.py, exp.gt_table_names_list is None, so the list of ground-truth tables, gt_tables, is empty. When tables are dropped, the ground-truth tables get dropped along with the rest, which leaves schema_pos as None.
So change lines 362-363 from
    else:
        gt_tables = []
to
    else:
        gt_table_names = [token for token, t in
                          zip(exp.program_singleton_field_tokens, exp.program_singleton_field_token_types) if t == 0]
        gt_tables = set([schema_graph.get_table_id(t_name) for t_name in gt_table_names])
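The replacement recovers the ground-truth tables from the program tokens themselves instead of relying on exp.gt_table_names_list. A self-contained sketch of the same logic, using a stub schema_graph and made-up token data (assuming token type 0 marks a table token):

```python
class SchemaGraph:
    """Minimal stand-in for the real schema_graph (hypothetical)."""
    def __init__(self, table_names):
        self._ids = {name: i for i, name in enumerate(table_names)}

    def get_table_id(self, t_name):
        return self._ids[t_name]

schema_graph = SchemaGraph(["singer", "concert", "stadium"])

# Made-up example program tokens: type 0 = table, type 1 = field
program_singleton_field_tokens = ["singer", "name", "concert", "year"]
program_singleton_field_token_types = [0, 1, 0, 1]

# Same filtering logic as the proposed fix:
gt_table_names = [token for token, t in
                  zip(program_singleton_field_tokens,
                      program_singleton_field_token_types) if t == 0]
gt_tables = set(schema_graph.get_table_id(t) for t in gt_table_names)
print(gt_table_names)  # -> ['singer', 'concert']
print(gt_tables)       # -> {0, 1}
```

With gt_tables non-empty, the table-dropping step can protect these ids, so schema_pos lookups for ground-truth tables no longer come back None.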
This fix seems to work. I could train for 6003 iterations before running into a probably unrelated torch/cudnn issue.
Will you (@whuFSN) make a PR?
I updated the code and the issue is gone.
Many thanks to @whuFSN, great catch and fix.
Training on the Spider dataset fails with
TypeError: '<' not supported between instances of 'NoneType' and 'int'
The only modification made was using a smaller batch size, 8 instead of 16, to avoid memory issues. I have not tried to debug vectorizers.py. Training on WikiSQL worked fine.
The full error: