microsoft / ContextualSP

Multiple paper open-source codes of the Microsoft Research Asia DKI group
MIT License

Error while training using turn.none.jsonnet #1

Closed by ndmanifold 4 years ago

ndmanifold commented 4 years ago

Hi,

I am trying to run the code with turn.none.jsonnet, and I am getting the following error:

Traceback (most recent call last):
  File "./dataset_reader/sparc_reader.py", line 143, in build_instance
    sql_query_list=sql_query_list
  File "./dataset_reader/sparc_reader.py", line 417, in text_to_instance
    action_non_terminal, action_seq, all_valid_actions = world.get_action_sequence_and_all_actions()
  File "./context/world.py", line 155, in get_action_sequence_and_all_actions
    action_sequence = self.sql_converter.translate_to_intermediate(self.sql_clause)
  File "./context/converter.py", line 87, in translate_to_intermediate
    return self._process_statement(sql_clause=sql_clause)
  File "./context/converter.py", line 117, in _process_statement
    inter_seq.extend(self._process_root(sql_clause))
  File "./context/converter.py", line 657, in _process_root
    step_inter_seq = _process_step(cur_state)
  File "./context/converter.py", line 511, in _process_step
    return call_back_mapping[step_state](sql_clause)
  File "./context/converter.py", line 262, in _process_join
    if self.col_names[col_ind].refer_table.name == join_tab_name:
KeyError: 'C'

After this, the code fails with:

site-packages/allennlp/data/vocabulary.py", line 399, in from_instances
    instance.count_vocab_items(namespace_token_counts)
AttributeError: 'NoneType' object has no attribute 'count_vocab_items'

I have downloaded the GloVe embeddings into the glove folder, and the dataset is in dataset_sparc alongside the code. Do you have any suggestions as to what the issue might be?

Thanks

SivilTaram commented 4 years ago

@ndmanifold It seems that col_ind is set to the string 'C' rather than an integer. Could you check whether the database folder and tables.json are read into memory successfully? Alternatively, you can set a breakpoint at line 262 of context/converter.py and see why col_ind is set to the string 'C'.
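Both checks can be scripted. The following is a minimal sketch, assuming tables.json follows the Spider/SParC layout (a JSON list of objects, each with a `db_id` and a `column_names` list). The helper names `summarize_schemas` and `check_col_ind` are hypothetical and are not part of the repository.

```python
import json
from pathlib import Path


def summarize_schemas(tables_path):
    """Return {db_id: number_of_columns} for a Spider/SParC-style tables.json.

    Assumption: the file is a JSON list of objects that each carry
    a "db_id" string and a "column_names" list.
    """
    with Path(tables_path).open(encoding="utf-8") as f:
        schemas = json.load(f)
    return {schema["db_id"]: len(schema["column_names"]) for schema in schemas}


def check_col_ind(col_ind):
    """Fail early, with a readable message, when a column index is not an int."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(col_ind, bool) or not isinstance(col_ind, int):
        raise TypeError(f"col_ind should be an int, got {col_ind!r}")
    return col_ind
```

If the database from the failing example (for instance `movie_1`) is missing from the returned dictionary, the schema file was probably not loaded from the expected location. Calling `check_col_ind(col_ind)` just before the failing lookup would turn the bare `KeyError: 'C'` into a message that names the offending value.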

ndmanifold commented 4 years ago

Hi, thank you for looking into this.

Here is what I get after printing a few values. It looks like an indexing issue.

sql_clause:  {'except': None, 'from': {'conds': [[False, 2, [0, [0, 7, False], None], [0, 5, False], None], 'and', [False, 2, [0, [0, 6, False], None], '"Chris Jackson"', None]], 'table_units': [['table_unit', 2], ['table_unit', 1]]}, 'groupBy': [], 'having': [], 'intersect': None, 'limit': None, 'orderBy': [], 'select': [False, [[0, [0, [0, 0, False], None]]]], 'union': None, 'where': [], 'join': 'Reviewer'}
from_cond_col_inds:  [6, 'C']
condition:  [False, 2, [0, [0, 6, False], None], '"Chris Jackson"', None]
col_ind:  6
col_ind:  C
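One plausible reading of these values, offered as a guess rather than a description of the actual converter code: the second element of the right-hand operand is taken as a column index. That works when the operand is a column unit such as `[0, 5, False]`, but when the operand is the string literal `'"Chris Jackson"'`, index 1 of the string is the character `C`. The sketch below reproduces this with the two conditions from the printed sql_clause; `right_column_index` is a hypothetical helper, not the repository's fix.

```python
# The two conditions from the 'from' -> 'conds' entry printed above.
join_cond = [False, 2, [0, [0, 7, False], None], [0, 5, False], None]
value_cond = [False, 2, [0, [0, 6, False], None], '"Chris Jackson"', None]

# Indexing the right-hand operand blindly:
naive_join = join_cond[3][1]    # a column unit: element 1 is the column index
naive_value = value_cond[3][1]  # a string: element 1 is its second character


def right_column_index(condition):
    """Return the right-hand column index, or None for a literal value.

    Hypothetical guard: only list-like operands are treated as column units.
    """
    right = condition[3]
    if isinstance(right, (list, tuple)):
        return right[1]
    return None


print(naive_join, repr(naive_value))
print(right_column_index(join_cond), right_column_index(value_cond))
```

Under this reading, the printed `from_cond_col_inds: [6, 'C']` is the left column index 6 followed by the second character of the quoted value.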

```
Error in db_id: movie_1, utterance: ['Give me the ratings given by Chris Jackson.', 'What movies are they about?', 'Show me the movies other than those.']
Traceback (most recent call last):
  File "./dataset_reader/sparc_reader.py", line 150, in build_instance
    sql_query_list=sql_query_list
  File "./dataset_reader/sparc_reader.py", line 464, in text_to_instance
    action_non_terminal, action_seq, all_valid_actions = world.get_action_sequence_and_all_actions()
  File "./context/world.py", line 155, in get_action_sequence_and_all_actions
    action_sequence = self.sql_converter.translate_to_intermediate(self.sql_clause)
  File "./context/converter.py", line 87, in translate_to_intermediate
    return self._process_statement(sql_clause=sql_clause)
  File "./context/converter.py", line 117, in _process_statement
    inter_seq.extend(self._process_root(sql_clause))
  File "./context/converter.py", line 663, in _process_root
    step_inter_seq = _process_step(cur_state)
  File "./context/converter.py", line 517, in _process_step
    return call_back_mapping[step_state](sql_clause)
  File "./context/converter.py", line 266, in _process_join
    if self.col_names[col_ind].refer_table.name == join_tab_name:
KeyError: 'C'
```

I am still tracing the code, but later on it fails with:

site-packages/allennlp/data/vocabulary.py", line 399, in from_instances
    instance.count_vocab_items(namespace_token_counts)
AttributeError: 'NoneType' object has no attribute 'count_vocab_items'

Also, it looks like some of the instances have nothing in select, which causes another error:

In db_id: driving_school, utterance: ['List records of customers living in city Lockmanfurt.', 'Just show the first and last name.']
  File "./context/grammar.py", line 432, in __init__
    self.production = self.grammar_dict[id_c]
KeyError: -1

SivilTaram commented 4 years ago

@ndmanifold Thanks for your detailed feedback. I will try to parse the SQL clause and report back here.

SivilTaram commented 4 years ago

@ndmanifold Many thanks for your feedback! It is indeed an issue in the codebase. I fixed it in the latest commit and provided a test script for debugging and validating the translation from SQL into SemQL:

https://github.com/microsoft/ContextualSP/blob/2b59163b3cca9922098c19895943b2c9e57c3447/semantic_parsing_in_context/test_sql_to_semql.py#L23-L65

You could write your own test cases in the same way: sql_clause comes from the dataset, and expected_action_str can be left empty at first, until you obtain the correct expected action sequence.
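The workflow described here (leave the expected string empty, inspect the output, then pin it) can be sketched generically. This is not the linked test script: `check_translation` and `stand_in_translate` are hypothetical names, the placeholder actions are not real SemQL, and the real converter would be passed in place of the stand-in.

```python
def check_translation(translate_fn, sql_clause, expected_action_str=""):
    """Run a translator on sql_clause and compare against an expected string.

    While expected_action_str is empty, the actual output is only printed,
    so it can be inspected and copied into the test once it looks right.
    """
    actual = " ".join(str(action) for action in translate_fn(sql_clause))
    if not expected_action_str:
        print(actual)
        return actual
    assert actual == expected_action_str, (
        f"expected {expected_action_str!r}, got {actual!r}"
    )
    return actual


def stand_in_translate(sql_clause):
    """Placeholder for the real converter; returns dummy action names."""
    return ["ACTION_1", "ACTION_2"]


# First pass: no expectation yet, so the output is printed for inspection.
observed = check_translation(stand_in_translate, {"select": []})
# Second pass: the observed string is now pinned as the expected value.
check_translation(stand_in_translate, {"select": []}, observed)
```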

BTW, the traceback you reported may not affect model training, since we catch all exceptions in the code. You can treat it as a warning; these unhandled cases will not be used in training.
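The catch-and-skip behavior described above can be sketched as follows. This is an illustration only, not the repository's reader code: `iter_valid_instances` and `build_instance` are hypothetical names. It also drops `None` results, on the assumption that the reported `'NoneType' object has no attribute 'count_vocab_items'` error comes from a failed example reaching vocabulary construction as `None`.

```python
import logging

logger = logging.getLogger(__name__)


def iter_valid_instances(examples, build_instance):
    """Yield only the instances that were built successfully.

    Examples whose construction raises are logged as warnings and skipped;
    examples for which build_instance returns None are skipped as well.
    """
    for example in examples:
        try:
            instance = build_instance(example)
        except Exception:
            logger.warning("Skipping example %r", example, exc_info=True)
            continue
        if instance is not None:
            yield instance
```

With a filter like this in place, the logged tracebacks are informational and only the successfully converted examples are passed on.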

ndmanifold commented 4 years ago

Thanks for the update.