salesforce / WikiSQL

A large annotated semantic parsing corpus for developing natural language interfaces.
BSD 3-Clause "New" or "Revised" License
1.62k stars, 322 forks

``query word "symend" is not in input vocabulary'' #2

Closed: donglixp closed this issue 7 years ago

donglixp commented 7 years ago

Thanks for sharing the dataset and preprocessing scripts.

I tried to run the following command to generate examples:

python annotate.py 

However, the error message indicates that is_valid_example() returns False because "symend" is missing from the input vocabulary:

annotating data/train.jsonl
loading tables
100%|██████████████████████████████████████████████████████████| 18585/18585 [00:01<00:00, 11703.27it/s]
loading examples
  0%|                                                                         | 0/61297 [00:00<?, ?it/s]query word "symend" is not in input vocabulary.
[u'symsyms', u'symselect', u'symwhere', u'symand', u'symcol', u'symtable', u'symcaption', u'sympage', u'symsection', u'symop', u'symcond', u'symquestion', u'symagg', u'symaggops', u'symcondops', u'symaggops', u'max', u'min', u'count', u'sum', u'avg', u'symcondops', u'=', u'>', u'<', u'op', u'symtable', u'symcol', u'state/territory', u'symcol', u'text/background', u'colour', u'symcol', u'format', u'symcol', u'current', u'slogan', u'symcol', u'current', u'series', u'symcol', u'notes', u'symquestion', u'tell', u'me', u'what', u'the', u'notes', u'are', u'for', u'south', u'australia']

Traceback (most recent call last):
  File "annotate.py", line 114, in <module>
    raise Exception(str(a))
Exception: {'seq_output': {'gloss': [u'SYMSELECT', u'SYMAGG', u'SYMCOL', u'Notes', u'SYMWHERE', u'SYMCOL', u'Current', u'slogan', u'SYMOP', u'=', u'SYMCOND', u'SOUTH', u'AUSTRALIA', u'SYMEND'], 'words': [u'symselect', u'symagg', u'symcol', u'notes', u'symwhere', u'symcol', u'current', u'slogan', u'symop', u'=', u'symcond', u'south', u'australia', u'symend'], 'after': [u' ', u'  ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u'']}, 'where_output': {'gloss': [u'SYMWHERE', u'SYMCOL', u'Current', u'slogan', u'SYMOP', u'=', u'SYMCOND', u'SOUTH', u'AUSTRALIA', u'SYMEND'], 'words': [u'symwhere', u'symcol', u'current', u'slogan', u'symop', u'=', u'symcond', u'south', u'australia', u'symend'], 'after': [u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u'']}, 'question': {'gloss': [u'Tell', u'me', u'what', u'the', u'notes', u'are', u'for', u'South', u'Australia'], 'words': [u'tell', u'me', u'what', u'the', u'notes', u'are', u'for', u'south', u'australia'], 'after': [u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u'']}, 'table_id': u'1-1000181-1', 'table': {'header': [{'gloss': [u'State/territory'], 'words': [u'state/territory'], 'after': [u'']}, {'gloss': [u'Text/background', u'colour'], 'words': [u'text/background', u'colour'], 'after': [u' ', u'']}, {'gloss': [u'Format'], 'words': [u'format'], 'after': [u'']}, {'gloss': [u'Current', u'slogan'], 'words': [u'current', u'slogan'], 'after': [u' ', u'']}, {'gloss': [u'Current', u'series'], 'words': [u'current', u'series'], 'after': [u' ', u'']}, {'gloss': [u'Notes'], 'words': [u'notes'], 'after': [u'']}]}, 'query': {u'agg': 0, u'sel': 5, u'conds': [[3, 0, {'gloss': [u'SOUTH', u'AUSTRALIA'], 'words': [u'south', u'australia'], 'after': [u' ', u'']}]]}, 'seq_input': {'gloss': [u'SYMSYMS', u'SYMSELECT', u'SYMWHERE', u'SYMAND', u'SYMCOL', u'SYMTABLE', u'SYMCAPTION', u'SYMPAGE', u'SYMSECTION', u'SYMOP', u'SYMCOND', u'SYMQUESTION', u'SYMAGG', u'SYMAGGOPS', u'SYMCONDOPS', u'SYMAGGOPS', u'MAX', u'MIN', 
u'COUNT', u'SUM', u'AVG', u'SYMCONDOPS', u'=', u'>', u'<', u'OP', u'SYMTABLE', u'SYMCOL', u'State/territory', u'SYMCOL', u'Text/background', u'colour', u'SYMCOL', u'Format', u'SYMCOL', u'Current', u'slogan', u'SYMCOL', u'Current', u'series', u'SYMCOL', u'Notes', u'SYMQUESTION', u'Tell', u'me', u'what', u'the', u'notes', u'are', u'for', u'South', u'Australia'], 'words': [u'symsyms', u'symselect', u'symwhere', u'symand', u'symcol', u'symtable', u'symcaption', u'sympage', u'symsection', u'symop', u'symcond', u'symquestion', u'symagg', u'symaggops', u'symcondops', u'symaggops', u'max', u'min', u'count', u'sum', u'avg', u'symcondops', u'=', u'>', u'<', u'op', u'symtable', u'symcol', u'state/territory', u'symcol', u'text/background', u'colour', u'symcol', u'format', u'symcol', u'current', u'slogan', u'symcol', u'current', u'series', u'symcol', u'notes', u'symquestion', u'tell', u'me', u'what', u'the', u'notes', u'are', u'for', u'south', u'australia'], 'after': [u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u'  ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u' ', u'']}}
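
For reference, the failure can be reproduced in isolation from the token lists printed above. This is a minimal sketch, assuming the check simply requires every output (query) word to occur somewhere in the input word sequence. The helper name missing_query_words is hypothetical and is not taken from annotate.py; the two lists are copied from the 'words' fields of seq_input and seq_output in the exception, with the Python 2 u'' prefixes dropped.

```python
# Hypothetical reconstruction of the vocabulary check; NOT the repository's code.
def missing_query_words(seq_input_words, seq_output_words):
    """Return the output tokens that never appear in the input sequence."""
    vocab = set(seq_input_words)
    return [w for w in seq_output_words if w not in vocab]


# 'words' of seq_input, copied from the exception above.
seq_input = [
    'symsyms', 'symselect', 'symwhere', 'symand', 'symcol', 'symtable',
    'symcaption', 'sympage', 'symsection', 'symop', 'symcond', 'symquestion',
    'symagg', 'symaggops', 'symcondops', 'symaggops', 'max', 'min', 'count',
    'sum', 'avg', 'symcondops', '=', '>', '<', 'op', 'symtable', 'symcol',
    'state/territory', 'symcol', 'text/background', 'colour', 'symcol',
    'format', 'symcol', 'current', 'slogan', 'symcol', 'current', 'series',
    'symcol', 'notes', 'symquestion', 'tell', 'me', 'what', 'the', 'notes',
    'are', 'for', 'south', 'australia',
]

# 'words' of seq_output, copied from the exception above.
seq_output = [
    'symselect', 'symagg', 'symcol', 'notes', 'symwhere', 'symcol', 'current',
    'slogan', 'symop', '=', 'symcond', 'south', 'australia', 'symend',
]

print(missing_query_words(seq_input, seq_output))  # ['symend']
```

Under that assumption, 'symend' is the only output token with no counterpart in the input, which matches the message printed before the traceback.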
vzhong commented 7 years ago

Sorry about that! This should be fixed by commit b4f34057135e6e08679b57a8b9c97af787a5c7d5.

donglixp commented 7 years ago

Thanks for the fix!

vzhong commented 7 years ago

No problem! Really enjoyed your work on neural semantic parsing!