microsoft / rat-sql

A relation-aware semantic parsing model from English to SQL
https://arxiv.org/abs/1911.04942
MIT License
406 stars 117 forks source link

Does it need question id convert after computing cv_link in BERT mode? #4

Closed AoZhang closed 4 years ago

AoZhang commented 4 years ago

It's a nice idea to encode question&schema using RAT, and thanks for your sharing of the well designed codes.

I have some doubt about cell-value linking in BERT mode when reading your code.

In ratsql/models/spider/spider_enc.py SpiderEncoderBertPreproc.preprocess_item (from line 670 to 683), I realized that you compute sc link, and then cv link.

In the procedure of sc linking you call the function Bertokenize.bert_schema_linking, which uses normalized_piece to compute schema linking, and converts question (token) id back to pieces id using idx_map.

But in the procedure of cv linking you only linking question_bert_tokens.normalized_pieces with item.schema without converting id back.

Is that a bug? I'm hoping for you response.

berlino commented 4 years ago

Thanks for reporting this!

I think it's indeed a bug. For cv_linking, the token ids should also be convert it back to pieces ids. I'll submit a PR to fix this.