salesforce / WikiSQL

A large annotated semantic parsing corpus for developing natural language interfaces.
BSD 3-Clause "New" or "Revised" License

add Text2SQLGen work #80

Closed · youssefmellah closed this 3 years ago

salesforce-cla[bot] commented 3 years ago

Thanks for the contribution! Before we can merge this, we need @youssefmellah to sign the Salesforce.com Contributor License Agreement.

youssefmellah commented 3 years ago

Hi,

Can you please merge our results to the leaderboard?

We have submitted our paper to several journals and international conferences, and it is still under review. We provide a OneDrive link to the paper.

Thank you.

youssefmellah commented 3 years ago

@vzhong can you merge our results, please?

vzhong commented 3 years ago

Hi, I skimmed your manuscript and this doesn't seem like weak supervision, even though your pull request puts it under weak supervision. Can you confirm? How are you fine-tuning T5?

youssefmellah commented 3 years ago

> Hi, I skimmed your manuscript and this doesn't seem like weak supervision, even though your pull request puts it under weak supervision. Can you confirm? How are you fine-tuning T5?

Hi @vzhong

No, it is weakly supervised, like the previous work (SeqGenSQL+EG (Li 2020)) on which ours is based.

  1. First, we used the original T5 and mT5, then we modified them using the gated extraction network (just like SeqGenSQL).
  2. Then we combined T5 with mT5 by using them together (taking each model pre-trained on WikiSQL from step 1), routing on the question alone, i.e., by checking whether it contains only English words and no special characters. Finally, we reinforce the aggregation predictions with some association rules (a sketch of this routing appears below).

We do not use logical forms during training.
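
To make this concrete, here is a minimal sketch of the routing in step 2 and a toy association rule, assuming English-only questions go to T5 and that each fine-tuned model exposes a hypothetical `generate(question, schema)` method; none of this is the authors' actual code:

```python
import re

def is_english_only(question: str) -> bool:
    """Heuristic check: only ASCII letters, digits, whitespace,
    and basic punctuation (no special or non-Latin characters)."""
    return re.fullmatch(r"[A-Za-z0-9\s.,?!'\"()%-]+", question) is not None

def route_and_generate(question: str, schema: str, t5_model, mt5_model) -> str:
    """Route English-only questions to T5 and everything else to mT5.
    Both models are assumed to expose generate(question, schema) -> SQL."""
    model = t5_model if is_english_only(question) else mt5_model
    return model.generate(question, schema)

def reinforce_aggregation(question: str, sql: str) -> str:
    """Toy association rule (assumed, not the paper's actual rules):
    'how many ...' questions should use the COUNT aggregation."""
    if "how many" in question.lower() and "COUNT(" not in sql.upper():
        sql = re.sub(r"^SELECT\s+", "SELECT COUNT(", sql, count=1, flags=re.I)
        sql = re.sub(r"\s+FROM\b", ") FROM", sql, count=1, flags=re.I)
    return sql
```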

Thank you @vzhong

vzhong commented 3 years ago

Sorry, I still don't understand how you do fine-tuning (which is mentioned in your paper) without logical forms.

vzhong commented 3 years ago

Did these models "pretrained" on WikiSQL also not use logical forms?

youssefmellah commented 3 years ago

@vzhong We used T5 and mT5 as sequence-generation models to directly convert questions (augmented with the schema) into SQL queries. (We generate the SQL query in one step, without logical forms.)
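
For illustration only, one-step question-plus-schema to SQL generation with a T5 checkpoint could look like the sketch below; the "translate to SQL:" input serialization and the `t5-base` checkpoint are assumptions, not the authors' exact setup:

```python
from transformers import T5ForConditionalGeneration, T5TokenizerFast

# Placeholder checkpoint; the authors fine-tune their own models on WikiSQL.
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Assumed serialization: the question augmented with the table schema.
question = "How many players are from Sweden?"
schema = "columns: Player, Country, Points"
inputs = tokenizer(f"translate to SQL: {question} {schema}", return_tensors="pt")

# Generate the SQL query directly, in one step, with no intermediate
# logical form at inference time.
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```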

vzhong commented 3 years ago

OK, I read the SeqGenSQL+EG paper. It looks like the fine-tuning part IS using logical forms (it fine-tunes on questions and logical forms). Therefore, neither method is weakly supervised, because both use sequence-generation models fine-tuned on logical forms. Does this sound like an accurate description of what you are doing?
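
For context, each WikiSQL training example pairs a question with exactly such a logical form; an illustrative record (values invented for this example):

```python
# A WikiSQL-style training pair (illustrative values). Fine-tuning a
# sequence generator on the "sql" field means training on logical forms.
example = {
    "question": "How many players are from Sweden?",
    "table_id": "1-10015132-11",
    "sql": {
        "sel": 0,                     # index of the selected column
        "agg": 3,                     # aggregation operator (3 = COUNT)
        "conds": [[2, 0, "Sweden"]],  # (column, operator, value) triples
    },
}
```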

vzhong commented 3 years ago

Please see https://github.com/salesforce/WikiSQL/pull/75 for the discussion regarding SeqGenSQL.

youssefmellah commented 3 years ago

> Please see #75 for the discussion regarding SeqGenSQL.

Ah, OK, now I understand what you mean exactly by logical form. In this case, our work is supervised too.

Can you @vzhong please integrate our results into the leaderboard under "Supervised via logical forms"?

Thanks

vzhong commented 3 years ago

done via bc0a6a9cf7f6b0159f13d84bb46d71765996c3b7