salesforce / WikiSQL

A large annotated semantic parsing corpus for developing natural language interfaces.
BSD 3-Clause "New" or "Revised" License
1.6k stars, 320 forks

Added SeqGenSQL #75

Closed: louis-li closed this 3 years ago

salesforce-cla[bot] commented 3 years ago

Thanks for the contribution! Before we can merge this, we need @louis-li to sign the Salesforce.com Contributor License Agreement.

louis-li commented 3 years ago

Added SeqGenSQL for weakly supervised training.

vzhong commented 3 years ago

Hi @louis-li, it sounds like you fine-tune sequence models on the supervised data. Is this correct?

vzhong commented 3 years ago

In particular, I am referring to this quote:

To establish a baseline, we used T5-small as a base model and followed a commonly adopted practice of combining the natural language question and table columns as input with SQL statements as output.

vzhong commented 3 years ago

CC @todpole3

louis-li commented 3 years ago

It's not using logical form data, if that's what you're asking. The table columns are the column names from the table definition.

vzhong commented 3 years ago

Sorry, I don't quite understand what is happening here. It sounds like you're fine-tuning it to produce SQL output given the table schema and the question. How do you do this fine-tuning without using the SQL as supervision?

louis-li commented 3 years ago

OK, I see.

To clarify: this model is weakly supervised. It doesn't use logical form data (column indices), but it is trained with SQL statements as output.

The input includes the table definition, such as column names and data types; the output is a SQL statement such as SELECT name FROM table1 WHERE id = 3.
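
The input/output format described above can be sketched as a small serialization step. This is a minimal illustration, not the SeqGenSQL code: the "question: ... columns: ..." layout and the function name build_seq2seq_example are hypothetical, and the column types "text" and "real" follow the types used in WikiSQL tables.

```python
from typing import List, Tuple


def build_seq2seq_example(
    question: str,
    columns: List[Tuple[str, str]],
    sql: str,
) -> Tuple[str, str]:
    """Return a (source, target) text pair for a seq2seq model such as T5.

    The "question: ... columns: ..." layout is hypothetical and chosen only
    for illustration; it is not taken from the SeqGenSQL code.
    """
    # Render each column as "name (type)" so both names and types are visible.
    schema = ", ".join(f"{name} ({col_type})" for name, col_type in columns)
    source = f"question: {question} columns: {schema}"
    # The target is the gold SQL string: this is the supervision signal.
    return source, sql


if __name__ == "__main__":
    src, tgt = build_seq2seq_example(
        "What is the name for id 3?",
        [("id", "real"), ("name", "text")],
        "SELECT name FROM table1 WHERE id = 3",
    )
    print(src)  # question: What is the name for id 3? columns: id (real), name (text)
    print(tgt)  # SELECT name FROM table1 WHERE id = 3
```

Whatever the exact layout, every training pair carries a gold SQL string as its target, which is the point the rest of the thread turns on.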

vzhong commented 3 years ago

Ah, in this case we define logical forms to be the SQL statements themselves, which are equivalent to the dictionary representations given the table schema. Unfortunately, since you are using question-SQL pairs during training, I will have to move your submission to the non-weakly-supervised table.
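
The equivalence between the dictionary representation and the SQL string can be illustrated by rendering a WikiSQL-style dict ({"sel", "agg", "conds"}) as SQL. This is a sketch under stated assumptions: the operator lists are assumed to match the agg_ops and cond_ops lists in WikiSQL's lib/query.py (verify against the repository), and logical_form_to_sql, _literal, and the default table name table1 are hypothetical.

```python
from typing import Any, Dict, List

# Assumed to match agg_ops / cond_ops in WikiSQL's lib/query.py; verify there.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def _literal(value: Any) -> str:
    """Render a condition value as a SQL literal (strings single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def logical_form_to_sql(
    lf: Dict[str, Any], header: List[str], table: str = "table1"
) -> str:
    """Deterministically render a WikiSQL-style dict as a SQL string."""
    column = header[lf["sel"]]
    agg = AGG_OPS[lf["agg"]]
    select = f"{agg}({column})" if agg else column
    sql = f"SELECT {select} FROM {table}"
    # Each condition is [column_index, operator_index, value].
    conds = [f"{header[c]} {COND_OPS[op]} {_literal(v)}" for c, op, v in lf["conds"]]
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql


if __name__ == "__main__":
    header = ["id", "name"]
    lf = {"sel": 1, "agg": 0, "conds": [[0, 0, 3]]}
    print(logical_form_to_sql(lf, header))  # SELECT name FROM table1 WHERE id = 3
```

Because the mapping only needs the table header, a question-SQL pair carries the same information as a question-dictionary pair; this sketch shows only the dict-to-SQL direction.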

louis-li commented 3 years ago

I'm not sure I understand what the determining factor is. What do other weakly supervised models generate? If they also generate SQL statements, what makes them weakly supervised?

vzhong commented 3 years ago

It's not a function of what you generate but of what you train on. It sounds like you are fine-tuning (i.e., training) models using the question-SQL pairs. Is this correct?
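
The criterion above ("not what you generate but what you train on") can be paraphrased as a rule over the training data. This is an illustrative sketch, not the leaderboard's official code: TrainingExample, its fields, and supervision_type are hypothetical, and "weakly supervised" is taken to mean training on answers (denotations) without gold SQL.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class TrainingExample:
    """One training example; all field names are hypothetical."""
    question: str
    header: List[str]
    sql: Optional[str] = None           # gold SQL / logical form, if used in training
    answer: Optional[List[Any]] = None  # execution result (denotation), if used


def supervision_type(examples: List[TrainingExample]) -> str:
    """Classify a training set by the signal the learner sees, not by its output."""
    if any(ex.sql is not None for ex in examples):
        # Any question-SQL pair counts as logical-form supervision.
        return "fully supervised"
    if all(ex.answer is not None for ex in examples):
        # Only answers are available; the SQL must be inferred during training.
        return "weakly supervised"
    raise ValueError("some examples carry no supervision signal")
```

Under this rule, a model that outputs SQL at test time is still weakly supervised if it only ever saw answers during training, while the T5 setup in this PR is fully supervised because its targets are SQL strings.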

louis-li commented 3 years ago

I see. I didn't realize SQL statements are also considered logical forms.

