tshu-w / DBCopilot

Code and data for the paper "DBCᴏᴘɪʟᴏᴛ: Scaling Natural Language Querying to Massive Databases"

How to support Chinese? #5

Closed: GeorgeXiaojie closed this issue 3 months ago

GeorgeXiaojie commented 3 months ago

Thanks for sharing the code. I also read your paper; great work. How can I adapt this to support Chinese? In my case, the database table comments are in Chinese, and the users' questions are also in Chinese.

Looking forward to your reply.

tshu-w commented 3 months ago

Thanks for your attention.

I believe this can be addressed with two changes: use the table comments and schema information to generate Chinese questions during the data synthesis phase, and replace the English PLMs with Chinese PLMs.

More specifically, you can first use an LLM to generate some high-quality Chinese queries, and then train a question generation model on them to further expand the pseudo data.
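
As a rough illustration of the first step, here is a minimal sketch of how a prompt for LLM-based question generation might be assembled from a schema with Chinese comments. This is not code from the DBCopilot repository: the `Table` and `Column` classes, the `build_question_prompt` function, the prompt wording, and the sample `orders` table are all hypothetical, and the actual LLM call is left out because it depends on which model or API you use.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    comment: str = ""  # Chinese column comment, if any (hypothetical field)


@dataclass
class Table:
    name: str
    comment: str = ""  # Chinese table comment (hypothetical field)
    columns: list[Column] = field(default_factory=list)


def describe_table(table: Table) -> str:
    """Render one table as a single line of text, keeping its Chinese comments."""
    cols = ", ".join(
        f"{c.name} ({c.comment})" if c.comment else c.name for c in table.columns
    )
    header = f"{table.name} ({table.comment})" if table.comment else table.name
    return f"{header}: {cols}"


def build_question_prompt(tables: list[Table], n_questions: int = 5) -> str:
    """Build an LLM prompt asking for Chinese questions over the given schema."""
    schema_text = "\n".join(describe_table(t) for t in tables)
    return (
        f"Given the database schema below, write {n_questions} natural questions "
        "in Chinese that a user might ask and that can be answered using only "
        "these tables. Output one question per line.\n\n"
        f"Schema:\n{schema_text}"
    )


# Sample schema, invented for illustration only.
orders = Table(
    name="orders",
    comment="订单表",
    columns=[Column("id", "订单编号"), Column("amount", "金额"), Column("created_at")],
)
prompt = build_question_prompt([orders], n_questions=3)
print(prompt)
```

The questions returned by the LLM, paired with the schema they were generated from, could then serve as seed data for training the question generation model mentioned above.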

GeorgeXiaojie commented 3 months ago

That's a great idea, and I will try it as you suggested. Thank you very much.

tshu-w commented 3 months ago

If you have more questions, feel free to reach out again.