yizhongw / self-instruct

Aligning pretrained language models with instruction data generated by themselves.
Apache License 2.0
4.15k stars, 486 forks

How to support Chinese? #8

Closed. AISuperMa closed this 1 year ago

hujunchao commented 1 year ago

same question

xianbin7 commented 1 year ago

same question

yizhongw commented 1 year ago

Do you mean generating instruction data in Chinese? I think this is doable by translating the prompts and seed tasks into Chinese. Someone also translated the Alpaca instruction data directly: https://github.com/hikariming/alpaca_chinese_dataset. I don't have the bandwidth or plans to support other languages in this repo, though.
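For anyone trying the seed-task route, here is a minimal sketch of what it could look like. It assumes each seed task is a JSON object with an `instruction` string and an `instances` list of `input`/`output` pairs; check that against the seed file you actually use. The `translate` argument is a hypothetical callable you supply yourself (a translation API, a model call, or a human-edited lookup table); it is not part of this repo. The toy dictionary exists only to make the example run.

```python
import copy
from typing import Any, Callable, Dict


def translate_seed_task(
    task: Dict[str, Any], translate: Callable[[str], str]
) -> Dict[str, Any]:
    """Return a copy of `task` with its natural-language fields translated.

    Assumed (not confirmed) layout: {"instruction": str,
    "instances": [{"input": str, "output": str}, ...], ...}.
    All other keys are copied unchanged.
    """
    out = copy.deepcopy(task)
    out["instruction"] = translate(task["instruction"])
    out["instances"] = [
        {
            # Empty strings are kept as-is rather than sent to the translator.
            "input": translate(inst["input"]) if inst.get("input") else "",
            "output": translate(inst["output"]) if inst.get("output") else "",
        }
        for inst in task.get("instances", [])
    ]
    return out


# Toy stand-in for a real translator: a fixed lookup table (hypothetical).
toy_dictionary = {
    "Name the capital of the given country.": "说出给定国家的首都。",
    "France": "法国",
    "Paris": "巴黎",
}


def toy_translate(text: str) -> str:
    # Fall back to the original text when no translation is known.
    return toy_dictionary.get(text, text)


seed_task = {
    "id": "seed_task_demo",
    "instruction": "Name the capital of the given country.",
    "instances": [{"input": "France", "output": "Paris"}],
}

zh_task = translate_seed_task(seed_task, toy_translate)
print(zh_task)
```

The generation prompts would need the same treatment; machine-translated seed tasks are probably worth a manual review before use.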

FrackinFamous commented 1 year ago

I heard Whisper is very good at speech recognition and translation as well, so you could take in voice prompts and convert them to English before the query. Or use a second model, or a separate call, to translate the prompt into English and then translate the completion back. These are extra steps, but worth it if they greatly improve performance. Azure AI is really good at voice and translation as well, which makes sense since Azure has been providing infrastructure for some time.
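A rough sketch of the translate, query, translate-back flow described here. All three callables (`to_english`, `complete`, `from_english`) are hypothetical placeholders for whatever translation service and model you plug in; none of them is a real API, and the stubs below exist only so the example runs.

```python
from typing import Callable


def complete_via_english(
    prompt: str,
    to_english: Callable[[str], str],
    complete: Callable[[str], str],
    from_english: Callable[[str], str],
) -> str:
    """Translate the prompt to English, query the model, translate back."""
    english_prompt = to_english(prompt)
    english_completion = complete(english_prompt)
    return from_english(english_completion)


# Stub components (hypothetical) standing in for real services.
def stub_to_english(text: str) -> str:
    table = {"法国的首都是哪里？": "What is the capital of France?"}
    return table.get(text, text)


def stub_complete(english_prompt: str) -> str:
    return "Paris" if "France" in english_prompt else "unknown"


def stub_from_english(text: str) -> str:
    table = {"Paris": "巴黎"}
    return table.get(text, text)


answer = complete_via_english(
    "法国的首都是哪里？", stub_to_english, stub_complete, stub_from_english
)
print(answer)
```

Each hop adds latency and a chance of translation error, so it may be worth comparing against prompting the model in Chinese directly.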

On Sun, Mar 26, 2023 at 1:28 PM Yizhong Wang wrote:

Do you mean generating instruction data in Chinese? I think this is doable by changing the prompts and seed tasks into Chinese. Someone also did translation for the Alpaca instruction data directly: https://github.com/hikariming/alpaca_chinese_dataset. I don't have the bandwidth to support this in this repo though.
