How to support Chinese?

hujunchao commented 1 year ago

same question

xianbin7 commented 1 year ago

same question

yizhongw commented 1 year ago

Do you mean generating instruction data in Chinese? I think this is doable by translating the prompts and seed tasks into Chinese. Someone has also translated the Alpaca instruction data directly: https://github.com/hikariming/alpaca_chinese_dataset. I don't have the bandwidth or plans to support other languages in this repo, though.
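A minimal sketch of the "translate the seed tasks" step follows. It assumes each line of the seed file is a JSON object with an `instruction` string and an `instances` list of `{"input": ..., "output": ...}` dicts (check this against your copy of the repo's seed data), and it takes a hypothetical `translate` callable standing in for whatever translation model or service you use. Only the natural-language fields are translated; ids, names, and flags are copied unchanged. The prompt templates in the generation scripts would still need translating by hand.

```python
import json
from typing import Callable, Iterable, Iterator

# `translate` is a hypothetical callable: English text in, Chinese text out.
Translate = Callable[[str], str]


def _translate_instance(inst: dict, translate: Translate) -> dict:
    out = dict(inst)  # keep any extra keys untouched
    text_in = inst.get("input", "")
    # Many seed instances have an empty input; don't send blanks to the translator
    out["input"] = translate(text_in) if text_in else text_in
    out["output"] = translate(inst["output"])
    return out


def translate_seed_task(task: dict, translate: Translate) -> dict:
    """Return a copy of `task` with its instruction and instances translated."""
    translated = dict(task)  # shallow copy; id, name, is_classification stay as-is
    translated["instruction"] = translate(task["instruction"])
    translated["instances"] = [
        _translate_instance(inst, translate) for inst in task["instances"]
    ]
    return translated


def translate_seed_file(lines: Iterable[str], translate: Translate) -> Iterator[str]:
    """Translate a JSONL stream of seed tasks, yielding one JSONL line per task."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        # ensure_ascii=False writes Chinese characters as-is instead of \u escapes
        yield json.dumps(translate_seed_task(task, translate), ensure_ascii=False)
```

Usage, with `my_translate` as your own (hypothetical) translator: open the English seed file and an output file with `encoding="utf-8"`, then write each line from `translate_seed_file(src, my_translate)` followed by `"\n"`. The resulting file can replace the English seed tasks when bootstrapping.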

FrackinFamous commented 1 year ago

I heard Whisper is very good at recognizing and translating speech as well, so you could take in voice prompts and convert them to English before the query. Or use a second model or a separate call to translate the prompt, then translate the completion back. It's extra steps, but worth it if it greatly improves performance. Azure AI is also really good at voice and translation, which makes sense since Azure has been providing infrastructure for some time.
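The translate-in, translate-back idea can be sketched as a small wrapper. Both `translate` and `complete` below are hypothetical callables (a translation service and an English-centric model); nothing here is part of self-instruct. For voice input, the openai-whisper package's `model.transcribe(audio_path, task="translate")` should return English text in its `"text"` field, which could stand in for the first translation step.

```python
from typing import Callable

# Hypothetical signature: (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]


def answer_via_english(
    prompt: str,
    user_lang: str,
    translate: Translate,
    complete: Callable[[str], str],
) -> str:
    """Translate the prompt to English, query the model, translate the answer back."""
    if user_lang == "en":
        # English users need no round trip
        return complete(prompt)
    english_prompt = translate(prompt, user_lang, "en")
    english_answer = complete(english_prompt)
    return translate(english_answer, "en", user_lang)
```

The trade-off is two extra calls per query, plus any meaning lost in each translation, in exchange for keeping the model in the language it handles best.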

On Sun, Mar 26, 2023 at 1:28 PM Yizhong Wang @.***> wrote:

Do you mean generating instruction data in Chinese? I think this is doable by translating the prompts and seed tasks into Chinese. Someone has also translated the Alpaca instruction data directly: https://github.com/hikariming/alpaca_chinese_dataset. I don't have the bandwidth to support this in this repo, though.


yizhongw / self-instruct

How to support Chinese? #8