openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0

Translated to another language. #55

Closed. dsdanielpark closed this issue 1 year ago.

dsdanielpark commented 1 year ago

I would like to retrain open_llama, which is released under the Apache License, in Korean. Could you please guide me on how to obtain the dataset and provide detailed instructions for the training process? I would like to translate the dataset into Korean and retrain the model accordingly. Thank you for your amazing project once again, and I appreciate any guidance you can provide.

young-geng commented 1 year ago

The dataset is downloaded directly from the RedPajama website. You can see the training script example here; it is the script we used to train OpenLLaMA on a TPU v4-512 pod.
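For anyone who wants to pull the corpus for inspection or translation, here is a minimal sketch of one possible route. It assumes the Hugging Face mirror `togethercomputer/RedPajama-Data-1T` and the `datasets` library; it is not the exact pipeline used for OpenLLaMA, which downloads the raw files from the RedPajama site.

```python
# Hypothetical sketch: stream RedPajama shards via the Hugging Face mirror
# "togethercomputer/RedPajama-Data-1T" instead of downloading the raw files.
# Streaming avoids materializing the ~1T-token corpus on disk.
from datasets import load_dataset

ds = load_dataset(
    "togethercomputer/RedPajama-Data-1T",
    "common_crawl",  # other configs include c4, github, wikipedia, book, arxiv, stackexchange
    split="train",
    streaming=True,
    # Depending on your datasets version, trust_remote_code=True may be required.
)

# Peek at a few records; each record carries a raw "text" field plus metadata.
for i, example in enumerate(ds):
    print(example["text"][:200])
    if i == 2:
        break
```

From there, the translation step would run over the `text` field of each record before re-tokenizing for training.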

xingenju commented 12 months ago

> I would like to retrain open_llama, which is released under the Apache License, in Korean. Could you please guide me on how to obtain the dataset and provide detailed instructions for the training process? I would like to translate the dataset into Korean and retrain the model accordingly. Thank you for your amazing project once again, and I appreciate any guidance you can provide.

Have you started to retrain the Korean version?