ytyz1307zzh / PLUG

Code for the ACL 2024 paper "PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning"

:zap: PLUG: Pivot Language Guided Generation

This is the repository for PLUG (Pivot Language gUided Generation), a simple yet effective method for cross-lingual instruction tuning of large language models (LLMs). PLUG uses a high-resource language as a pivot to enhance instruction tuning in lower-resource languages. It trains the model to first process the instruction and draft a response in the pivot language, and then produce the final response in the target language. PLUG has been shown to significantly improve the instruction-following abilities of LLMs in multiple target languages (Chinese, Korean, Italian, Spanish) compared to responding directly in the target language alone. For more details, please refer to our paper ":zap:PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning".
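
The pivot-then-target ordering described above can be sketched as a small data-formatting step. This is only an illustration under stated assumptions, not the format used in `src`: the `PlugExample` fields, the section labels ("English instruction:", "English response:", "Chinese response:"), and the helper names are all hypothetical, and "processing the instruction" is assumed here to mean restating it in the pivot language.

```python
from dataclasses import dataclass


@dataclass
class PlugExample:
    """One bilingual training example (field names are hypothetical)."""
    target_instruction: str  # instruction in the target language, e.g. Chinese
    pivot_instruction: str   # the same instruction rendered in the pivot language
    pivot_response: str      # draft response in the pivot language
    target_response: str     # final response in the target language


def build_plug_pair(ex, pivot="English", target="Chinese"):
    """Return (prompt, completion) strings for supervised fine-tuning.

    The completion places the pivot-language material before the
    target-language answer, so the model learns to draft in the pivot
    language first. The section labels are illustrative only.
    """
    prompt = ex.target_instruction
    completion = (
        f"{pivot} instruction:\n{ex.pivot_instruction}\n\n"
        f"{pivot} response:\n{ex.pivot_response}\n\n"
        f"{target} response:\n{ex.target_response}"
    )
    return prompt, completion


def extract_target_response(generated, target="Chinese"):
    """Recover only the final target-language answer from a generation."""
    marker = f"{target} response:\n"
    _, found, tail = generated.rpartition(marker)
    # Fall back to the whole text if the marker never appeared.
    return tail.strip() if found else generated.strip()
```

At inference time, a helper like `extract_target_response` would let a caller show only the target-language answer while the pivot-language draft stays internal to the generation.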


Environment

pip install torch==2.0.1
pip install transformers==4.31.0 deepspeed==0.9.5 accelerate==0.21.0
pip install openai tiktoken tqdm peft huggingface_hub datasets 
# Only for evaluating X-AlpacaEval
pip install shortuuid anthropic
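
Since the versions above are pinned, it may help to confirm that the active environment matches them before training. The following is a minimal sketch, not part of the repository: `PINNED` and `find_mismatches` are hypothetical names, and the pins are copied from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
PINNED = {
    "torch": "2.0.1",
    "transformers": "4.31.0",
    "deepspeed": "0.9.5",
    "accelerate": "0.21.0",
}


def find_mismatches(pinned, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build tags such as "+cu117" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dict when every pinned package is installed at the expected version; any other result lists the packages to reinstall.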

Code

The code is in the src directory, which is organized into sub-directories. Please refer to each sub-directory for detailed information.

Data

We provide the data used in our experiments.

Citation

If you find our data or code useful, please cite our paper:

@article{zhang2023plug,
  title={PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning},
  author={Zhang, Zhihan and Lee, Dong-Ho and Fang, Yuwei and Yu, Wenhao and Jia, Mengzhao and Jiang, Meng and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2311.08711},
  year={2023}
}