hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[FEATURE]: continual training based on codellama #4906

Open bohea opened 11 months ago

bohea commented 11 months ago

Describe the feature

I saw that Colossal-LLaMA-2-7B implements continual training on top of LLaMA-2.

I am wondering whether continual training could also be done on top of CodeLlama. I think the main differences from LLaMA-2 are:

  1. Infilling training: CodeLlama is trained with a fill-in-the-middle (FIM) objective, so the preprocessing of the training data may need to differ (see the sketch after this list).
  2. Long-context fine-tuning: CodeLlama additionally goes through LCFT; will this affect continual training?
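A rough sketch of what I mean by FIM-style preprocessing, just to illustrate point 1. This is a minimal, hypothetical example, not ColossalAI's pipeline: the `<PRE>`/`<SUF>`/`<MID>` strings stand in for the real sentinel tokens from the CodeLlama tokenizer, and the split/ordering follows the general fill-in-the-middle recipe.

```python
import random

# Placeholder sentinel strings; in practice use the special tokens
# (and their IDs) provided by the CodeLlama tokenizer itself.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def make_fim_sample(code: str, fim_rate: float = 0.9) -> str:
    """Turn one code document into a PSM-ordered FIM training sample.

    With probability 1 - fim_rate the document is kept as a plain
    left-to-right sample, mirroring a mixed causal/infilling objective.
    """
    if random.random() > fim_rate or len(code) < 3:
        return code  # ordinary causal-LM sample

    # Pick two cut points and split the document into prefix / middle / suffix.
    lo, hi = sorted(random.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:lo], code[lo:hi], code[hi:]

    # PSM ordering: the model sees the prefix and suffix, then learns
    # to generate the middle span.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"
```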

Orion-Zheng commented 10 months ago

Yes, continual pre-training on CodeLlama would require changing the data processing to fit the FIM objective. For length extrapolation, CodeLlama's approach is to set the base of the RoPE positional encoding to 1,000,000 and then train at a 16k sequence length (in practice it extrapolates to around 128k), so no real code changes are needed for that part. However, longer sequences significantly increase GPU memory usage, so the hardware requirements are considerably higher, and you would also need to prepare a batch of fairly long code data. We currently have no plans for CodeLlama continual pre-training 😃 but if you are interested, contributions are very welcome!
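For the long-context part, the change really is mostly configuration. A minimal sketch of what that could look like with Hugging Face `transformers` (the checkpoint name is only illustrative; `rope_theta` is the config field that holds the RoPE base):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Illustrative checkpoint name; substitute the CodeLlama checkpoint you actually use.
model_name = "codellama/CodeLlama-7b-hf"

config = AutoConfig.from_pretrained(model_name)
config.rope_theta = 1_000_000            # RoPE base used for CodeLlama's long-context stage
config.max_position_embeddings = 16_384  # train at 16k; longer contexts come from extrapolation

model = AutoModelForCausalLM.from_pretrained(model_name, config=config)
```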
