zhiyuanhubj / LongRecipe

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
https://arxiv.org/abs/2409.00509

seq_len setting #4

Closed · 233function closed this 2 weeks ago

233function commented 1 month ago

Hello! When extrapolating Qwen2 to 128k, should seq_len be set to 32768? I noticed that in the code it is set to 24000 for Llama3.

zhiyuanhubj commented 1 month ago

Hi, we use dynamic NTK in our training, so for inference with a longer context window you only need to adjust the rope_scaling factor in config.json. The effective context window is the product max_position_embeddings * factor.
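For illustration, the adjustment above amounts to overriding the rope_scaling entry when loading the model. Below is a minimal sketch using Hugging Face transformers; the field layout ({"type": "dynamic", "factor": ...}) follows transformers' convention for dynamic NTK, and the model path and factor value of 4.0 are assumed examples, not values taken from the LongRecipe code.

```python
# Minimal sketch: enable dynamic NTK scaling at inference time by overriding
# rope_scaling in the model config. The factor value here (4.0) is an assumed
# example; pick it so that max_position_embeddings * factor covers your target
# context window.
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "path/to/your/model"  # hypothetical path

config = AutoConfig.from_pretrained(model_path)
config.rope_scaling = {"type": "dynamic", "factor": 4.0}

model = AutoModelForCausalLM.from_pretrained(model_path, config=config)
```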

The 24000 setting is only the length of a single training sample. In this case we train on samples of 24000 tokens (30% of 80k).
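As a worked example of the formula above, here is a short sketch of the arithmetic. The base window of 32768 is Qwen2's default max_position_embeddings and the factor of 4.0 is an assumed example value; adjust both to match your actual config.json.

```python
# Effective context window = max_position_embeddings * factor (see comment above).
max_position_embeddings = 32768  # Qwen2 default (assumed here)
factor = 4.0                     # assumed example value
print(int(max_position_embeddings * factor))  # 131072, i.e. a 128k context window

# The 24000 training sample length corresponds to 30% of the 80k target window.
print(0.30 * 80_000)  # 24000.0
```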