Closed — 233function closed this issue 2 weeks ago
Hi, we use dynamic NTK in our training, so for inference with a longer context window you only need to adjust the `factor` of `rope_scaling` in `config.json`. The effective context window is the product `max_position_embeddings * factor`.
The 24000 setting is only the length of a single training sample. In this case, we train on samples of 24000 tokens (30% of 80k).
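As a concrete sketch of the multiplication above — the values and the `rope_scaling` field names below follow the common Hugging Face convention and are illustrative assumptions, not the repo's exact config:

```python
# Hedged sketch: effective context window under dynamic NTK RoPE scaling.
# Assumed values for illustration; check the actual config.json of the model.
max_position_embeddings = 32768   # e.g. Qwen2's base context length
factor = 4.0                      # "factor" under "rope_scaling" in config.json

# What the relevant config.json fragment might look like:
rope_scaling = {"type": "dynamic", "factor": factor}

effective_context = int(max_position_embeddings * factor)
print(effective_context)  # 131072, i.e. ~128k
```

So extending a 32768-token base model to ~128k corresponds to `factor = 4.0`; the training `seq_len` (e.g. 24000) is independent of this and only caps the per-sample length.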
Hi! When extrapolating Qwen2 to 128k, should `seq_len` be set to 32768? I noticed the code sets it to 24000 for Llama3.