Open briandw opened 6 months ago
Adds a scaling_factor parameter to the rotary embedding calculation, for use with models such as DeepSeek, which use LlamaLinearScalingRotaryEmbedding. The only difference from the unscaled rotary embedding is that the freqs in precompute_freqs_cis are divided by the scaling factor.
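
For reviewers, here is a rough pure-Python sketch of what dividing the freqs by a scaling factor amounts to. It is not the code in this PR: the signature, the default `theta=10000.0`, and the use of nested lists instead of tensors are assumptions made so the example runs without any dependencies.

```python
import cmath


def precompute_freqs_cis(dim, end, theta=10000.0, scaling_factor=1.0):
    """Illustrative sketch: rotary frequencies with linear scaling.

    Returns a list of `end` rows, each holding dim // 2 unit complex numbers.
    The signature and defaults are assumptions, not the PR's actual API.
    """
    # One inverse frequency per pair of dimensions: theta^(-i/dim) for even i.
    inv_freq = [1.0 / (theta ** (i / dim)) for i in range(0, dim, 2)]
    # Linear scaling: divide the frequencies by scaling_factor.
    # With scaling_factor=1.0 this is the unscaled rotary embedding.
    inv_freq = [f / scaling_factor for f in inv_freq]
    # Row t holds exp(1j * t * f) for each frequency f.
    return [[cmath.rect(1.0, t * f) for f in inv_freq] for t in range(end)]


base = precompute_freqs_cis(dim=8, end=16)
scaled = precompute_freqs_cis(dim=8, end=16, scaling_factor=4.0)
# Position 4 with scaling_factor=4 gets the same rotation as position 1 unscaled.
print(all(abs(a - b) < 1e-12 for a, b in zip(scaled[4], base[1])))
```

Because each angle is the product of position and frequency, dividing the freqs by the scaling factor has the same effect as dividing the position indices by it, so the scaled embedding stretches the same range of rotations over a longer sequence.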