Support ScalingRotaryEmbedding

pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

BSD 3-Clause "New" or "Revised" License

5.36k stars 485 forks

Support ScalingRotaryEmbedding #44

Open briandw opened 6 months ago

briandw commented 6 months ago

Added a scaling_factor parameter to the rotary embedding calculation, for use with models like DeepSeek, which use LlamaLinearScalingRotaryEmbedding. The only difference from the existing calculation is that the freqs in precompute_freqs_cis are divided by the scaling factor.
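
For illustration, here is a minimal sketch of what dividing the frequencies by a scaling factor does. It is not the diff from this PR: it uses plain Python lists rather than torch tensors, and the function name precompute_freqs, its signature, and the default base of 10000 are assumptions made for the example. Because each angle is position times frequency, dividing the frequencies by the factor has the same effect as dividing the positions by it.

```python
import math

def precompute_freqs(seq_len, n_elem, base=10000.0, scaling_factor=1.0):
    """Hypothetical helper: angles[t][i] = t * inv_freq[i] / scaling_factor."""
    # One inverse frequency per pair of head dimensions.
    inv_freq = [1.0 / (base ** (i / n_elem)) for i in range(0, n_elem, 2)]
    # Linear scaling: divide every frequency by the scaling factor.
    inv_freq = [f / scaling_factor for f in inv_freq]
    # Outer product of positions and (scaled) frequencies.
    return [[t * f for f in inv_freq] for t in range(seq_len)]

def to_cos_sin(angles):
    """Turn each rotation angle into the (cos, sin) pair stored in the cache."""
    return [[(math.cos(a), math.sin(a)) for a in row] for row in angles]

scaled = precompute_freqs(seq_len=8, n_elem=4, scaling_factor=4.0)
unscaled = precompute_freqs(seq_len=8, n_elem=4)

# With a factor of 4, position 4 gets the same angles as unscaled position 1.
assert all(math.isclose(a, b) for a, b in zip(scaled[4], unscaled[1]))
```

With scaling_factor left at 1.0 the sketch reduces to the unscaled calculation, so in this sketch models that do not use linear scaling are unaffected.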