jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models
MIT License
1.32k stars 115 forks

Questions about DynamicNTK #53

Open wutong4012 opened 6 months ago

wutong4012 commented 6 months ago

https://github.com/jquesnelle/yarn/blob/ff9321faf940f92023a4f04cb09852ae18cbbf27/scaled_rope/modeling_llama_yarn.py#L214

A question about the line linked above: if I want to extend the context from 2K to 16K, then the factor that multiplies the base here is $(8 \times 16\text{K} / 2\text{K}) - (8 - 1) = 57$. Is this multiplier reasonable, or is there a problem here? Please correct me if I'm wrong.
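To make the arithmetic easy to check, here is a minimal sketch of the calculation. It assumes the linked line follows the common Dynamic NTK form `base * ((s * L / L0) - (s - 1)) ** (dim / (dim - 2))`, which may not match the repository exactly. The function names, `base=10000`, and `dim=128` are illustrative assumptions, not values taken from the linked code.

```python
def dynamic_ntk_factor(scaling_factor, seq_len, max_position_embeddings):
    """Multiplier applied to the RoPE base, before any exponent.

    Assumed form: (s * L / L0) - (s - 1), where s is the scaling factor,
    L the current sequence length and L0 the original context length.
    """
    return (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)


def dynamic_ntk_base(base, dim, scaling_factor, seq_len, max_position_embeddings):
    """Rescaled RoPE base, assuming the factor is raised to dim / (dim - 2)."""
    factor = dynamic_ntk_factor(scaling_factor, seq_len, max_position_embeddings)
    return base * factor ** (dim / (dim - 2))


# The case from the question: 2K -> 16K with a scaling factor of 8.
print(dynamic_ntk_factor(8, 16 * 1024, 2 * 1024))  # 57.0

# At the original 2K length the factor is 1, so the base is unchanged.
print(dynamic_ntk_factor(8, 2 * 1024, 2 * 1024))   # 1.0

# Hypothetical base and head dimension, only to show the effect of the exponent.
print(dynamic_ntk_base(10000.0, 128, 8, 16 * 1024, 2 * 1024))
```

Under these assumptions the 57 above is only the value inside the parentheses. If the exponent `dim / (dim - 2)` is applied as in this sketch, the effective multiplier on the base is slightly larger than 57.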

@bloc97