vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: The implementation of DynamicNTKScalingRotaryEmbedding may have errors. #5093

Open macheng6 opened 4 months ago

macheng6 commented 4 months ago

Your current environment

The output of `python collect_env.py`

I noticed that the implementation of dynamic NTK recomputes the `base` parameter once, from the maximum length, and uses it for all sequence lengths. Isn't this inconsistent with the transformers implementation, which recomputes `base` from the actual sequence length?

šŸ› Describe the bug

```python
def _compute_cos_sin_cache(self) -> torch.Tensor:
    # NOTE(woosuk): self.max_position_embeddings is the original
    # maximum length before applying the rope scaling.
    # Thus, the maximum length after applying the rope scaling is
    # self.max_position_embeddings * self.scaling_factor.
    max_len = self.max_position_embeddings * self.scaling_factor
    base = self.base * (
        (self.scaling_factor * max_len / self.max_position_embeddings) -
        (self.scaling_factor - 1))**(self.rotary_dim /
                                     (self.rotary_dim - 2))
    inv_freq = self._compute_inv_freq(base)
    t = torch.arange(max_len, dtype=torch.float)

    freqs = torch.einsum("i,j -> ij", t, inv_freq)
    cos = freqs.cos()
    sin = freqs.sin()
    cache = torch.cat((cos, sin), dim=-1)
    return cache
```
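For comparison, here is a minimal sketch (not vLLM code) of the *dynamic* NTK rule as used by transformers: `base` is rescaled from the actual `seq_len` whenever it exceeds the original context length, so different prompt lengths produce different frequency sets. The snippet above instead fixes `max_len = max_position_embeddings * scaling_factor` once at init, which is static NTK scaling.

```python
import torch

def dynamic_ntk_inv_freq(base: float, rotary_dim: int, seq_len: int,
                         max_position_embeddings: int,
                         scaling_factor: float) -> torch.Tensor:
    """Sketch of the dynamic NTK rule: `base` is rescaled from the
    current seq_len, not from a fixed precomputed maximum."""
    if seq_len > max_position_embeddings:
        base = base * (
            (scaling_factor * seq_len / max_position_embeddings)
            - (scaling_factor - 1)
        ) ** (rotary_dim / (rotary_dim - 2))
    return 1.0 / (base ** (
        torch.arange(0, rotary_dim, 2, dtype=torch.float) / rotary_dim))

# Two different prompt lengths yield two different frequency sets,
# which a single cache precomputed at init cannot reproduce.
f_short = dynamic_ntk_inv_freq(10000.0, 128, 2048, 2048, 2.0)
f_long = dynamic_ntk_inv_freq(10000.0, 128, 8192, 2048, 2.0)
```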
ymwangg commented 4 months ago

Yes, I noticed a similar issue. The current "dynamic" NTK scaling is actually static NTK scaling. It may be tricky and inefficient to implement true dynamic NTK in a model server that needs to handle many concurrent requests.
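To illustrate why this is tricky, a naive dynamic implementation (hypothetical sketch, not vLLM code) would have to rebuild the whole cos/sin cache every time a request exceeds the longest length seen so far, which is wasteful when requests of mixed lengths arrive concurrently:

```python
import torch

class NaiveDynamicNTKCache:
    """Hypothetical sketch: rebuilds the entire cos/sin cache whenever a
    request exceeds the longest sequence length seen so far."""

    def __init__(self, base: float, rotary_dim: int, max_pos: int,
                 scaling: float):
        self.base, self.rotary_dim = base, rotary_dim
        self.max_pos, self.scaling = max_pos, scaling
        self.cached_len = 0
        self.cache = None
        self.rebuilds = 0  # counts full recomputations

    def get(self, seq_len: int) -> torch.Tensor:
        if seq_len > self.cached_len:
            base = self.base
            if seq_len > self.max_pos:
                # Dynamic NTK: rescale base from the current length.
                base = base * ((self.scaling * seq_len / self.max_pos)
                               - (self.scaling - 1)
                               ) ** (self.rotary_dim / (self.rotary_dim - 2))
            inv_freq = 1.0 / (base ** (
                torch.arange(0, self.rotary_dim, 2,
                             dtype=torch.float) / self.rotary_dim))
            t = torch.arange(seq_len, dtype=torch.float)
            freqs = torch.einsum("i,j->ij", t, inv_freq)
            self.cache = torch.cat((freqs.cos(), freqs.sin()), dim=-1)
            self.cached_len = seq_len
            self.rebuilds += 1
        return self.cache[:seq_len]

cache = NaiveDynamicNTKCache(10000.0, 128, 2048, 2.0)
for n in (1024, 4096, 3000, 8192):  # interleaved request lengths
    _ = cache.get(n)
# Every new maximum length forces a full recomputation (3 rebuilds here).
```

Note that this sketch is itself subtly wrong: under dynamic NTK, a change of `base` alters the cos/sin values for *earlier* positions too, so reusing the cached prefix for shorter requests serves them with frequencies computed for a different length. Handling that correctly in a batched server needs per-length caches or on-the-fly computation.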

Missmiaom commented 2 months ago

+1

qiufengyuyi commented 2 months ago

+1