Implementation of RoPE in YOCO

In YOCO/yoco/models/decoder/yoco.py, RoPE is implemented as:

    def build_rel_pos(self, x, start_pos):
        if self._precomputed_freqs_cis is None:
            angle = 1.0 / (self.args.rope_theta ** torch.linspace(0, 1, self.head_dim // 2, dtype=torch.float, device=x.device))
            index = torch.arange(self.args.max_seq_len).to(angle)
            self._precomputed_freqs_cis = index[:, None] * angle

        cos = torch.cos(self._precomputed_freqs_cis[start_pos:start_pos+x.size(1)])
        sin = torch.sin(self._precomputed_freqs_cis[start_pos:start_pos+x.size(1)])
        rel_pos = (cos.to(x.dtype), sin.to(x.dtype))
        return rel_pos

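I wonder whether the angle should instead be computed as shown after this paragraph. torch.linspace(0, 1, self.head_dim // 2) includes the endpoint 1, so the exponents are i / (head_dim // 2 - 1) and the last frequency equals 1 / rope_theta. The usual RoPE definition uses exponents 2i / head_dim, which never reach 1. Taking one extra point and dropping the last one would give those exponents: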

    angle = 1.0 / (self.args.rope_theta ** torch.linspace(0, 1, self.head_dim // 2 + 1, dtype=torch.float, device=x.device))
    angle = angle[:-1]
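
To make the difference concrete, here is a minimal sketch in plain Python that compares the exponents produced by the two variants with the 2i / head_dim exponents of the usual RoPE definition. The `linspace` helper is a hand-written stand-in that assumes the endpoint-inclusive behavior of torch.linspace; the function names and the head_dim value of 8 are hypothetical and chosen only for illustration.

```python
def linspace(start, end, steps):
    """Evenly spaced points including both endpoints (stand-in for torch.linspace)."""
    if steps == 1:
        return [start]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def current_exponents(head_dim):
    # Mirrors the quoted code: head_dim // 2 points spanning [0, 1] inclusive.
    return linspace(0.0, 1.0, head_dim // 2)


def proposed_exponents(head_dim):
    # Mirrors the proposal: one extra point, then drop the final endpoint (1.0).
    return linspace(0.0, 1.0, head_dim // 2 + 1)[:-1]


def reference_exponents(head_dim):
    # Exponents 2i / head_dim for i = 0 .. head_dim // 2 - 1.
    return [2 * i / head_dim for i in range(head_dim // 2)]


if __name__ == "__main__":
    head_dim = 8  # hypothetical value, small enough to read the lists
    print("current  :", current_exponents(head_dim))   # last exponent is 1
    print("proposed :", proposed_exponents(head_dim))  # last exponent stays below 1
    print("reference:", reference_exponents(head_dim))
```

With these assumptions, the proposed variant matches the reference exponents, while the current variant stretches them so that the last one reaches 1. Whether that stretch is intentional in YOCO is exactly the open question here.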

microsoft/unilm, issue #1554