Closed nkkbr closed 1 month ago
In this file:
YOCO/yoco/models/decoder/yoco.py
RoPE was implemented as:
    def build_rel_pos(self, x, start_pos):
        if self._precomputed_freqs_cis is None:
            angle = 1.0 / (self.args.rope_theta ** torch.linspace(0, 1, self.head_dim // 2, dtype=torch.float, device=x.device))
            index = torch.arange(self.args.max_seq_len).to(angle)
            self._precomputed_freqs_cis = index[:, None] * angle
        cos = torch.cos(self._precomputed_freqs_cis[start_pos:start_pos+x.size(1)])
        sin = torch.sin(self._precomputed_freqs_cis[start_pos:start_pos+x.size(1)])
        rel_pos = (cos.to(x.dtype), sin.to(x.dtype))
        return rel_pos
I wonder if the angle should be:
    angle = 1.0 / (self.args.rope_theta ** torch.linspace(0, 1, self.head_dim // 2 + 1, dtype=torch.float, device=x.device))
    angle = angle[:-1]
In practice, the performance is almost the same between these two implementations. We use torch.linspace for simplicity.
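For anyone comparing the two variants, here is a minimal sketch of how the exponents differ. It is pure Python so it runs without PyTorch; the `linspace` helper is a hypothetical stand-in for `torch.linspace`, and `head_dim=8` with `rope_theta=10000.0` are illustrative values, not the repository's configuration. With `n = head_dim // 2`, the current code yields exponents `i / (n - 1)` (the last one is exactly 1), while the proposed code yields `i / n`, which matches the usual RoPE definition `rope_theta ** (-2i / head_dim)`.

```python
def linspace(start, stop, num):
    """Hypothetical pure-Python stand-in for torch.linspace (both endpoints included)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def angles_current(rope_theta, head_dim):
    # Mirrors the code in the issue: n points on [0, 1], so exponents are i / (n - 1).
    n = head_dim // 2
    return [1.0 / rope_theta ** e for e in linspace(0.0, 1.0, n)]


def angles_proposed(rope_theta, head_dim):
    # Mirrors the suggested change: n + 1 points on [0, 1], endpoint dropped,
    # so exponents are i / n.
    n = head_dim // 2
    return [1.0 / rope_theta ** e for e in linspace(0.0, 1.0, n + 1)[:-1]]


def angles_reference(rope_theta, head_dim):
    # Usual RoPE frequencies: rope_theta ** (-2i / head_dim) for i in 0 .. head_dim/2 - 1.
    return [rope_theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    for name, fn in [("current", angles_current),
                     ("proposed", angles_proposed),
                     ("reference", angles_reference)]:
        print(name, fn(10000.0, 8))
```

Both variants start at frequency 1 and decay geometrically; the current one simply spreads the same number of frequencies over a slightly wider range, ending at `1 / rope_theta`. That is consistent with the reply above that the two perform almost the same in practice.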