hyunwoongko closed this issue 8 months ago.
In the RetNet code (https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py#L25), `inv_freq` (called `angle` in this code) is created using `torch.linspace(0, 1, dim // 2)`.
But it is more commonly created using `torch.arange(0, dim, 2) / dim`, as in the following: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L120
Why did the authors implement `inv_freq` this way? Is there a particular reason?
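To make the comparison concrete, here is the exponent grid each expression produces. This writes $d$ for `dim`, $m = d/2$, and $\theta_i = 10000^{-e_i}$ for $i = 0, \dots, m-1$, assuming the base of 10000 is used in both linked files:

$$
e_i^{\text{linspace}} = \frac{i}{m-1}, \qquad e_i^{\text{arange}} = \frac{2i}{d} = \frac{i}{m}
$$

$$
e_i^{\text{linspace}} - e_i^{\text{arange}} = \frac{i}{m-1} - \frac{i}{m} = \frac{i}{m(m-1)} \le \frac{1}{m} = \frac{2}{d}
$$

Both schedules start at $\theta_0 = 1$. The `linspace` version ends exactly at $10000^{-1}$ because it includes the endpoint, while the `arange` version stops one step short at $10000^{-(1 - 2/d)}$. The per-index gap in the exponent never exceeds $2/d$.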
They are almost the same. It's not a big deal.
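As a rough numerical check of "almost the same", here is a minimal pure-Python sketch. It mirrors the two torch expressions without a torch dependency, and it uses float64 where torch defaults to float32. The helper names `inv_freq_linspace`, `inv_freq_arange` and `max_ratio` are hypothetical and come from neither repository, and the base of 10000 is an assumption.

```python
def inv_freq_linspace(dim: int, base: float = 10000.0) -> list[float]:
    """Mirrors 1 / base ** torch.linspace(0, 1, dim // 2) (exponent grid includes 1)."""
    if dim < 4 or dim % 2:
        raise ValueError("dim must be an even integer >= 4")
    n = dim // 2
    # torch.linspace(0, 1, n) gives n evenly spaced points, both endpoints included
    exponents = [i / (n - 1) for i in range(n)]
    return [1.0 / base**e for e in exponents]


def inv_freq_arange(dim: int, base: float = 10000.0) -> list[float]:
    """Mirrors 1 / base ** (torch.arange(0, dim, 2) / dim) (exponent grid stops at 1 - 2/dim)."""
    if dim < 4 or dim % 2:
        raise ValueError("dim must be an even integer >= 4")
    # torch.arange(0, dim, 2) gives 0, 2, ..., dim - 2; dividing by dim keeps exponents in [0, 1)
    exponents = [k / dim for k in range(0, dim, 2)]
    return [1.0 / base**e for e in exponents]


def max_ratio(dim: int, base: float = 10000.0) -> float:
    """Largest element-wise ratio arange / linspace; the gap grows toward the low frequencies."""
    lin = inv_freq_linspace(dim, base)
    ar = inv_freq_arange(dim, base)
    return max(b / a for a, b in zip(lin, ar))


if __name__ == "__main__":
    for dim in (64, 128):
        lin, ar = inv_freq_linspace(dim), inv_freq_arange(dim)
        print(
            f"dim={dim}: lowest freq linspace={lin[-1]:.3e}, arange={ar[-1]:.3e}, "
            f"max ratio={max_ratio(dim):.4f} (10000 ** (2 / dim) = {10000 ** (2 / dim):.4f})"
        )
```

In exact arithmetic the element-wise ratio is $10000^{(e_i^{\text{linspace}} - e_i^{\text{arange}})}$. It is 1 at the highest frequency and peaks at the lowest one at `10000 ** (2 / dim)`, roughly 1.33 for `dim=64` and 1.15 for `dim=128`. Both remain smooth geometric schedules over nearly the same range, which fits the answer above.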
Thanks!