microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License

Question about RetNetRelPos #80

Closed hyunwoongko closed 8 months ago

hyunwoongko commented 8 months ago

In the RetNet code (https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py#L25), inv_freq (called angle in this code) is created using torch.linspace(0, 1, dim/2).

But people generally create it using torch.arange(0, dim, 2) / dim, as in the following: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L120

Why did the authors implement inv_freq like that? Is there any special reason?

sunyt32 commented 8 months ago

They are almost the same. It's not a big deal.
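
To make "almost the same" concrete, here is a minimal sketch comparing the two exponent schedules. It uses plain Python lists that mirror what torch.linspace(0, 1, dim // 2) and torch.arange(0, dim, 2) / dim produce, so it runs without PyTorch. The helper names, the base of 10000, and dim = 64 are assumptions for illustration and are not copied from either repository.

```python
def linspace_exponents(dim):
    # Mirrors torch.linspace(0, 1, dim // 2): evenly spaced, both endpoints included.
    n = dim // 2
    if n == 1:
        return [0.0]
    return [i / (n - 1) for i in range(n)]


def arange_exponents(dim):
    # Mirrors torch.arange(0, dim, 2) / dim: step of 2 / dim, endpoint 1 excluded.
    return [i / dim for i in range(0, dim, 2)]


def inv_freq(exponents, base=10000.0):
    # inv_freq[i] = 1 / base ** exponent[i]; base=10000 is assumed here.
    return [1.0 / base ** e for e in exponents]


dim = 64  # hypothetical per-head dimension
a = inv_freq(linspace_exponents(dim))
b = inv_freq(arange_exponents(dim))

# Both start at exponent 0, so the first frequency is identical.
print("first:", a[0], b[0])
# The linspace variant reaches exponent 1 exactly; the arange variant stops at (dim - 2) / dim.
print("last: ", a[-1], b[-1])
# Largest gap between corresponding entries.
print("max abs diff:", max(abs(x - y) for x, y in zip(a, b)))
```

Both variants give dim // 2 frequencies that start at 1 and decay geometrically. The only difference is the spacing of the exponents: i / (n - 1) for linspace versus i / n for arange, with n = dim // 2. As a result the linspace version ends exactly at 1 / base, while the arange version ends slightly above it.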

hyunwoongko commented 8 months ago

Thanks!