microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
3.01k stars, 202 forks

Question about is_first_step and Retnet #58

Closed: tdomhan closed this issue 1 year ago

tdomhan commented 1 year ago

In the code, when is_first_step is True, activate_recurrent is set to False here: https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py#L362

I was wondering what the reason for this is. Should one use is_first_step=False when using the recurrent mode of Retnet?

shumingma commented 1 year ago

At the first step, the input (e.g., a prompt) may contain multiple tokens, while the recurrent mode only accepts one token at a time.
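To illustrate why the first step is handled differently, here is a simplified single-head sketch of retention in pure Python. It assumes the plain parallel and recurrent forms described in the RetNet paper (decay gamma only: no xPos rotation, no normalization, no gating), and every function name is hypothetical, not torchscale's API. The parallel form accepts any number of tokens and also returns the recurrent state. The recurrent form updates that state with exactly one token. The hypothetical `retention_forward` mirrors the rule discussed in this issue: recurrent mode is only activated when `is_first_step` is False.

```python
# Simplified single-head retention (no xPos, no normalization, no gating).
# All names are hypothetical illustrations, NOT torchscale's API.

def outer(k, v):
    """k^T v for one token: a d_k x d_v matrix as nested lists."""
    return [[ki * vj for vj in v] for ki in k]


def parallel_retention(q, k, v, gamma):
    """Parallel form: handles any number of tokens (e.g. a whole prompt).

    o_n = sum_{m <= n} gamma**(n - m) * (q_n . k_m) * v_m
    Also returns the state S = sum_m gamma**(N - 1 - m) * k_m^T v_m,
    so that decoding can continue in recurrent mode.
    """
    n_tok, d_k, d_v = len(q), len(k[0]), len(v[0])
    outputs = []
    for n in range(n_tok):
        o = [0.0] * d_v
        for m in range(n + 1):
            score = gamma ** (n - m) * sum(a * b for a, b in zip(q[n], k[m]))
            for j in range(d_v):
                o[j] += score * v[m][j]
        outputs.append(o)
    state = [[0.0] * d_v for _ in range(d_k)]
    for m in range(n_tok):
        w = gamma ** (n_tok - 1 - m)
        for i, row in enumerate(outer(k[m], v[m])):
            for j, x in enumerate(row):
                state[i][j] += w * x
    return outputs, state


def recurrent_step(q_n, k_n, v_n, state, gamma):
    """Recurrent form: exactly ONE token.

    S_n = gamma * S_{n-1} + k_n^T v_n,   o_n = q_n S_n
    """
    new_state = [
        [gamma * s + x for s, x in zip(s_row, kv_row)]
        for s_row, kv_row in zip(state, outer(k_n, v_n))
    ]
    o = [sum(q_n[i] * new_state[i][j] for i in range(len(q_n)))
         for j in range(len(v_n))]
    return o, new_state


def retention_forward(q, k, v, state, gamma, is_first_step):
    """Hypothetical dispatcher: recurrent mode only after the first step."""
    activate_recurrent = not is_first_step
    if activate_recurrent:
        if len(q) != 1:
            raise ValueError("recurrent mode accepts one token at a time")
        o, state = recurrent_step(q[0], k[0], v[0], state, gamma)
        return [o], state
    # First step: the (possibly multi-token) prompt goes through the
    # parallel form, which starts from an empty state.
    return parallel_retention(q, k, v, gamma)


gamma = 0.9
# A 3-token "prompt" with d_k = d_v = 2 (made-up numbers for illustration).
q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
k = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
v = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]

# First step: whole prompt in one call (parallel form).
prompt_out, state = retention_forward(q, k, v, None, gamma, is_first_step=True)
# Later steps: one new token at a time (recurrent form).
new_out, state = retention_forward(
    [[1.0, 1.0]], [[0.5, 0.5]], [[1.0, 1.0]], state, gamma, is_first_step=False
)
```

Both forms produce the same outputs and the same state, so the prompt can be processed in a single parallel pass and generation can then continue token by token. Passing the whole prompt with `is_first_step=False` fails in this sketch, because the recurrent step only takes a single (q, k, v) triple. Looping `recurrent_step` over the prompt would also work, but it serializes computation that the parallel form does in one pass.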

tdomhan commented 1 year ago

Ohh, understood! So it's for the prompt. Thanks!