Closed tdomhan closed 1 year ago
In the code, when is_first_step is True, activate_recurrent is set to False here: https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py#L362
I was wondering what the reason for this is. Should one use is_first_step=False when using the recurrent mode of RetNet?
At the first step, the input (e.g., a prompt) may contain multiple tokens, while the recurrent mode only accepts one token at a time.
Ohh, understood! So it's for the prompt. Thanks!
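To illustrate the point about the first step, here is a minimal pure-Python sketch of a toy single-head retention layer. It is not the torchscale implementation: the function names parallel_retention and recurrent_step, the decay value gamma, and the toy inputs are all assumptions made for illustration, and details of the real model (multiple heads, position rotations, normalization) are left out. The sketch processes a multi-token prompt in parallel, then feeds one further token through the recurrent update and checks that it matches the parallel result.

```python
def parallel_retention(q, k, v, gamma):
    """Toy parallel form: handles a whole multi-token sequence at once.

    Returns the per-token outputs and the state left after the last token.
    """
    n_tokens, d_k, d_v = len(q), len(k[0]), len(v[0])
    outputs = []
    for n in range(n_tokens):
        out = [0.0] * d_v
        for m in range(n + 1):  # causal: token n only sees tokens m <= n
            score = sum(a * b for a, b in zip(q[n], k[m])) * gamma ** (n - m)
            for j in range(d_v):
                out[j] += score * v[m][j]
        outputs.append(out)
    # State after the sequence: sum over m of gamma^(N-1-m) * k_m^T v_m
    state = [
        [
            sum(gamma ** (n_tokens - 1 - m) * k[m][i] * v[m][j] for m in range(n_tokens))
            for j in range(d_v)
        ]
        for i in range(d_k)
    ]
    return outputs, state


def recurrent_step(q_t, k_t, v_t, state, gamma):
    """Toy recurrent form: accepts exactly one token plus the previous state."""
    d_k, d_v = len(k_t), len(v_t)
    new_state = [
        [gamma * state[i][j] + k_t[i] * v_t[j] for j in range(d_v)]
        for i in range(d_k)
    ]
    out = [sum(q_t[i] * new_state[i][j] for i in range(d_k)) for j in range(d_v)]
    return out, new_state


# Hypothetical toy data: 4 tokens, key/value dimension 2.
gamma = 0.5
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
k = [[1.0, 2.0], [0.5, 0.5], [-1.0, 1.0], [0.0, 3.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, -1.0]]

# Reference: all 4 tokens through the parallel form.
full_out, _ = parallel_retention(q, k, v, gamma)

# "First step": the 3-token prompt goes through the parallel form.
prompt_out, state = parallel_retention(q[:3], k[:3], v[:3], gamma)

# Later steps: one token at a time through the recurrent form.
step_out, state = recurrent_step(q[3], k[3], v[3], state, gamma)

# The recurrent output for token 4 agrees with the parallel reference.
matches = all(abs(a - b) < 1e-9 for a, b in zip(step_out, full_out[3]))
print(matches)
```

In this sketch the recurrent function has no way to take the three prompt tokens in a single call, which is the situation described above: the prompt is handled in one parallel pass, and the state it leaves behind is what the one-token recurrent steps continue from.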