Closed bruce2233 closed 2 months ago
is_causal=True applies causal masking. attn_mask lets the user pass a custom attention mask. Use one or the other to set the masking; you will get an error if both are set.
What this repo implements is a decoder-only transformer with causal masking. If no mask is applied, it's an encoder-only transformer.
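To make the effect of is_causal concrete, here is a toy, dependency-free sketch of scaled dot-product attention with an optional causal mask (illustrative only; the real F.scaled_dot_product_attention is fused and far more capable):

```python
import math

def sdpa(q, k, v, is_causal=False):
    # q, k, v: lists of vectors (seq_len x d). A toy sketch of the masking
    # logic behind scaled dot-product attention, not the real implementation.
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if is_causal and j > i:
                # causal mask: position i may not attend to future position j
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # softmax over the row
        z = sum(exps)
        w = [e / z for e in exps]
        out.append([sum(wi * vj[t] for wi, vj in zip(w, v))
                    for t in range(len(v[0]))])
    return out
```

With is_causal=True, the first position can only attend to itself, so its output is exactly v[0]; without the mask it mixes in later positions as well.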
Thanks, I get it. https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html#
https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436728a420e85796c57dba7e/model.py#L61-L71
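The nanoGPT lines linked above register a lower-triangular buffer (via torch.tril) that plays the same role as is_causal=True. A minimal sketch of that mask pattern in plain Python:

```python
def causal_mask(t):
    # 1 where position i may attend to position j (only j <= i), else 0 --
    # the lower-triangular pattern nanoGPT builds with torch.tril.
    return [[1 if j <= i else 0 for j in range(t)] for i in range(t)]
```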
If so, is GPT a decoder-only transformer?