jankrepl / mildlyoverfitted

Paper implementations from scratch and machine learning tutorials
MIT License

Why is the attn_mask an upper triangular matrix? #20

Closed marsggbo closed 10 months ago

marsggbo commented 1 year ago

https://github.com/jankrepl/mildlyoverfitted/blob/94c79838572b1313e4eb3d2622e7da856e1fba73/github_adventures/gpt/model.py#L73-L78

jankrepl commented 1 year ago

@marsggbo thank you for your question.

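Because for a given token, we are only allowed to look at the tokens to its left, i.e. the earlier positions in the sequence. In the mask, row i describes what token i may attend to, and the entries above the diagonal (column j > i) correspond to tokens to its right, so those are exactly the entries that get masked out.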
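To make the "only look left" rule concrete, here is a minimal sketch in plain Python. It is not the code from the linked lines: `causal_mask` and `masked_softmax` are hypothetical helper names, and the convention that `True` means "blocked" is an assumption, chosen to match how `torch.nn.MultiheadAttention` interprets a boolean `attn_mask`.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when query position i must NOT attend to key position j.

    Assumed convention: True = blocked. Only j > i (tokens to the right) is blocked,
    so the True entries form the strict upper triangle.
    """
    return [[j > i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax in which blocked positions receive exactly zero weight."""
    weights = []
    for row, blocked_row in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        # The diagonal is never blocked, so there is always at least one allowed entry.
        top = max(s for s, blocked in zip(row, blocked_row) if not blocked)
        exps = [
            0.0 if blocked else math.exp(s - top)
            for s, blocked in zip(row, blocked_row)
        ]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


n = 4
mask = causal_mask(n)
scores = [[0.0] * n for _ in range(n)]  # uniform dummy scores, for illustration only
weights = masked_softmax(scores, mask)

# Print the mask: "x" = blocked (future token), "." = allowed (current or past token).
for row in mask:
    print(" ".join("x" if blocked else "." for blocked in row))
```

With uniform scores, row i of `weights` spreads its attention evenly over positions 0..i and gives zero weight to everything to the right, which is why the blocked region has the upper triangular shape.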

jankrepl commented 10 months ago

I am closing this due to inactivity. If you still have any issues, feel free to comment and I can reopen :)