rayendito opened this issue 2 months ago
Fixed in commit 131a9614963402fe0794795159b31571be708b10.
Turns out I don't really need padding. Initially I set the attention mask using register_buffer, which means that if I set context_length = 8, the attention mask will also be of size (8, 8) even though the current input may be shorter than that.
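For reference, the pattern is roughly this: keep the buffer at full size and slice it down to the current sequence length at forward time. A minimal sketch (PyTorch assumed; the class and names are illustrative, not the exact code from the commit):

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, context_length=8):
        super().__init__()
        # full (context_length, context_length) causal mask, built once;
        # register_buffer keeps it on the right device without making it a parameter
        mask = torch.tril(torch.ones(context_length, context_length)).bool()
        self.register_buffer("mask", mask)

    def forward(self, attn_scores):
        # attn_scores: (batch, heads, T, T) with T <= context_length
        T = attn_scores.size(-1)
        # slice the buffer to the actual input length instead of padding the input
        return attn_scores.masked_fill(~self.mask[:T, :T], float("-inf"))
```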
Turns out I do need it for finetuning lol.
Two things need to be considered. The first is padding_idx (in the embedding layer): currently the input is only padded (it runs, but it's not right). Gonna prioritize running finetuning on parallel data first.
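Setting padding_idx on the embedding makes the pad token's row a frozen zero vector, so pad positions neither carry nor receive gradient signal. Rough sketch, assuming pad id 0 (the ids and sizes here are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed pad token id, illustrative

# padding_idx zeroes the pad row and keeps its gradient at zero
emb = nn.Embedding(num_embeddings=100, embedding_dim=16, padding_idx=PAD_IDX)

tokens = torch.tensor([[5, 7, 2, PAD_IDX, PAD_IDX]])
out = emb(tokens)
print(out[0, 3])  # all zeros at the padded positions
```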
So far the input sentences need to be at least context_length tokens long; padding is needed to make sure the model always takes in a full context_length (esp. for GPT), as sketched below.
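Something like this right-padding helper is what I have in mind (sketch only; pad_to_context and PAD_IDX are illustrative names, and it assumes inputs are already no longer than context_length):

```python
import torch
import torch.nn.functional as F

context_length = 8
PAD_IDX = 0  # assumed pad token id, illustrative

def pad_to_context(ids: torch.Tensor) -> torch.Tensor:
    # right-pad a 1-D tensor of token ids up to context_length
    # (assumes ids.size(0) <= context_length)
    return F.pad(ids, (0, context_length - ids.size(0)), value=PAD_IDX)

batch = torch.stack([
    pad_to_context(torch.tensor([4, 9, 1])),
    pad_to_context(torch.tensor([3, 8, 2, 6, 5])),
])
print(batch.shape)  # torch.Size([2, 8])
```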