rayendito / ScratchLMs

A collection of neural network architectures. Implemented 'from scratch' in PyTorch
MIT License

Padding #3

Open rayendito opened 2 months ago

rayendito commented 2 months ago

so far the input sentences need to be at least context_length tokens long. need padding so that the model always takes in at least context_length tokens (esp. GPT)
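A minimal sketch of what such padding could look like, assuming right-padding with a dedicated pad token. `PAD_ID` and `pad_to_context_length` are hypothetical names for illustration, not something taken from the repo:

```python
import torch

PAD_ID = 0  # assumption: id reserved for padding; the actual tokenizer may differ


def pad_to_context_length(token_ids, context_length, pad_id=PAD_ID):
    """Right-pad (or truncate) a sequence of token ids to exactly context_length."""
    ids = list(token_ids)[:context_length]   # truncate if too long
    n_real = len(ids)
    n_pad = context_length - n_real
    ids = ids + [pad_id] * n_pad              # pad if too short
    mask = [1] * n_real + [0] * n_pad         # 1 = real token, 0 = padding
    return (
        torch.tensor(ids, dtype=torch.long),
        torch.tensor(mask, dtype=torch.long),
    )


ids, mask = pad_to_context_length([5, 6, 7], context_length=8)
```

The returned mask is only useful if something downstream (attention or the loss) actually consumes it.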

rayendito commented 1 month ago

fixed in commit 131a9614963402fe0794795159b31571be708b10. turns out i don't really need padding. initially i set the attention mask using register_buffer, which means that if i set context_length = 8, the attention mask will also be of size (8, 8), even though the current input may be shorter than that
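One common way to handle this (a sketch, not necessarily what the commit does) is to keep the full-size mask in the buffer and slice it down to the current sequence length in `forward`. The `CausalMask` class below is a hypothetical stand-in for the masking step inside an attention layer:

```python
import torch
import torch.nn as nn


class CausalMask(nn.Module):
    """Hypothetical helper: applies a causal mask to attention scores."""

    def __init__(self, context_length):
        super().__init__()
        # Lower-triangular (context_length, context_length) mask, built once.
        mask = torch.tril(
            torch.ones(context_length, context_length, dtype=torch.bool)
        )
        self.register_buffer("mask", mask)

    def forward(self, scores):
        # scores: (batch, T, T) with T <= context_length
        T = scores.size(-1)
        # Slice the buffer to the current length instead of requiring T == context_length.
        return scores.masked_fill(~self.mask[:T, :T], float("-inf"))


masker = CausalMask(context_length=8)
masked = masker(torch.zeros(1, 3, 3))  # input of length 3, shorter than 8
```

With the slice in place, inputs shorter than context_length go through without any padding, which matches the observation in the comment above.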

rayendito commented 1 month ago

turns out i need it for finetuning lol

rayendito commented 1 month ago

2 things need to be considered:

currently, the input is only padded (runs, but not right). gonna prioritize getting finetuning on parallel data running first
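The comment doesn't say what "not right" refers to. One thing worth checking when padded batches run but give off results is whether pad positions contribute to the loss. A minimal sketch, assuming a hypothetical `PAD_ID` and using the `ignore_index` argument of `torch.nn.functional.cross_entropy`:

```python
import torch
import torch.nn.functional as F

PAD_ID = 0  # assumption: pad token id; must not collide with a real token id


def masked_lm_loss(logits, targets, pad_id=PAD_ID):
    """Cross-entropy over (batch, T, vocab) logits, skipping padded targets."""
    vocab_size = logits.size(-1)
    return F.cross_entropy(
        logits.reshape(-1, vocab_size),
        targets.reshape(-1),
        ignore_index=pad_id,  # positions whose target is pad_id are skipped
    )


torch.manual_seed(0)
logits = torch.randn(1, 4, 5)
targets = torch.tensor([[2, 3, PAD_ID, PAD_ID]])  # last two positions are padding
loss = masked_lm_loss(logits, targets)
```

The loss is then averaged only over the real tokens. Masking pad positions in attention is a separate concern that this sketch does not cover.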