rayendito opened this issue 2 months ago
Fixed in commit 131a9614963402fe0794795159b31571be708b10.
Turns out I don't really need padding. Initially I set the attention mask using register_buffer, which means that if I set context_length = 8, the attention mask will also be of size (8, 8) even though the current input may be shorter than that.
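For reference, the pattern is roughly this: keep the buffer at full size and slice it down to the current sequence length at forward time. A minimal sketch (PyTorch assumed; the class and names are illustrative, not the exact code from the commit):

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, context_length=8):
        super().__init__()
        # full (context_length, context_length) causal mask, built once;
        # register_buffer keeps it on the right device without making it a parameter
        mask = torch.tril(torch.ones(context_length, context_length)).bool()
        self.register_buffer("mask", mask)

    def forward(self, attn_scores):
        # attn_scores: (batch, heads, T, T) with T <= context_length
        T = attn_scores.size(-1)
        # slice the buffer to the actual input length instead of padding the input
        return attn_scores.masked_fill(~self.mask[:T, :T], float("-inf"))
```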
Turns out I do need it for finetuning lol.
Two things need to be considered. The first is padding_idx (in the embedding layer): currently the input is only padded (it runs, but it's not right). Gonna prioritize running finetuning on parallel data first.
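Setting padding_idx on the embedding makes the pad token's row a frozen zero vector, so pad positions neither carry nor receive gradient signal. Rough sketch, assuming pad id 0 (the ids and sizes here are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed pad token id, illustrative

# padding_idx zeroes the pad row and keeps its gradient at zero
emb = nn.Embedding(num_embeddings=100, embedding_dim=16, padding_idx=PAD_IDX)

tokens = torch.tensor([[5, 7, 2, PAD_IDX, PAD_IDX]])
out = emb(tokens)
print(out[0, 3])  # all zeros at the padded positions
```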
So far the input sentences need to be at least context_length tokens long; padding is needed to make sure the model always takes in a full context_length (esp. for GPT), as sketched below.
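Something like this right-padding helper is what I have in mind (sketch only; pad_to_context and PAD_IDX are illustrative names, and it assumes inputs are already no longer than context_length):

```python
import torch
import torch.nn.functional as F

context_length = 8
PAD_IDX = 0  # assumed pad token id, illustrative

def pad_to_context(ids: torch.Tensor) -> torch.Tensor:
    # right-pad a 1-D tensor of token ids up to context_length
    # (assumes ids.size(0) <= context_length)
    return F.pad(ids, (0, context_length - ids.size(0)), value=PAD_IDX)

batch = torch.stack([
    pad_to_context(torch.tensor([4, 9, 1])),
    pad_to_context(torch.tensor([3, 8, 2, 6, 5])),
])
print(batch.shape)  # torch.Size([2, 8])
```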