xinqipony:
Great work! I noticed the Linformer input is (batch_size, seq_len, channels). Can seq_len be variable length, or should the attention be masked if seq_len is padded? Why is seq_len a fixed length?
Hi! The reason seq_len is a fixed length is that, the way I implemented it, the learned E and F matrices are nn.Linear layers, which have to know their sequence length beforehand (see the sketch below).
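Here is a minimal sketch of why that pins the sequence length; the dimensions and variable names are made up for illustration and are not the repo's actual internals:

```python
import torch
import torch.nn as nn

seq_len, k, head_dim = 512, 64, 32

# E projects along the *sequence* axis, so its weight has shape
# (k, seq_len) -- seq_len is baked in at construction time.
E = nn.Linear(seq_len, k, bias=False)

keys = torch.randn(2, seq_len, head_dim)        # (batch, seq_len, head_dim)
proj = E(keys.transpose(1, 2)).transpose(1, 2)  # (batch, k, head_dim)
# A batch with any other seq_len fails the (seq_len -> k) matmul.
```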
To get around that, however, I created the Padder class, which automatically pads your input sequence if it happens to be of a length other than input_size. To use it in your code, use it like here
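A rough usage sketch (not the author's linked example; the Linformer hyperparameters below are illustrative assumptions, so check the linformer_pytorch README for the exact API):

```python
import torch
from linformer_pytorch import Linformer, Padder

# input_size is the fixed sequence length that E and F are built for;
# the other hyperparameters here are illustrative.
model = Linformer(
    input_size=512,
    channels=16,
    dim_k=16,
    dim_ff=32,
    nhead=4,
    depth=2,
)
model = Padder(model)  # wraps the model; pads shorter inputs up to input_size

x = torch.randn(1, 500, 16)  # seq_len=500 does not match input_size=512
y = model(x)                 # Padder pads x to length 512 before the forward pass
```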