Closed thisisqiaoqiao closed 7 months ago
We treat the seq dimension as the batch dimension, and then apply BatchFormer along the original batch dimension (the first dimension).
Thank you for your reply. Can you explain in detail the meaning of N and C in (B, N, C)? Does the V2 version have one more batch dimension than the original version? Thank you very much.
Is there a problem here?
That is right. N is treated as the batch dimension, so BatchFormer (bf) is calculated along the first dimension.
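To make the dimension handling concrete, here is a minimal sketch (not the repository's code) using PyTorch's `nn.TransformerEncoderLayer` with its default `batch_first=False`. The sizes, the seed, and the variable names are assumptions chosen for illustration. Because the layer reads its input as (seq, batch, feature), a tensor of shape (B, N, C) gets attention along B, while the N positions are processed independently.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions): B samples, N tokens per sample, C channels.
B, N, C = 4, 5, 8

# batch_first=False is the default, so the layer expects (seq, batch, feature).
layer = nn.TransformerEncoderLayer(
    d_model=C, nhead=2, dim_feedforward=16, dropout=0.0
)
layer.eval()

x = torch.randn(B, N, C)

with torch.no_grad():
    # (B, N, C) is read as seq=B, batch=N: attention runs along B.
    y = layer(x)

    # Change only sample 0. The other samples' outputs change too,
    # because attention mixes information along the first dimension.
    x_sample = x.clone()
    x_sample[0] += torch.randn(N, C)
    y_sample = layer(x_sample)

    # Change only token position 0 in every sample. The other token
    # positions are unaffected, because N acts as the batch dimension.
    x_token = x.clone()
    x_token[:, 0] += torch.randn(B, C)
    y_token = layer(x_token)

mixes_across_samples = not torch.allclose(y[1:], y_sample[1:])
tokens_independent = torch.allclose(y[:, 1:], y_token[:, 1:], atol=1e-6)

print(tuple(y.shape), mixes_across_samples, tokens_independent)
```

The output keeps the input shape (B, N, C). Information is shared only between samples at the same token position.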
Thank you very much for your reply.
Hello, does B in the shape (B, N, C) in the picture represent the batch size? If so, when batch_first=False in TransformerEncoderLayer, the input and output should be (seq, batch, feature), but the input in your code has size (batch, seq, feature).