Open wplf opened 1 year ago
The concept of channel-wise normalization in Conv is quite intuitive. The concept of group-wise is also easy to understand, since it maps onto the different attention heads. But how should one understand "channel" in a tensor of shape [batch_size, num_seq, num_hidden]?
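To make the question concrete, here is a minimal numpy sketch under one possible reading (the shapes, `num_groups`, and the choice to treat the last axis `num_hidden` as the "channel" axis are all assumptions, not a confirmed interpretation):

```python
import numpy as np

# Hypothetical shapes, chosen only to make the question concrete.
batch_size, num_seq, num_hidden = 2, 5, 8
num_groups = 4  # assumed group count for a group-wise scheme

x = np.random.randn(batch_size, num_seq, num_hidden)

# In a Conv feature map [N, C, H, W], "channel" is the C axis.
# One reading for the transformer tensor above: each of the
# num_hidden features plays the role of one channel. Group-wise
# normalization would then split num_hidden into num_groups
# groups and normalize each token's features within its group.
xg = x.reshape(batch_size, num_seq, num_groups, num_hidden // num_groups)
mean = xg.mean(axis=3, keepdims=True)
var = xg.var(axis=3, keepdims=True)
out = ((xg - mean) / np.sqrt(var + 1e-5)).reshape(x.shape)

print(out.shape)  # same shape as x: (2, 5, 8)
```

With `num_groups = 1` this reduces to LayerNorm over the hidden axis; with `num_groups = num_hidden` it becomes a per-feature (channel-wise) normalization, which is one way to see how the Conv terminology might carry over.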