Open mymuli opened 3 years ago
May I ask which upsampling method is used specifically in formula (12)?
In implementation, the upsampling operation can be omitted if we just downsample the key tensor and the value tensor but keep the shape of the query tensor unchanged.
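To illustrate the quoted sentence, here is a minimal NumPy sketch of why no upsampling is needed when only the key and value tensors are downsampled: the attention output has one row per query, so it keeps the query's shape. The average pooling, the single-head layout, and the names `avg_pool_tokens` and `attention_downsampled_kv` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(x, h, w, r):
    # Hypothetical downsampling: average-pool an (h*w, c) token map
    # with non-overlapping r x r windows. h and w must be divisible by r.
    c = x.shape[-1]
    grid = x.reshape(h // r, r, w // r, r, c)
    return grid.mean(axis=(1, 3)).reshape((h // r) * (w // r), c)

def attention_downsampled_kv(q, k, v, h, w, r):
    # Only K and V are downsampled; Q keeps its full resolution.
    k_small = avg_pool_tokens(k, h, w, r)
    v_small = avg_pool_tokens(v, h, w, r)
    scores = q @ k_small.T / np.sqrt(q.shape[-1])  # (h*w, h*w / r^2)
    return softmax(scores, axis=-1) @ v_small      # (h*w, c)

h, w, c, r = 8, 8, 16, 2
rng = np.random.default_rng(0)
q = rng.standard_normal((h * w, c))
k = rng.standard_normal((h * w, c))
v = rng.standard_normal((h * w, c))

out = attention_downsampled_kv(q, k, v, h, w, r)
print(out.shape)  # (64, 16), same as q.shape
```

Because the output already matches the query resolution, nothing has to be interpolated back up afterwards; which downsampling operator the paper actually uses is not stated in this thread.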