Closed pp00704831 closed 3 years ago
We cannot directly train a model with the original MHSA. As described in the introduction of our paper, the computational cost and memory usage of the original MHSA are quadratic in the image size. The memory usage can even exceed the limit of an NVIDIA A100. The original MHSA might achieve better results, but it is infeasible to train with current hardware. BTW, for comparison, we replaced P-MHSA with a modified MHSA that uses a single average pooling operation to generate k and v, with the pooling ratio set to the smallest one used in P-MHSA.
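To make the cost saving concrete, here is a minimal single-head NumPy sketch of attention with average-pooled k and v. The function name, shapes, and the assumption that the ratio divides the sequence length are illustrative choices, not the authors' implementation; the point is that the score matrix shrinks from n×n to n×(n/ratio).

```python
import numpy as np

def pooled_attention(x, ratio):
    """Single-head self-attention where k and v are average-pooled
    by `ratio` before the attention product (illustrative sketch,
    not the paper's code).

    x: (n, d) token sequence; `ratio` must divide n in this sketch.
    Score matrix is (n, n // ratio) instead of (n, n).
    """
    n, d = x.shape
    q = x                                                # (n, d)
    # Average-pool tokens in non-overlapping windows of size `ratio`.
    kv = x.reshape(n // ratio, ratio, d).mean(axis=1)    # (n // ratio, d)
    scores = q @ kv.T / np.sqrt(d)                       # (n, n // ratio)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)             # softmax over pooled keys
    return attn @ kv                                     # (n, d)
```

With `ratio = 1` this reduces to ordinary (unpooled) single-head attention, so the baseline in the comment above corresponds to picking the smallest pooling ratio from P-MHSA here.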
Thank you for your reply. What about the strides for pooling? For example, when the pooling ratio equals 12, does that mean you use kernel size = 12 and stride = 12?
Almost right. However, without padding, a fixed kernel and stride can leave some pixels out of the computation when the feature-map size is not divisible by the pooling ratio. To address this, we use an adaptive kernel size, stride, and padding so that all pixels are taken into account.
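For illustration, the window-boundary scheme below is one way adaptive pooling guarantees full coverage (it follows the floor/ceil boundary rule used by PyTorch's `AdaptiveAvgPool` family, shown here in 1-D pure Python; the function name is my own, not from the paper):

```python
def adaptive_avg_pool_1d(x, out_size):
    """Average-pool list `x` down to `out_size` bins with adaptive
    window boundaries: bin i spans [floor(i*n/out), ceil((i+1)*n/out)),
    so windows may overlap slightly but every element is covered,
    even when len(x) is not divisible by out_size."""
    n = len(x)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size              # floor(i * n / out_size)
        end = -((-(i + 1) * n) // out_size)      # ceil((i + 1) * n / out_size)
        window = x[start:end]
        out.append(sum(window) / len(window))
    return out
```

With a fixed kernel = stride = 12 and no padding, a length-130 input would drop its last 10 elements; the adaptive boundaries above stretch the windows instead, so nothing is discarded.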
Hello,
I have some questions about your ablation studies of pyramid pooling. Could you detail the baseline version in Table 9? First, you say that you replace P-MHSA with an MHSA that uses a single pooling operation; what are the details of this single pooling operation, e.g., its pooling ratio? Second, did you compare your method with the original MHSA?