Details on using Gumbel-Softmax during training

icandle / CAMixerSR

CAMixerSR: Only Details Need More “Attention” (CVPR 2024)

https://arxiv.org/abs/2402.19289

Apache License 2.0


Details on using Gumbel-Softmax during training #19

Closed. wyhhhhhhhh closed this issue 6 months ago.

wyhhhhhhhh commented 6 months ago

Hello! I'm very interested in your work. Could you explain why you use Gumbel-Softmax during training, and how exactly it is implemented?

icandle commented 6 months ago

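CAMixer has to decide, for each window, whether it is processed with convolution or with attention. During training, neither a plain softmax nor directly taking the top k is workable: the former cannot represent a discrete choice, and the latter (argmax) is not differentiable. We therefore use gumbel_softmax to compute a 0-1 mask during training, while at inference we use argmax to select the top k windows on which attention is computed. For further implementation details, see the original paper: Categorical Reparameterization with Gumbel-Softmax.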
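The train/inference split described above can be sketched in PyTorch as follows. This is a minimal illustration, not the CAMixerSR code: it assumes a hypothetical predictor that outputs two logits per window (column 0 = convolution, column 1 = attention), and the names window_mask, k and tau are illustrative. Training uses torch.nn.functional.gumbel_softmax with hard=True (the straight-through variant), so the forward pass yields a 0-1 mask while gradients flow through the soft relaxation; inference deterministically keeps the top k windows.

```python
import torch
import torch.nn.functional as F


def window_mask(logits: torch.Tensor, k: int, training: bool, tau: float = 1.0) -> torch.Tensor:
    """Return a mask of shape (num_windows,): 1 = attention, 0 = convolution.

    Assumption (hypothetical layout): logits has shape (num_windows, 2),
    column 0 scores "convolution", column 1 scores "attention".
    """
    if training:
        # Straight-through Gumbel-Softmax: the forward value is a hard one-hot
        # sample (up to float rounding), but backward uses the soft gradients.
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
        return one_hot[:, 1]
    # Inference: no sampling, keep exactly the k windows most likely to need attention.
    scores = logits.softmax(dim=-1)[:, 1]
    idx = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores)
    mask[idx] = 1.0
    return mask


torch.manual_seed(0)
logits = torch.randn(8, 2, requires_grad=True)  # 8 windows, toy scores

# Training: the 0-1 mask is still differentiable w.r.t. the logits.
train_mask = window_mask(logits, k=3, training=True)
(train_mask * torch.arange(8.0)).sum().backward()
print(train_mask.detach(), logits.grad is not None)

# Inference: deterministic top-k selection.
with torch.no_grad():
    eval_mask = window_mask(logits, k=3, training=False)
print(eval_mask)
```

In this sketch the number of attention windows during training is not fixed at k; only the inference path enforces exactly k selected windows.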

wyhhhhhhhh commented 6 months ago

Thanks!