Closed yu-changqian closed 6 years ago
In Base_OC_Module I found a magic code snippet:
sim_map = (self.key_channels**-.5) * sim_map
What does it mean, and why is it needed?
@ycszen Please check the self-attention paper.
They call this operation scaled dot-product attention.
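To make the effect of that line concrete, here is a minimal sketch in plain Python. It is not the code from Base_OC_Module: the helpers softmax and similarity_row, the random inputs, and key_channels = 256 are all assumptions made up for illustration. The idea is that when query and key components have roughly unit variance, their dot product has variance of about key_channels, so multiplying by key_channels**-.5 brings it back to about 1 and keeps the softmax from saturating.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_row(query, keys, scale=True):
    """Attention weights of one query over all keys (hypothetical helper).

    With scale=True the raw dot products are multiplied by
    key_channels ** -0.5 before the softmax, mirroring
    sim_map = (self.key_channels**-.5) * sim_map.
    """
    key_channels = len(query)
    sims = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale:
        sims = [(key_channels ** -0.5) * s for s in sims]
    return softmax(sims)


# Assumed toy data: unit-variance Gaussian features, 8 keys.
random.seed(0)
key_channels = 256
query = [random.gauss(0, 1) for _ in range(key_channels)]
keys = [[random.gauss(0, 1) for _ in range(key_channels)] for _ in range(8)]

unscaled = similarity_row(query, keys, scale=False)
scaled = similarity_row(query, keys, scale=True)

# The unscaled weights are at least as peaked as the scaled ones.
print("max weight without scaling:", max(unscaled))
print("max weight with scaling:   ", max(scaled))
```

Without the scaling, the softmax tends to put almost all of its weight on a single key, which leaves very small gradients for the others; with it, the weights stay in a range where training behaves better.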