Open huaweiboy opened 4 years ago
I have a question: in the paper you use PReLU/Dice as the attention activation function, but in your code you use sigmoid. Why is that?
This implementation is a demo on the Amazon data. The activation function used for the attention weights is neither a contribution of this paper nor important.
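For context, here is a minimal sketch (not the repo's actual code; all names and shapes are hypothetical) of a DIN-style local activation unit, showing that sigmoid, PReLU, or Dice only changes the hidden-layer activation of the small attention MLP, while the rest of the structure stays the same:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prelu(x, alpha=0.25):
    return np.where(x > 0, x, alpha * x)

def dice(x, eps=1e-8):
    # Simplified Dice: p(x) * x, where p(x) is a sigmoid of the standardized input
    # (no learned alpha or running batch statistics, unlike the paper's version).
    p = sigmoid((x - x.mean()) / np.sqrt(x.var() + eps))
    return p * x

def attention_weight(query, key, w1, b1, w2, b2, hidden_act=sigmoid):
    """DIN-style local activation unit, forward pass only.

    query, key : (d,) candidate-ad and behavior embeddings
    hidden_act : hidden-layer activation; sigmoid, prelu, or dice
    Returns an unnormalized attention weight (scalar); DIN does not softmax-normalize.
    """
    # Feature interaction: concatenate query, key, and their element-wise product.
    x = np.concatenate([query, key, query * key])
    h = hidden_act(x @ w1 + b1)       # hidden layer with the chosen activation
    return float(h @ w2 + b2)         # linear output layer

# Usage: swapping the activation changes only the hidden layer, not the design.
rng = np.random.default_rng(0)
d, hid = 8, 16
q, k = rng.normal(size=d), rng.normal(size=d)
w1, b1 = rng.normal(size=(3 * d, hid)), np.zeros(hid)
w2, b2 = rng.normal(size=hid), 0.0
for act in (sigmoid, prelu, dice):
    print(act.__name__, attention_weight(q, k, w1, b1, w2, b2, hidden_act=act))
```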