Hi,
I am looking at the code for the attention mechanism and wondering what kind of attention is used in the model. In those screenshots you define attention with a dense layer. Is it soft attention or hard attention, or is the dense layer used to simulate attention here?
Thanks.
This is soft attention, based on a learnable score produced by att_dense. The sigmoid activation maps the score to a value between 0 and 1, which indicates how strongly each rule is selected.
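A minimal sketch of that scoring pattern, assuming a TensorFlow/Keras setup; the shapes and all names other than att_dense are illustrative assumptions, not the actual model code:

```python
import tensorflow as tf

# Toy shapes, purely illustrative (assumptions, not taken from the real model).
batch, num_rules, rule_dim = 2, 5, 8
rule_features = tf.random.normal((batch, num_rules, rule_dim))

# att_dense: one learnable score per rule, squashed into (0, 1) by the sigmoid.
att_dense = tf.keras.layers.Dense(1, activation="sigmoid", name="att_dense")
scores = att_dense(rule_features)          # shape (batch, num_rules, 1)

# Soft attention: each rule is weighted by its continuous score instead of
# being kept or discarded outright (which would be hard attention).
weighted_rules = rule_features * scores    # broadcasts over the feature dim

print(scores.shape, weighted_rules.shape)  # (2, 5, 1) (2, 5, 8)
```

Because the scores stay continuous, the selection remains differentiable and can be trained end-to-end with gradient descent, unlike a hard 0/1 selection.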