There is a dropout layer being created in the forward pass of the MultiHeadedAttention class (pytorch_widedeep/models/tabular/transformers/_attention_layers.py):
```python
class MultiHeadedAttention(nn.Module):
    ...
    def forward(self, X_Q: Tensor, X_KV: Optional[Tensor] = None) -> Tensor:
        ...
        self.attn_weights, attn_output = self._standard_attention(q, k, v)
        ...

    def _standard_attention(self, q: Tensor, k: Tensor, v: Tensor) -> Tuple[Tensor, Tensor]:
        ...
        attn_output = einsum(
            "b h s l, b h l d -> b h s d", nn.Dropout(self.dropout)(attn_weights), v  # << HERE
        )
```
This prevents us from correctly putting the whole model in "eval" mode: a dropout module constructed inside `forward` is always in training mode, so it is applied even after calling `.eval()`. I think we should instantiate the layer once in `__init__` instead.
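To make the failure mode concrete, here is a minimal check that is independent of pytorch_widedeep (the class name is just for illustration):

```python
import torch
import torch.nn as nn


class InlineDropout(nn.Module):
    # Mimics the current pattern: the dropout is built inside forward.
    def forward(self, x):
        return nn.Dropout(0.5)(x)


m = InlineDropout().eval()
print(m.training)                # False: the parent module is in eval mode
print(nn.Dropout(0.5).training)  # True: a freshly built dropout starts in training mode
print(m(torch.ones(4, 4)))       # entries still get zeroed/rescaled despite eval()
```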
I will promptly submit a PR to fix this.
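Roughly what I have in mind, as a simplified, self-contained sketch rather than the actual class (the `attn_dropout` name and the toy shapes are illustrative, not the library's API):

```python
import torch
import torch.nn as nn
from torch import einsum, Tensor


class ToyAttention(nn.Module):
    # Stripped-down stand-in for MultiHeadedAttention: no projections, the
    # only point being made is where the dropout lives.
    def __init__(self, dropout: float = 0.1):
        super().__init__()
        # Instantiated once and registered as a submodule, so
        # model.train() / model.eval() toggle it as expected.
        self.attn_dropout = nn.Dropout(dropout)

    def forward(self, q: Tensor, k: Tensor, v: Tensor) -> Tensor:
        scores = einsum("b h s d, b h l d -> b h s l", q, k) / q.shape[-1] ** 0.5
        attn_weights = scores.softmax(-1)
        return einsum("b h s l, b h l d -> b h s d", self.attn_dropout(attn_weights), v)


q = k = v = torch.randn(2, 4, 8, 16)  # (batch, heads, seq_len, head_dim)
attn = ToyAttention(dropout=0.5).eval()
print(torch.allclose(attn(q, k, v), attn(q, k, v)))  # True: dropout is inactive in eval mode
```

In `MultiHeadedAttention` itself the change would just be creating the `nn.Dropout` once in `__init__` and calling that instance from `_standard_attention` instead of building a new one on every call.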