There is a dropout layer being created in the forward pass of the MultiHeadedAttention class (pytorch_widedeep/models/tabular/transformers/_attention_layers.py):
```python
class MultiHeadedAttention(nn.Module):
    ...
    def forward(self, X_Q: Tensor, X_KV: Optional[Tensor] = None) -> Tensor:
        ...
        self.attn_weights, attn_output = self._standard_attention(q, k, v)
        ...

    def _standard_attention(self, q: Tensor, k: Tensor, v: Tensor) -> Tuple[Tensor, Tensor]:
        ...
        attn_output = einsum(
            "b h s l, b h l d -> b h s d", nn.Dropout(self.dropout)(attn_weights), v  # << HERE
        )
```
This prevents us from correctly putting the whole model in "eval" mode: a dropout module constructed inside `forward` is always in training mode, so it is applied even after calling `.eval()`. I think we should instantiate the layer once in `__init__` instead.
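To make the failure mode concrete, here is a minimal check that is independent of pytorch_widedeep (the class name is just for illustration):

```python
import torch
import torch.nn as nn


class InlineDropout(nn.Module):
    # Mimics the current pattern: the dropout is built inside forward.
    def forward(self, x):
        return nn.Dropout(0.5)(x)


m = InlineDropout().eval()
print(m.training)                # False: the parent module is in eval mode
print(nn.Dropout(0.5).training)  # True: a freshly built dropout starts in training mode
print(m(torch.ones(4, 4)))       # entries still get zeroed/rescaled despite eval()
```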
I will promptly submit a PR to fix this.
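Roughly what I have in mind, as a simplified, self-contained sketch rather than the actual class (the `attn_dropout` name and the toy shapes are illustrative, not the library's API):

```python
import torch
import torch.nn as nn
from torch import einsum, Tensor


class ToyAttention(nn.Module):
    # Stripped-down stand-in for MultiHeadedAttention: no projections, the
    # only point being made is where the dropout lives.
    def __init__(self, dropout: float = 0.1):
        super().__init__()
        # Instantiated once and registered as a submodule, so
        # model.train() / model.eval() toggle it as expected.
        self.attn_dropout = nn.Dropout(dropout)

    def forward(self, q: Tensor, k: Tensor, v: Tensor) -> Tensor:
        scores = einsum("b h s d, b h l d -> b h s l", q, k) / q.shape[-1] ** 0.5
        attn_weights = scores.softmax(-1)
        return einsum("b h s l, b h l d -> b h s d", self.attn_dropout(attn_weights), v)


q = k = v = torch.randn(2, 4, 8, 16)  # (batch, heads, seq_len, head_dim)
attn = ToyAttention(dropout=0.5).eval()
print(torch.allclose(attn(q, k, v), attn(q, k, v)))  # True: dropout is inactive in eval mode
```

In `MultiHeadedAttention` itself the change would just be creating the `nn.Dropout` once in `__init__` and calling that instance from `_standard_attention` instead of building a new one on every call.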