From the implementation of the AutoInt code, I noticed that the dense features are never passed into the Attention layer. Instead, they are passed through a simple feed-forward network, and the result is combined with the attention output computed over the sparse features only. This differs from what the paper describes. Was this intentional?
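To make the distinction concrete, here is a minimal numpy sketch of the two wiring choices I mean. This is not the actual AutoInt code; the shapes, the single-head attention, and the concatenation at the end are all assumptions for illustration. Variant (a) follows the paper: each dense value is multiplied by a learned field embedding and stacked with the sparse embeddings before self-attention. Variant (b) is what the implementation appears to do: attention runs over the sparse embeddings only, while the dense features go through a separate feed-forward branch and are merged afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x):
    # Single-head scaled dot-product self-attention over the field axis.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # same shape as x

# Toy setup: 3 sparse-feature embeddings, 2 dense scalars, embedding dim 4.
sparse_emb = rng.standard_normal((3, 4))
dense_vals = rng.standard_normal(2)

# (a) Paper-style: dense features get embeddings (value * field embedding)
#     and are stacked with sparse embeddings BEFORE attention.
dense_field_emb = rng.standard_normal((2, 4))  # hypothetical learned params
dense_emb = dense_vals[:, None] * dense_field_emb
paper_out = self_attention(np.vstack([sparse_emb, dense_emb]))  # (5, 4)

# (b) Observed wiring: attention over sparse embeddings only; dense features
#     go through a separate feed-forward branch, combined afterwards
#     (concatenation here is just one plausible merge).
W = rng.standard_normal((2, 4))  # hypothetical FFN weight
dense_ffn = np.maximum(dense_vals @ W, 0.0)  # one ReLU layer
impl_out = np.concatenate([self_attention(sparse_emb).ravel(), dense_ffn])

print(paper_out.shape, impl_out.shape)  # (5, 4) (16,)
```

The practical difference is that in (a) the attention layer can learn interactions between dense and sparse fields, while in (b) those cross-interactions can only be picked up by whatever layer consumes the concatenated output.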