For FINAL, I would like to ask whether the formula in the paper is consistent with FactorizedInteraction in the code. The paper defines h_{l,1} = W_{l,1} x_{l−1} + b_{l,1} and h_{l,2} = h_{l,1} ⊙ σ(W_{l,2} x_{l−1} + b_{l,2}). Does the code compute the same thing?
Also, for the final reported metrics, does block2 have only one layer? This is very important to me, so I would appreciate an answer. Thank you.
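To make the question concrete, here is a minimal sketch of the paper's formula as I read it, written in plain Python. It is not the repository's FactorizedInteraction code; the function names (`linear`, `factorized_layer`) and the argument layout are hypothetical and only illustrate the two terms h_{l,1} and h_{l,2}.

```python
import math

def sigmoid(z):
    # Logistic function, standing in for σ in the formula.
    return 1.0 / (1.0 + math.exp(-z))

def linear(W, x, b):
    # Computes W x + b, with W given as a list of rows.
    return [sum(w * xj for w, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def factorized_layer(x_prev, W1, b1, W2, b2):
    # h_{l,1} = W_{l,1} x_{l-1} + b_{l,1}
    h1 = linear(W1, x_prev, b1)
    # Gate: σ(W_{l,2} x_{l-1} + b_{l,2})
    gate = [sigmoid(z) for z in linear(W2, x_prev, b2)]
    # h_{l,2} = h_{l,1} ⊙ gate (element-wise product)
    h2 = [a * g for a, g in zip(h1, gate)]
    return h1, h2

# Toy input (made-up values): identity W1, zero W2, so the gate is 0.5 everywhere.
x = [1.0, 2.0]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]
h1, h2 = factorized_layer(x, W1, zero_bias, W2, zero_bias)
```

If the code instead applies a single linear layer and then splits its output into the two halves, that would be equivalent to this sketch only when W_{l,1} and W_{l,2} are stacked into one weight matrix, which is the point I would like confirmed.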