As we know, the major components of a Transformer are Q, K, and V, which are generated from the query input and the sequence input. I'm wondering whether the following two lines in deepctr.layers.sequence.Transformer (lines 534 and 535) are correct:
After the tensordot of keys with self.W_key is assigned back to keys, keys has already been overwritten. When keys is then used to compute values, the result becomes input · W_key · W_Value, but I think what we actually need is input · W_Value.
Should we change the order of these two lines?
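A minimal numpy sketch of what I mean (the toy matrices and shapes here are made up for illustration; only the variable names W_key / W_Value and the use of tensordot follow the layer in question, not its exact code):

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 4))   # toy stand-in for the layer's keys input
W_key = rng.normal(size=(4, 4))
W_Value = rng.normal(size=(4, 4))

# Current order: keys is overwritten before values is computed,
# so values ends up as input @ W_key @ W_Value.
keys = inputs
keys = np.tensordot(keys, W_key, axes=(-1, 0))
values_buggy = np.tensordot(keys, W_Value, axes=(-1, 0))

# Swapped order: values is computed from the original input first,
# giving the intended input @ W_Value.
keys = inputs
values_fixed = np.tensordot(keys, W_Value, axes=(-1, 0))
keys = np.tensordot(keys, W_key, axes=(-1, 0))

# The two orderings give different value projections.
assert np.allclose(values_buggy, inputs @ W_key @ W_Value)
assert np.allclose(values_fixed, inputs @ W_Value)
```

Swapping the two lines (or computing values from the original input in a separate variable) would avoid the overwrite.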
If this is an intentional trick for CTR prediction, or my understanding is wrong, please advise.