Closed: bbc4468 closed this issue 7 years ago.
At https://github.com/shyamupa/snli-entailment/blob/master/amodel.py#L110:

WY = TimeDistributed(Dense(k, W_regularizer=l2(0.01)), name="WY")(Y)
I believe this will use the same Dense layer (with the same weights) for all timesteps, which would amount to multiplying Y by a single vector of length k rather than by a matrix of shape k x k.
@shyamupa Can you please check and confirm?
TimeDistributed implementation: https://github.com/fchollet/keras/blob/master/keras/layers/wrappers.py#L91
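A minimal NumPy sketch of what the wrapper computes here, assuming Y has shape (batch, timesteps, k). This is not the Keras code linked above: `dense` and `time_distributed_dense` are hypothetical helpers, and folding timesteps into the batch axis is just one way to share a layer across timesteps. What it illustrates is that Dense(k) applied to a k-dimensional input owns a k x k kernel (plus a bias of length k), and TimeDistributed reuses that whole matrix at every timestep.

```python
import numpy as np


def dense(x, kernel, bias):
    """Hypothetical NumPy stand-in for a Dense layer: x @ kernel + bias.

    x has shape (n, input_dim); kernel has shape (input_dim, units).
    """
    return x @ kernel + bias


def time_distributed_dense(Y, kernel, bias):
    """Apply the same dense weights to every timestep of Y.

    Y has shape (batch, timesteps, input_dim). Timesteps are folded into the
    batch axis, the shared layer is applied once, and the result is unfolded.
    """
    batch, timesteps, input_dim = Y.shape
    flat = Y.reshape(batch * timesteps, input_dim)  # (batch*timesteps, input_dim)
    out = dense(flat, kernel, bias)                 # (batch*timesteps, units)
    return out.reshape(batch, timesteps, kernel.shape[1])


rng = np.random.default_rng(0)
batch, L, k = 2, 5, 4                  # illustrative sizes only
Y = rng.standard_normal((batch, L, k))
W = rng.standard_normal((k, k))        # Dense(k) on a k-dim input: a k x k kernel
b = rng.standard_normal(k)

WY = time_distributed_dense(Y, W, b)
print(WY.shape)  # (2, 5, 4)

# Same result as multiplying every timestep by the shared k x k matrix.
per_step = np.stack([Y[:, t, :] @ W + b for t in range(L)], axis=1)
print(np.allclose(WY, per_step))  # True
```

Because the weights are shared, the whole operation is equivalent to `Y @ W + b` over the last axis: one k x k matrix for all timesteps, not one vector.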
Got it; I was confused about the Dense layer. Closing this.
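For reference, a sketch of the per-timestep computation, assuming Y stacks one k-dimensional row vector y_t per timestep t = 1, ..., L (L is an assumed name for the sequence length) and that Y's feature dimension is k, as the "k x k" in the question implies. Dense(k) keeps its default bias b here:

$$
(WY)_t = y_t W + b, \qquad W \in \mathbb{R}^{k \times k},\quad b \in \mathbb{R}^{k},\quad t = 1, \dots, L
$$

Since W and b are the same for every t, stacking the rows gives $WY = YW + \mathbf{1}_L b^\top$ with $Y \in \mathbb{R}^{L \times k}$, i.e. a full k x k matrix multiplication.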