ufal / neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.
BSD 3-Clause "New" or "Revised" License

Move residual connections from SequenceLabeler to the encoders #789

Open varisd opened 5 years ago

varisd commented 5 years ago

Should the residual connections really be handled inside the SequenceLabeler? Shouldn't the underlying encoder be in charge of that?

Right now, the labeler creates two matrices, enc_out_proj_M and enc_in_proj_M, to project both the encoder output states and the input sequence to the output vocabulary (the logits from which the distribution is later computed). So, to get the logits, we compute (enc_out_proj_M * enc_out) + (enc_in_proj_M * enc_in).
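A minimal NumPy sketch of the current computation (shapes and dimension names are illustrative assumptions; enc_out_proj_M and enc_in_proj_M are the labeler's two projection matrices from the description above):

```python
import numpy as np

# Illustrative shapes (assumed): T time steps, d_out encoder state size,
# d_in input embedding size, V output vocabulary size.
T, d_out, d_in, V = 10, 512, 300, 8000

enc_out = np.random.randn(T, d_out)         # encoder output states
enc_in = np.random.randn(T, d_in)           # input sequence embeddings
enc_out_proj_M = np.random.randn(d_out, V)  # projects encoder states to vocab
enc_in_proj_M = np.random.randn(d_in, V)    # projects inputs to vocab

# Current SequenceLabeler: two separate projections, summed into the logits.
logits = enc_out @ enc_out_proj_M + enc_in @ enc_in_proj_M  # shape (T, V)
```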

If we handle the residual connections in the encoder (as an optional encoder feature), we can reduce the computation to a single matrix multiplication, (enc_out + enc_in) * enc_out_proj_M. This would also simplify the SequenceLabeler code and allow us to use any TemporalStateful object as input (instead of the current hard-coded list of RecurrentEncoder and SentenceEncoder).
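A sketch of the proposed variant under the same assumed shapes; note that summing before the projection requires enc_in and enc_out to share a dimensionality (or the encoder to project enc_in first), and it corresponds to the two-matrix form with both matrices tied:

```python
import numpy as np

# Same illustrative shapes as above; the plain sum requires d_in == d_out.
T, d, V = 10, 512, 8000

enc_in = np.random.randn(T, d)
enc_out = np.random.randn(T, d)
enc_out_proj_M = np.random.randn(d, V)

# Proposed: the encoder (optionally) adds the residual itself, so the
# labeler is left with a single matrix multiplication.
enc_out_res = enc_out + enc_in           # done inside the encoder
logits = enc_out_res @ enc_out_proj_M    # shape (T, V)

# Note: this equals the two-projection form with enc_in_proj_M tied to
# enc_out_proj_M; with two separate matrices the models are not identical.
```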

This change should not affect the gradient flow during training, or am I missing something?

jlibovicky commented 5 years ago

This is actually not a residual connection, because there is an extra projection; a residual connection would be just a sum. It used to be called a skip-connection; nowadays it is sometimes called a dense connection, after DenseNet. I already have it refactored out in a branch. I will make a PR once tf.dataset is done.
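A minimal sketch of the distinction being drawn here (names and shapes are assumptions): a residual connection is a plain parameter-free sum, whereas the labeler's construction projects each stream separately before combining them, which is closer to a skip/dense-style connection:

```python
import numpy as np

T, d, V = 5, 8, 16
x = np.random.randn(T, d)    # layer input
fx = np.random.randn(T, d)   # layer output F(x)

# Residual connection (ResNet): a plain sum, no additional parameters.
residual = x + fx

# Skip/dense-style combination as in the current SequenceLabeler:
# each stream gets its own learned projection before the two are summed.
W_fx = np.random.randn(d, V)
W_x = np.random.randn(d, V)
combined = fx @ W_fx + x @ W_x
```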