ufal / neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.
BSD 3-Clause "New" or "Revised" License

Refactor concatenations of "variable" size lists before applying dense layers #744

Open varisd opened 6 years ago

varisd commented 6 years ago

Based on this line of code: https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/decoders/output_projection.py#L125

The current implementation isn't flexible enough: if we train a "submodel" (e.g. a decoder without attention, i.e. without any ctx_tensors), we cannot use the trained variables to initialize a model with attention defined, because the input size of the dense layer's weight matrix becomes different.

Suggestion: instead of concatenating the list of tensors, which can differ between models, define a separate dense matrix for each "additional" list member and sum the outputs of the dense layers.
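The suggestion relies on a block-matrix identity: a dense layer applied to a concatenation equals the sum of separate dense layers applied to each part, when the weight matrix is split column-wise. A minimal sketch in plain Python (no TensorFlow; all names are illustrative, not Neural Monkey API):

```python
# Identity: for W split column-wise into blocks [W_s | W_c],
#   W @ concat(s, c) == W_s @ s + W_c @ c
# so replacing concat+dense with per-input denses + sum preserves the math.

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def vadd(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

# Toy weights: output dim 2, decoder-state dim 3, context dim 2.
w_state = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w_ctx = [[0.5, -1.0], [2.0, 0.0]]

state = [1.0, 0.0, 2.0]
ctx = [3.0, 1.0]

# Current approach: one dense layer over the concatenation.
w_concat = [rs + rc for rs, rc in zip(w_state, w_ctx)]
out_concat = matvec(w_concat, state + ctx)

# Proposed approach: a separate dense layer per input, outputs summed.
out_sum = vadd(matvec(w_state, state), matvec(w_ctx, ctx))

assert out_concat == out_sum  # mathematically identical
```

Because each block (`w_state`, `w_ctx`) is a separate variable, a model trained without `w_ctx` can initialize `w_state` directly, and `w_ctx` is simply created fresh.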

This issue might be related to other parts of the codebase; a proper investigation should precede the refactor.

jindrahelcl commented 6 years ago

You can always create the matrix manually by stacking your trained matrix with another, randomly initialized one, so that it matches the dimension of the projection matrix of the concatenation.
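This workaround can be sketched in plain Python (all names and the initialization scheme are illustrative assumptions, not Neural Monkey code): extend the trained projection matrix with a freshly initialized column block so it matches the concatenated input dimension.

```python
import random

random.seed(0)

output_dim = 2
state_dim = 3   # input dim the matrix was trained with
ctx_dim = 2     # extra columns needed for the attention context

# Pretend this came from the trained checkpoint (output_dim x state_dim).
w_trained = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Randomly initialized block for the new context inputs (output_dim x ctx_dim).
w_new = [[random.gauss(0.0, 0.1) for _ in range(ctx_dim)]
         for _ in range(output_dim)]

# Stack column-wise: each row of the full matrix is a trained row
# followed by a newly initialized row.
w_full = [rt + rn for rt, rn in zip(w_trained, w_new)]

assert all(len(row) == state_dim + ctx_dim for row in w_full)
```

The resulting `w_full` can then be saved back into the checkpoint under the name of the concatenation's projection matrix.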

varisd commented 6 years ago

Yes, I am aware of that; however, it is a bit of a hassle, since you need to:

  1. Load the trained model
  2. Modify the desired matrices (and find out which ones)
  3. Save the model again

(Note that the process is not completely model agnostic)

Refactoring would make the implementation more robust while preserving the mathematical properties. We should at least consider this refactor when we release a new stable version (where we can afford not to be 100% backward compatible).

jindrahelcl commented 6 years ago

If you do this refactor, your pipeline becomes:

  1. Load the trained model
  2. Create the desired matrices (and find out which ones)
  3. Save the model again

(Note this process is also not completely model agnostic)

Another way to do it would be to initialize the model and then:

  1. Load the trained model
  2. Ignore the desired (non-existent) matrices (and find out which ones)

This is still not fully supported in tf_manager, and it is also not completely model agnostic. Actually, I don't understand what your "model agnostic" argument means.

varisd commented 6 years ago

I am not sure what you are talking about. I trained a decoder-only model and used it to initialize a model extended by the encoder part. As long as the decoder-related variables have the same names, there is no problem. It is true that variable initialization reports that the variables from the extended part are missing, but they are then randomly initialized and the training loop runs normally.
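The behaviour described here (restore by name where possible, keep random initialization otherwise) can be sketched in plain Python; with TF1-style checkpoints this corresponds roughly to building a Saver over the name intersection. All names here are hypothetical:

```python
def partial_restore(model_var_names, checkpoint_vars):
    """Return (restored, missing): values taken from the checkpoint where
    names match, plus the names that keep their random initialization."""
    restored = {name: checkpoint_vars[name]
                for name in model_var_names if name in checkpoint_vars}
    missing = sorted(name for name in model_var_names
                     if name not in checkpoint_vars)
    return restored, missing

# Decoder-only checkpoint used to warm-start an encoder-decoder model.
checkpoint = {"decoder/w": [1.0], "decoder/b": [0.0]}
model = ["decoder/w", "decoder/b", "encoder/w", "attention/w"]

restored, missing = partial_restore(model, checkpoint)
assert sorted(restored) == ["decoder/b", "decoder/w"]
assert missing == ["attention/w", "encoder/w"]
```

The key point is that matching happens purely by variable name, which is why renaming or re-shaping variables (as with the concatenation's projection matrix) breaks this kind of warm start.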

jindrahelcl commented 6 years ago

I noted some situations in which TF misbehaved and loading failed, but it's possible that this has been solved by now.