Open varisd opened 6 years ago
You can always create the matrix manually by stacking your trained matrix with another, randomly initialized one, so that the result matches the dimensions of the projection matrix over the concatenation.
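A minimal NumPy sketch of this manual workaround (all sizes here are hypothetical, just for illustration): the trained projection matrix is extended along its input axis with a freshly initialized block covering the new context part of the concatenated input.

```python
import numpy as np

# Hypothetical dimensions: the trained model projected a 512-dim decoder
# state; the extended model concatenates a 256-dim attention context.
rnn_size, ctx_size, out_size = 512, 256, 300

rng = np.random.default_rng(0)
trained_w = rng.normal(size=(rnn_size, out_size))  # loaded from the checkpoint

# Randomly initialized block for the new (context) slice of the input.
new_block = rng.normal(scale=0.01, size=(ctx_size, out_size))

# Stack along the input axis to match the input [state; ctx] of the
# extended model's projection layer.
extended_w = np.concatenate([trained_w, new_block], axis=0)
assert extended_w.shape == (rnn_size + ctx_size, out_size)
```

The stacking order must match the order in which the tensors are concatenated inside the model, which is one reason the process is not model agnostic.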
Yes, I am aware of that; however, it is a bit of a hassle, since you need to:
(Note that the process is not completely model agnostic)
Refactoring would make the implementation more robust while preserving the mathematical properties. We should at least consider this refactor when we release a new stable version (where we can afford not being 100% backward compatible).
If you do this refactor, your pipeline becomes:
(Note this process is also not completely model agnostic)
Another way to do that would be to initialize the model and
Which is still not fully supported in tf_manager
and it is also not completely model agnostic. Actually, I don't understand what your "model agnostic" argument means.
I am not sure what you are talking about. I have trained a decoder-only model and used it to initialize a model extended by the encoder part. As long as the decoder-related variables have the same names, there is no problem. It is true that variable initialization reports the variables from the extended part as missing; however, they are then randomly initialized and the training loop runs normally.
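A small sketch of the name-based restore behavior described above, assuming a checkpoint stored as a name-to-array mapping (the variable names and shapes here are hypothetical, not taken from the actual codebase): variables found in the checkpoint are restored, the rest are reported as missing and randomly initialized.

```python
import numpy as np

def restore_with_extension(checkpoint, model_vars, rng):
    """Initialize model variables from a checkpoint by name.

    Variables absent from the checkpoint (e.g. a newly added encoder
    part) are collected in `missing` and randomly initialized, mimicking
    the behavior described in the thread.
    """
    initialized, missing = {}, []
    for name, shape in model_vars.items():
        if name in checkpoint and checkpoint[name].shape == shape:
            initialized[name] = checkpoint[name]
        else:
            missing.append(name)
            initialized[name] = rng.normal(size=shape)
    return initialized, missing

# Decoder-only checkpoint; the model adds an encoder variable on top.
checkpoint = {"decoder/w": np.zeros((512, 300))}
model_vars = {"decoder/w": (512, 300), "encoder/w": (256, 512)}

init, missing = restore_with_extension(checkpoint, model_vars,
                                       np.random.default_rng(2))
assert missing == ["encoder/w"]
```

Note this only works while shapes match; the case discussed in this issue (the projection matrix changing its input dimension) is exactly where such a name-based restore breaks down.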
I ran into some situations in which TF misbehaved and the loading failed, but it is possible that this has been solved since.
Based on this line of code: https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/decoders/output_projection.py#L125
The current implementation isn't flexible enough: if we train a "submodel" (e.g. a decoder without attention, i.e. one that receives no ctx_tensors), we cannot use the trained variables to initialize a model with attention defined, because the input dimension of the dense-layer matrix becomes different.
Suggestion: instead of concatenating the list of input tensors (which can differ between models), define a separate dense matrix for each "additional" list member and sum the outputs of these dense layers. This is mathematically equivalent to a single dense layer over the concatenation, but each matrix can then be saved and restored independently.
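The equivalence behind the suggestion can be checked in a few lines of NumPy (sizes are again hypothetical): a dense layer over a concatenation equals the sum of per-input dense layers whose matrices are the corresponding slices of the big matrix, with one shared bias.

```python
import numpy as np

rng = np.random.default_rng(1)
state = rng.normal(size=(4, 512))  # batch of decoder states
ctx = rng.normal(size=(4, 256))    # one attention context tensor

w = rng.normal(size=(512 + 256, 300))
b = rng.normal(size=(300,))

# Current scheme: one dense layer over the concatenated inputs.
concat_out = np.concatenate([state, ctx], axis=1) @ w + b

# Proposed scheme: a separate matrix per input, outputs summed
# (with a single shared bias).
w_state, w_ctx = w[:512], w[512:]
split_out = state @ w_state + ctx @ w_ctx + b

assert np.allclose(concat_out, split_out)
```

With this layout, a model trained without attention simply has no `w_ctx` variable, and a model with attention can restore `w_state` unchanged and initialize `w_ctx` from scratch.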
This issue might be related to other parts of the codebase, so a proper investigation needs to be done before the refactor.