Closed: robin-p-schmitt closed this issue 2 years ago
> I was not yet able to reproduce the error with a smaller network, but maybe the error can still be found.
I don't exactly understand. There are many things which can trivially be removed to make the network smaller, e.g. SpecAugment, using only a single layer for the encoder (or zero layers, i.e. removing all the LSTMs), using smaller dimensions, etc. All these steps would already have been helpful.
Edit: See how I trivially reduced the test case in #1028. This was done just by what I described, plus removing unused layers, all without even thinking much about it. It can probably be reduced much more, but so far this was really trivial.
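For illustration only (this is not the actual network from the Gist; the layer names and dimensions below are made up), the kind of trivial reduction meant here could look like this:

```python
# Hypothetical sketch of a trivially shrunk RETURNN network dict:
# SpecAugment / augmentation layers are simply deleted, the encoder is cut
# down to a single small LSTM layer, and only the layers that are actually
# needed to trigger the error are kept.
reduced_network = {
    "encoder": {"class": "rec", "unit": "lstm", "n_out": 32, "from": "data"},
    "output": {"class": "softmax", "loss": "ce", "from": "encoder"},
}
```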
Are you working on this now? Edit: I think I fixed it. Wait for the PR. Edit: See #1028.
How urgent is it for your work that this gets fixed? Did you find a workaround? Edit: Maybe just wait until #1028 is merged.
I realized that the way you use dim tags here is wrong. You are creating multiple dim tags with the same description (e.g. `att_heads`), and I think you want them to be equal, but when you create multiple separate instances, they are in fact not equal. In #1222, this will be fixed, and thus the test case here will also be fixed.
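A minimal sketch of the pitfall, assuming the `FeatureDim` helper (the import path may differ between RETURNN versions, and the dimension value 8 is just an example):

```python
from returnn.tf.util.data import FeatureDim  # import path may differ by RETURNN version

# Two dim tags with the same description are still two separate instances,
# and (as described above) are therefore not treated as equal.
att_heads_a = FeatureDim("att_heads", 8)
att_heads_b = FeatureDim("att_heads", 8)
assert att_heads_a != att_heads_b

# Instead, create the dim tag once and reuse that single instance everywhere
# the network dict refers to the attention-heads dimension.
att_heads = FeatureDim("att_heads", 8)
```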
When using the following network dict: https://gist.github.com/robin-p-schmitt/a63bbfd3870935b78c86328a38fae783, I am getting the following error:
The full error log can also be seen in the Gist above.
The error seems to be caused by setting `is_output_layer=True` in the attention weights layer, in combination with having `prev:att` as input to the `lm` layer. The relevant layers therefore are the attention weights layer (`att_weights`) and the `lm` layer; their full definitions are in the network dict linked above.
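Schematically, the pattern in question inside the `label_model` recurrent unit looks roughly like the following sketch (this is only an illustration, not an excerpt from the Gist; names such as `target_embed`, `energy`, and `enc_value` are placeholders):

```python
# Illustrative only, not the actual layers from the network dict in the Gist.
label_model_unit = {
    # "lm" gets the attention context of the *previous* decoder step as input.
    "lm": {"class": "rec", "unit": "nativelstm2", "n_out": 1024,
           "from": ["prev:target_embed", "prev:att"]},
    # ... energy computation omitted ...
    # Marking the attention weights as an explicit output layer is the change
    # that triggers the error.
    "att_weights": {"class": "softmax_over_spatial", "from": "energy",
                    "is_output_layer": True},
    "att": {"class": "generic_attention", "weights": "att_weights",
            "base": "base:enc_value"},
}
```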
In my script, which I use to initialize my network, I call `rnn.engine.init_network_from_config(net_dict_post_proc=net_dict_add_losses)`. The function `net_dict_add_losses`, despite its name, only sets `net_dict["label_model"]["unit"]["att_weights"]["is_output_layer"] = True`.
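A minimal sketch of such a post-processing hook, assuming it only needs to return the (possibly in-place modified) network dict:

```python
def net_dict_add_losses(net_dict):
    # Despite its name, the hook only marks the attention weights layer of the
    # label model as an explicit output layer.
    net_dict["label_model"]["unit"]["att_weights"]["is_output_layer"] = True
    return net_dict

# Passed to the engine when initializing the network:
rnn.engine.init_network_from_config(net_dict_post_proc=net_dict_add_losses)
```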
When looking at the log, one can see that the network construction is working before setting this and only fails in the second pass. Therefore it must have something to do with setting `att_weights` as an output layer.

I was not yet able to reproduce the error with a smaller network, but maybe the error can still be found.