Closed npwynands closed 4 years ago
Yes, it looks like this is wrong in RelativePositionalEncodingLayer. Can you fix it and submit a pull request?
Maybe also add a small test case (it looks like no one has added one yet; otherwise we would have noticed earlier).
I would try to fix it, but I'm afraid I won't find any time for this until the beginning of next week. So somebody else, who is also more experienced with RETURNN than I am, might fix this issue faster.
I added 2 test cases, test_RelativePositionalEncodingLayer_in_rec and test_RelativePositionalEncodingLayer_no_rec, to test_TFNetworkRecLayer.py, using the equivalent test cases of PositionalEncodingLayer as templates (see also test_TFNetworkRecLayer.py).
Testing RelativePositionalEncodingLayer with these tests uncovered further issues, which I hadn't noticed before. The data returned by get_out_data_from_opts has no batch_dim_axis set, but this is apparently required under "returnn/tf/layers/base.py", line 310, in _base_get_out_data_from_opts:
Executing: test_RelativePositionalEncodingLayer_in_rec
{'data': Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])}
layer <network via test_RelativePositionalEncodingLayer_in_rec>/'data' output: Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])
layer <network via test_RelativePositionalEncodingLayer_in_rec>/'data:data' output: Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])
<_SubnetworkRecCell of None>: exception constructing template network (for deps and data shapes)
Most recent construction stack:
<_TemplateLayer 'output/output' uninitialized, construction stack None>, kwargs:
{'_target_layers': {'data': <_TemplateLayer(SourceLayer)(:template:source) 'output/data:data' out_type=Data(shape=(), dtype='int32', sparse=True, dim=5, time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'output')>},
'loss': <CrossEntropyLoss None>,
'n_out': 5,
'name': 'output',
'network': <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>/output:rec-subnet' parent_net=<TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train> train>,
'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>],
'target': 'data'}
Template network so far:
{'data:data': <_TemplateLayer(SourceLayer)(:template:source) 'output/data:data' out_type=Data(shape=(), dtype='int32', sparse=True, dim=5, time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'output')>,
'data:source': <_TemplateLayer(SourceLayer)(:template:source) 'output/data:source' out_type=Data(shape=(), dtype='int32', sparse=True, dim=5, time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'input')>,
'input': <_TemplateLayer(LinearLayer)(:template:linear) 'output/input' out_type=Data(shape=(8,), time_dim_axis=None, batch_shape_meta=[B,F|8]) (construction stack 'pos_enc')>,
'output': <_TemplateLayer 'output/output' uninitialized, construction stack None>,
'pos_enc': <_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>}
Collected (unique) exceptions during template construction:
(Note that many of these can be ignored, or are expected.)
EXCEPTION
Traceback (most recent call last):
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1239, in __call__
line: self.net.construct_layer(
net_dict=self.net_dict, name=name,
get_layer=get_layer, add_layer=get_layer.add_templated_layer)
locals:
self = <local> <_SubnetworkRecCell of None>
self.net = <local> <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>/output:rec-subnet' parent_net=<TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train> train>
self.net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>/output:rec-subnet' parent_net=<TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train> train>>
net_dict = <not found>
self.net_dict = <local> {'input': {'class': 'linear', 'activation': None, 'from': 'data:source', 'n_out': 8}, 'pos_enc': {'class': 'relative_positional_encoding', 'from': ['input'], 'n_out': 8}, 'output': {'class': 'softmax', 'from': ['pos_enc'], 'loss': 'ce', 'target': 'data'}}
name = <local> 'output', len = 6
get_layer = <local> <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 2, parents 'output')
add_layer = <not found>
get_layer.add_templated_layer = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.add_templated_layer of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 2, parents 'output')>
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 672, in construct_layer
line: return add_layer(name=name, layer_class=layer_class, **layer_desc)
locals:
add_layer = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.add_templated_layer of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 2, parents 'output')>
name = <local> 'output', len = 6
layer_class = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
layer_desc = <local> {'target': 'data', 'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>], '_target_layers': {'data...
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1119, in add_templated_layer
line: output = layer_class.get_out_data_from_opts(**layer_desc)
locals:
output = <not found>
layer_class = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
layer_class.get_out_data_from_opts = <local> <bound method LayerBase.get_out_data_from_opts of <class 'returnn.tf.layers.basic.SoftmaxLayer'>>
layer_desc = <local> {'target': 'data', 'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>], '_target_layers': {'data..., len = 7
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 228, in get_out_data_from_opts
line: return cls._base_get_out_data_from_opts(**kwargs)
locals:
cls = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
cls._base_get_out_data_from_opts = <local> <bound method LayerBase._base_get_out_data_from_opts of <class 'returnn.tf.layers.basic.SoftmaxLayer'>>
kwargs = <local> {'target': 'data', 'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>], '_target_layers': {'data..., len = 7
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 310, in _base_get_out_data_from_opts
line: default_shape.insert(sources_data.batch_dim_axis, None)
locals:
default_shape = <local> [1, None, 8]
default_shape.insert = <local> <built-in method insert of list object at 0x7f0dec4f9a00>
sources_data = <local> Data(name='pos_enc_output', shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8])
sources_data.batch_dim_axis = <local> None
TypeError: 'NoneType' object cannot be interpreted as an integer
TypeError creating layer <network via test_RelativePositionalEncodingLayer_in_rec>/'output' of class RecLayer with opts:
{'_target_layers': {'data': <SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])>},
'n_out': <class 'returnn.util.basic.NotSpecified'>,
'name': 'output',
'network': <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train>,
'sources': [<SourceLayer 'data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])>],
'target': 'data',
'unit': {'input': {'activation': None,
'class': 'linear',
'from': 'data:source',
'n_out': 8},
'output': {'class': 'softmax',
'from': ['pos_enc'],
'loss': 'ce',
'target': 'data'},
'pos_enc': {'class': 'relative_positional_encoding',
'from': ['input'],
'n_out': 8}}}
Traceback (most recent call last):
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/tests/test_TFNetworkRecLayer.py", line 5691, in <module>
globals()[arg]() # assume function and execute
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/tests/test_TFNetworkRecLayer.py", line 5605, in test_RelativePositionalEncodingLayer_in_rec
network.construct_from_dict(net_dict)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 477, in construct_from_dict
self.construct_layer(net_dict, name)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 672, in construct_layer
return add_layer(name=name, layer_class=layer_class, **layer_desc)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 776, in add_layer
layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 717, in _create_layer
layer_desc["output"] = layer_class.get_out_data_from_opts(**layer_desc)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 385, in get_out_data_from_opts
parent_net=network, net_dict=unit, source_data=source_data, rec_layer_name=kwargs["name"])
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 992, in __init__
self._construct_template()
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1280, in _construct_template
get_templated_layer.construct("output")
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1099, in construct
self.__call__(layer_name_)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1266, in __call__
get_layer=default_get_layer, add_layer=default_get_layer.add_templated_layer)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 672, in construct_layer
return add_layer(name=name, layer_class=layer_class, **layer_desc)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1119, in add_templated_layer
output = layer_class.get_out_data_from_opts(**layer_desc)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 228, in get_out_data_from_opts
return cls._base_get_out_data_from_opts(**kwargs)
File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 310, in _base_get_out_data_from_opts
default_shape.insert(sources_data.batch_dim_axis, None)
TypeError: 'NoneType' object cannot be interpreted as an integer
python-BaseException
Registering this axis key manually just led to mismatches with some placeholder.shape later. I tried a few data configurations, but none of them worked without errors.
I'm uncertain whether I don't understand the configuration of the output data well enough to fix the issue, or whether it is a more extensive problem.
I pushed my changes (https://github.com/npwynands/returnn/commit/928ee71f7d08f43daccf27483d940d3ec00d2b8a or, more recently, https://github.com/npwynands/returnn/commit/a08995b7c8fbfe495ec7dc7fb702963f92185973). They now contain the tests (WIP) and the output data in its current configuration, plus some commented-out variations I've been trying out.
I'd be grateful if somebody more experienced with RETURNN would take a look at the issue.
Do not just try things out at random. Rather, try to better understand what is happening.
RelativePositionalEncodingLayer.get_out_data_from_opts is already correct, isn't it? Why do you want to change it? Or what do you think is wrong with it?
The error you posted seems to come from SoftmaxLayer, not from RelativePositionalEncodingLayer. It looks like SoftmaxLayer expects that its input has a batch dim. So you could maybe fix SoftmaxLayer to not expect that.
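The TypeError in the traceback above can be reproduced in plain Python, without RETURNN: list.insert is called with batch_dim_axis=None as the index. The sketch below uses the values shown in the traceback (default_shape = [1, None, 8], batch_dim_axis = None). The helper insert_batch_dim is hypothetical and is not part of RETURNN; it only illustrates what "not expecting a batch dim" could look like, assuming the shape should simply be left unchanged when the input has no batch axis.

```python
def insert_batch_dim(shape, batch_dim_axis):
    """Hypothetical helper (not RETURNN code).

    Returns a copy of `shape` with a None entry (the batch dim) inserted at
    `batch_dim_axis`. If `batch_dim_axis` is None, i.e. the input has no
    batch dim, the shape is returned unchanged instead of raising.
    """
    result = list(shape)
    if batch_dim_axis is not None:
        result.insert(batch_dim_axis, None)
    return result


# Values taken from the traceback: the output of the
# relative_positional_encoding layer has no batch dim.
default_shape = [1, None, 8]
batch_dim_axis = None

# Unguarded call, as in the failing line of the traceback.
try:
    list(default_shape).insert(batch_dim_axis, None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print("unguarded insert failed with:", error_message)
print("guarded, no batch dim:", insert_batch_dim(default_shape, batch_dim_axis))
print("guarded, batch dim at axis 0:", insert_batch_dim([8], 0))
```

Whether such a guard would be the right change for SoftmaxLayer is an assumption; it only shows where the exception originates.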
Or (better) just write the test case in another way, the way you would actually use this layer, as in the example usage, something like:
network = {
  'rel_pos': {
    "class": "relative_positional_encoding", "n_out": 13,
    "from": "data"},
  'output': {
    "class": "self_attention",
    "num_heads": 3, "total_key_dim": 12, "n_out": 15,
    "from": 'data',
    "attention_left_only": False,
    "key_shift": 'rel_pos'}
}
Ah okay, I didn't realize that the issue comes from the SoftmaxLayer. In general, I should have thought more carefully about what I was doing, sorry. I have now added a test case based on the example usage, as you suggested. The test passes without errors.
I recently updated my RETURNN/master from commit fa0a7ad8 to e1a9974c (or a728507a). Now I have issues with the RelativePositionalEncodingLayer I have been using in my Transformer models:
As can be seen, the RelativePositionalEncodingLayer does not pass the (newly introduced?) sanity check.
Environment: Python 3.7.5, TensorFlow 1.14 (cluster) / TensorFlow 1.15 (local), RETURNN as mentioned above.