rwth-i6 / returnn

The RWTH extensible training framework for universal recurrent neural networks
http://returnn.readthedocs.io/

RelativePositionalEncodingLayer Does Not Pass Sanity Check in Recent Commits #327

Closed npwynands closed 4 years ago

npwynands commented 4 years ago

I recently updated my RETURNN/master from commit fa0a7ad8 to e1a9974c (or a728507a). Now I have issues with the RelativePositionalEncodingLayer I have been using in my Transformer models:

...
Exception creating layer root/'teacherMT_enc_01_rel_pos' of class RelativePositionalEncodingLayer with opts:
{'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                         "distribution='uniform', scale=0.78)",
 'n_out': 64,
 'name': 'teacherMT_enc_01_rel_pos',
 'network': <TFNetwork 'root' extra_nets={'extra.search': <TFNetwork 'extra.search' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool> search>} train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data(name='teacherMT_enc_01_rel_pos_output', shape=(None, None, 64), batch_dim_axis=None, time_dim_axis=0, available_for_inference=False, batch_shape_meta=[T|'time:var:extern_data:source_text','time:var:extern_data:source_text',F|64]),
 'sources': [<LayerNormLayer 'teacherMT_enc_01_self_att_laynorm' out_type=Data(shape=(None, 512), available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:source_text',F|512])>],
 'trainable': False}
Traceback (most recent call last):
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/__main__.py", line 642, in main
    execute_main_task()
  ...
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 776, in add_layer
    layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 723, in _create_layer
    output_template.sanity_check(ignore_placeholder=True)  # placeholder might be overwritten later
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/util/data.py", line 623, in sanity_check
    "%s: inconsistent dim. feature axis or unspecified: %r." % (self, self.feature_dim_axis_or_unspecified))
AssertionError: Data(name='teacherMT_enc_01_rel_pos_output', shape=(None, None, 64), batch_dim_axis=None, time_dim_axis=0, available_for_inference=False, batch_shape_meta=[T|'time:var:extern_data:source_text','time:var:extern_data:source_text',F|64]): inconsistent dim. feature axis or unspecified: <class 'returnn.util.basic.NotSpecified'>.
python-BaseException

As can be seen, the RelativePositionalEncodingLayer does not pass this (apparently newly introduced?) sanity check.

Environment: Python 3.7.5, TensorFlow 1.14 (cluster) / TensorFlow 1.15 (local), RETURNN as mentioned above.

albertz commented 4 years ago

Yes, it looks like this is wrong in RelativePositionalEncodingLayer. Can you fix it and submit a pull request? Maybe also add a small test case (it looks like no one has added one yet, otherwise we would have noticed this earlier).

npwynands commented 4 years ago

I would try to fix it, but I'm afraid I won't find any time for this until the beginning of next week. So somebody else, who is also more experienced in RETURNN than me, might fix this issue faster.

npwynands commented 4 years ago

I added two test cases, test_RelativePositionalEncodingLayer_in_rec and test_RelativePositionalEncodingLayer_no_rec, to test_TFNetworkRecLayer.py, using the equivalent test cases for PositionalEncodingLayer as templates (see test_TFNetworkRecLayer.py). Testing RelativePositionalEncodingLayer with these tests uncovered further issues which I hadn't noticed before. The data returned by get_out_data_from_opts has no batch_dim_axis set, but this is apparently required in "returnn/tf/layers/base.py", line 310, in _base_get_out_data_from_opts:

Executing: test_RelativePositionalEncodingLayer_in_rec
{'data': Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])}
layer <network via test_RelativePositionalEncodingLayer_in_rec>/'data' output: Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])
layer <network via test_RelativePositionalEncodingLayer_in_rec>/'data:data' output: Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])
<_SubnetworkRecCell of None>: exception constructing template network (for deps and data shapes)
Most recent construction stack:
<_TemplateLayer 'output/output' uninitialized, construction stack None>, kwargs:
{'_target_layers': {'data': <_TemplateLayer(SourceLayer)(:template:source) 'output/data:data' out_type=Data(shape=(), dtype='int32', sparse=True, dim=5, time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'output')>},
 'loss': <CrossEntropyLoss None>,
 'n_out': 5,
 'name': 'output',
 'network': <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>/output:rec-subnet' parent_net=<TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train> train>,
 'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>],
 'target': 'data'}
Template network so far:
{'data:data': <_TemplateLayer(SourceLayer)(:template:source) 'output/data:data' out_type=Data(shape=(), dtype='int32', sparse=True, dim=5, time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'output')>,
 'data:source': <_TemplateLayer(SourceLayer)(:template:source) 'output/data:source' out_type=Data(shape=(), dtype='int32', sparse=True, dim=5, time_dim_axis=None, batch_shape_meta=[B]) (construction stack 'input')>,
 'input': <_TemplateLayer(LinearLayer)(:template:linear) 'output/input' out_type=Data(shape=(8,), time_dim_axis=None, batch_shape_meta=[B,F|8]) (construction stack 'pos_enc')>,
 'output': <_TemplateLayer 'output/output' uninitialized, construction stack None>,
 'pos_enc': <_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>}
Collected (unique) exceptions during template construction:
(Note that many of these can be ignored, or are expected.)
EXCEPTION
Traceback (most recent call last):
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1239, in __call__
    line: self.net.construct_layer(
            net_dict=self.net_dict, name=name,
            get_layer=get_layer, add_layer=get_layer.add_templated_layer)
    locals:
      self = <local> <_SubnetworkRecCell of None>
      self.net = <local> <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>/output:rec-subnet' parent_net=<TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train> train>
      self.net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>/output:rec-subnet' parent_net=<TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train> train>>
      net_dict = <not found>
      self.net_dict = <local> {'input': {'class': 'linear', 'activation': None, 'from': 'data:source', 'n_out': 8}, 'pos_enc': {'class': 'relative_positional_encoding', 'from': ['input'], 'n_out': 8}, 'output': {'class': 'softmax', 'from': ['pos_enc'], 'loss': 'ce', 'target': 'data'}}
      name = <local> 'output', len = 6
      get_layer = <local> <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 2, parents 'output')
      add_layer = <not found>
      get_layer.add_templated_layer = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.add_templated_layer of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 2, parents 'output')>
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 672, in construct_layer
    line: return add_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method _SubnetworkRecCell._construct_template.<locals>.GetLayer.add_templated_layer of <RecLayer construct template GetLayer>(safe False, allow_construct_in_call_nrs None, allow_uninitialized_template False, count 2, parents 'output')>
      name = <local> 'output', len = 6
      layer_class = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
      layer_desc = <local> {'target': 'data', 'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>], '_target_layers': {'data...
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1119, in add_templated_layer
    line: output = layer_class.get_out_data_from_opts(**layer_desc)
    locals:
      output = <not found>
      layer_class = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
      layer_class.get_out_data_from_opts = <local> <bound method LayerBase.get_out_data_from_opts of <class 'returnn.tf.layers.basic.SoftmaxLayer'>>
      layer_desc = <local> {'target': 'data', 'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>], '_target_layers': {'data..., len = 7
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 228, in get_out_data_from_opts
    line: return cls._base_get_out_data_from_opts(**kwargs)
    locals:
      cls = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
      cls._base_get_out_data_from_opts = <local> <bound method LayerBase._base_get_out_data_from_opts of <class 'returnn.tf.layers.basic.SoftmaxLayer'>>
      kwargs = <local> {'target': 'data', 'sources': [<_TemplateLayer(RelativePositionalEncodingLayer)(:template:relative_positional_encoding) 'output/pos_enc' out_type=Data(shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8]) (construction stack 'output')>], '_target_layers': {'data..., len = 7
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 310, in _base_get_out_data_from_opts
    line: default_shape.insert(sources_data.batch_dim_axis, None)
    locals:
      default_shape = <local> [1, None, 8]
      default_shape.insert = <local> <built-in method insert of list object at 0x7f0dec4f9a00>
      sources_data = <local> Data(name='pos_enc_output', shape=(1, None, 8), batch_dim_axis=None, time_dim_axis=None, batch_shape_meta=[1,?,F|8])
      sources_data.batch_dim_axis = <local> None
TypeError: 'NoneType' object cannot be interpreted as an integer
TypeError creating layer <network via test_RelativePositionalEncodingLayer_in_rec>/'output' of class RecLayer with opts:
{'_target_layers': {'data': <SourceLayer 'data:data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])>},
 'n_out': <class 'returnn.util.basic.NotSpecified'>,
 'name': 'output',
 'network': <TFNetwork '<network via test_RelativePositionalEncodingLayer_in_rec>' train>,
 'sources': [<SourceLayer 'data' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=5, batch_shape_meta=[B,T|'time:var:extern_data:data'])>],
 'target': 'data',
 'unit': {'input': {'activation': None,
                    'class': 'linear',
                    'from': 'data:source',
                    'n_out': 8},
          'output': {'class': 'softmax',
                     'from': ['pos_enc'],
                     'loss': 'ce',
                     'target': 'data'},
          'pos_enc': {'class': 'relative_positional_encoding',
                      'from': ['input'],
                      'n_out': 8}}}
Traceback (most recent call last):
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/tests/test_TFNetworkRecLayer.py", line 5691, in <module>
    globals()[arg]()  # assume function and execute
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/tests/test_TFNetworkRecLayer.py", line 5605, in test_RelativePositionalEncodingLayer_in_rec
    network.construct_from_dict(net_dict)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 477, in construct_from_dict
    self.construct_layer(net_dict, name)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 672, in construct_layer
    return add_layer(name=name, layer_class=layer_class, **layer_desc)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 776, in add_layer
    layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 717, in _create_layer
    layer_desc["output"] = layer_class.get_out_data_from_opts(**layer_desc)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 385, in get_out_data_from_opts
    parent_net=network, net_dict=unit, source_data=source_data, rec_layer_name=kwargs["name"])
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 992, in __init__
    self._construct_template()
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1280, in _construct_template
    get_templated_layer.construct("output")
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1099, in construct
    self.__call__(layer_name_)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1266, in __call__
    get_layer=default_get_layer, add_layer=default_get_layer.add_templated_layer)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/network.py", line 672, in construct_layer
    return add_layer(name=name, layer_class=layer_class, **layer_desc)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/rec.py", line 1119, in add_templated_layer
    output = layer_class.get_out_data_from_opts(**layer_desc)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 228, in get_out_data_from_opts
    return cls._base_get_out_data_from_opts(**kwargs)
  File "/home/philipp/Documents/bachelor-thesis/returnn/repository/returnn/tf/layers/base.py", line 310, in _base_get_out_data_from_opts
    default_shape.insert(sources_data.batch_dim_axis, None)
TypeError: 'NoneType' object cannot be interpreted as an integer
python-BaseException

Registering this axis key manually just led to mismatches with some placeholder.shape later. I tried a few data configurations, but none of them worked without errors. I'm uncertain whether I don't understand the configuration of output data well enough to fix the issue, or whether it is a more extensive problem. I pushed my changes (https://github.com/npwynands/returnn/commit/928ee71f7d08f43daccf27483d940d3ec00d2b8a, or more recently https://github.com/npwynands/returnn/commit/a08995b7c8fbfe495ec7dc7fb702963f92185973), which now contain the tests (WIP) and the output data in its current configuration, plus some commented-out variations I've been trying out. I'd be grateful if somebody more experienced in RETURNN would take a look at the issue.
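For what it's worth, the TypeError in the traceback above can be reproduced in isolation. This is a standalone sketch, not RETURNN code: the values (`default_shape = [1, None, 8]`, `batch_dim_axis = None`) are taken directly from the locals dump in the log, and the guarded variant at the end is only a hypothetical illustration of the failure mode, not the actual fix.

```python
# Standalone reproduction of the TypeError from _base_get_out_data_from_opts:
# list.insert requires an int index, but sources_data.batch_dim_axis is None
# here because the source data ('pos_enc' output) has no batch dim.
default_shape = [1, None, 8]   # value of default_shape from the log
batch_dim_axis = None          # value of sources_data.batch_dim_axis from the log

try:
    default_shape.insert(batch_dim_axis, None)
except TypeError as exc:
    print(exc)  # 'NoneType' object cannot be interpreted as an integer

# A guarded variant (hypothetical sketch, not the actual RETURNN fix) would
# only insert a batch dim placeholder when the source actually has one:
if batch_dim_axis is not None:
    default_shape.insert(batch_dim_axis, None)
print(default_shape)  # [1, None, 8] -- unchanged, since there is no batch dim
```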

albertz commented 4 years ago

Do not just try things at random. Rather, try to understand what is happening.

Isn't RelativePositionalEncodingLayer.get_out_data_from_opts already correct? Why do you want to change it? Or what do you think is wrong with it?

The error you posted seems to come from SoftmaxLayer, not from RelativePositionalEncodingLayer. It looks like SoftmaxLayer expects that its input has a batch dim. So you could maybe fix SoftmaxLayer to not expect that.

Or (better) just write the test case in another way, i.e. how you would actually use this layer, as in the example usage, something like:

network = {
  'rel_pos': {
    "class": "relative_positional_encoding", "n_out": 13,
    "from": "data"},
  'output': {
    "class": "self_attention",
    "num_heads": 3, "total_key_dim": 12, "n_out": 15,
    "from": 'data',
    "attention_left_only": False,
    "key_shift": 'rel_pos'}
}
npwynands commented 4 years ago

Ah okay, I didn't get that the issue comes from the SoftmaxLayer. In general, I should have thought more carefully about what I was doing, sorry. Now I have added a test case based on the example usage, as you suggested. The test passes without errors.