rwth-i6 / pytorch-to-returnn

Make PyTorch code runnable within RETURNN

Merge batch with modified time dim #96

Closed · vieting closed this issue 2 years ago

vieting commented 2 years ago

In a case where the time dim is modified (e.g. due to downsampling/padding in a convolution) and we want to merge it with the batch dim, we currently face the problem that creating a PaddedDim in the MergeDimsLayer fails, because the dim value of the modified time dim cannot be obtained.

A demo test case which reproduces the issue looks like this (it can also be found in #97):

# Module-level imports as in tests/test_layers.py; the import path of the verify
# helper is assumed from the package layout visible in the stack trace below.
import typing
import numpy
import torch
from pytorch_to_returnn.converter import verify_torch_and_convert_to_returnn


def test_merge_batch_with_modified_time():
  n_in, n_out = 5, 7
  n_batch, n_time = 3, 11

  def model_func(wrapped_import, inputs: torch.Tensor):
    if typing.TYPE_CHECKING or not wrapped_import:
      import torch
    else:
      torch = wrapped_import("torch")

    conv = torch.nn.Conv1d(in_channels=n_in, out_channels=n_out, kernel_size=3, stride=2)
    y = inputs  # (B,F,T)
    y = conv(y)  # (B,F',T'), T' = floor((T - 3) / 2) + 1, i.e. the time dim is modified
    y = y.transpose(1, 2).contiguous()  # (B,T',F')
    _, _, fsz = y.shape
    y = y.view(-1, fsz)  # (B*T',F'), merges the batch dim with the modified time dim
    return y

  rnd = numpy.random.RandomState(42)
  x = rnd.normal(0., 1., (n_batch, n_in, n_time)).astype("float32")
  verify_torch_and_convert_to_returnn(model_func, inputs=x)
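
For the y.view(-1, fsz) call, the converter constructs a RETURNN MergeDimsLayer over the batch and time axes; the layer dict it builds (visible as layer_dict in the trace below) is:

{'class': 'merge_dims', 'from': 'Transpose', 'axes': ['B', 'T'], 'keep_order': True}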

The stack trace looks like this:

ERROR: test_layers.test_merge_batch_with_modified_time
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.8/site-packages/nose/case.py", line 198, in TestBase.runTest
    line: self.test(*self.arg)
    locals:
      self = <local> test_layers.test_merge_batch_with_modified_time
      self.test = <local> <function test_merge_batch_with_modified_time at 0x7f3ba0138a60>
      self.arg = <local> ()
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/tests/test_layers.py", line 1215, in test_merge_batch_with_modified_time
    line: verify_torch_and_convert_to_returnn(model_func, inputs=x)
    locals:
      verify_torch_and_convert_to_returnn = <global> <function verify_torch_and_convert_to_returnn at 0x7f3ba013f5e0>
      model_func = <local> <function test_merge_batch_with_modified_time.<locals>.model_func at 0x7f3b40211d30>
      inputs = <not found>
      x = <local> array([[[ 0.49671414, -0.1382643 ,  0.64768857,  1.5230298 ,
                           -0.23415338, -0.23413695,  1.5792128 ,  0.7674347 ,
                           -0.46947438,  0.54256004, -0.46341768],
                          [-0.46572974,  0.24196227, -1.9132802 , -1.7249179 ,
                           -0.5622875 , -1.0128311 ,  0.31424734, -0.9080241 ,
                      ..., len = 3, _[0]: {len = 5, _[0]: {len = 11}}
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/converter/converter.py", line 403, in verify_torch_and_convert_to_returnn
    line: converter.run()
    locals:
      converter = <local> <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>
      converter.run = <local> <bound method Converter.run of <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>>
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/converter/converter.py", line 139, in Converter.run
    line: self._run_torch_returnn_drop_in()
    locals:
      self = <local> <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>
      self._run_torch_returnn_drop_in = <local> <bound method Converter._run_torch_returnn_drop_in of <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>>
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/converter/converter.py", line 254, in Converter._run_torch_returnn_drop_in
    line: out_returnn = self._model_func(wrapped_import_torch_returnn, in_returnn)
    locals:
      out_returnn = <not found>
      self = <local> <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>
      self._model_func = <local> <function test_merge_batch_with_modified_time.<locals>.model_func at 0x7f3b40211d30>
      wrapped_import_torch_returnn = <global> <function wrapped_import_torch_returnn at 0x7f3ba013f550>
      in_returnn = <local> <Tensor name:? tensor:(B(3),F'feature:data'(5)(5),'time:data'[B](11)) returnn_data:'data' [B,F|F'feature:data'(5),T|'time:data'[B]] axes id>
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/tests/test_layers.py", line 1210, in test_merge_batch_with_modified_time.<locals>.model_func
    line: y = y.view(-1, fsz)  # (B*T',F')
    locals:
      y = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(7)] axes id>
      y.view = <local> <bound method Tensor.view of <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>>
      fsz = <local> 'static_dim'(7)(7)
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/tensor.py", line 117, in Tensor.view
    line: return reshape(self, shape)
    locals:
      reshape = <local> <function reshape at 0x7f3bb4e19790>
      self = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>
      shape = <local> (-1, 'static_dim'(7)(7))
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/functional.py", line 287, in reshape
    line: input = modules.Flatten(start_dim=axis1, end_dim=a - 1).as_returnn_torch_functional()(input)
    locals:
      input = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>
      modules = <global> <module 'pytorch_to_returnn.torch.nn.modules' from '/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/__init__.py'>
      modules.Flatten = <global> <class 'pytorch_to_returnn.torch.nn.modules.shape.Flatten'>
      start_dim = <not found>
      axis1 = <local> 0
      end_dim = <not found>
      a = <local> 2
      as_returnn_torch_functional = <not found>
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/module.py", line 439, in Module.__call__
    line: res = call_entry.apply_call()
    locals:
      res = <not found>
      call_entry = <local> <CallEntry 'Flatten' <ModuleEntry <Flatten>> (depth 0)>
      call_entry.apply_call = <local> <bound method CallEntry.apply_call of <CallEntry 'Flatten' <ModuleEntry <Flatten>> (depth 0)>>
  File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/naming/call.py", line 137, in CallEntry.apply_call
    line: layer = returnn_net.construct_layer(net_dict={layer_name: layer_dict}, name=layer_name)
    locals:
      layer = <not found>
      returnn_net = <local> <TFNetwork 'root' train=False>
      returnn_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False>>
      net_dict = <not found>
      layer_name = <local> 'Flatten', len = 7
      layer_dict = <local> {'class': 'merge_dims', 'from': 'Transpose', 'axes': ['B', 'T'], 'keep_order': True}
      name = <not found>
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/network.py", line 942, in TFNetwork.construct_layer
    line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=False>>
      name = <local> 'Flatten', len = 7
      name_with_prefix = <local> 'Flatten', len = 7
      layer_class = <local> <class 'returnn.tf.layers.basic.MergeDimsLayer'>
      layer_desc = <local> {'axes': ['B', 'T'], 'keep_order': True, '_network': <TFNetwork 'root' train=False>, '_name': 'Flatten', 'sources': [<CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>]}
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/network.py", line 1089, in TFNetwork.add_layer
    line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      layer = <not found>
      self = <local> <TFNetwork 'root' train=False>
      self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=False>>
      name = <local> 'Flatten', len = 7
      layer_class = <local> <class 'returnn.tf.layers.basic.MergeDimsLayer'>
      layer_desc = <local> {'axes': ['B', 'T'], 'keep_order': True, '_network': <TFNetwork 'root' train=False>, '_name': 'Flatten', 'sources': [<CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>]}
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/network.py", line 991, in TFNetwork._create_layer
    line: layer_desc["output"] = layer_class.get_out_data_from_opts(**layer_desc)
    locals:
      layer_desc = <local> {'axes': ['B', 'T'], 'keep_order': True, '_network': <TFNetwork 'root' train=False>, '_name': 'Flatten', 'sources': [<CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>], 'name': 'Flatten', 'network': <TFNetwork 'root' train=False>}, len = 7
      layer_class = <local> <class 'returnn.tf.layers.basic.MergeDimsLayer'>
      layer_class.get_out_data_from_opts = <local> <bound method MergeDimsLayer.get_out_data_from_opts of <class 'returnn.tf.layers.basic.MergeDimsLayer'>>
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/layers/basic.py", line 3210, in MergeDimsLayer.get_out_data_from_opts
    line: data.batch = data.batch.copy_extend_with_padded_or_fixed_dim_tag(
            dim_tag=input_data.get_dim_tag(axis),
            batch_major=(axis > input_data.batch_dim_axis) if keep_order else True)
    locals:
      data = <local> Data{'Flatten_output', [B,F|F'Conv1d:channel'(7)]}
      data.batch = <local> BatchInfo{B}
      data.batch.copy_extend_with_padded_or_fixed_dim_tag = <local> <bound method BatchInfo.copy_extend_with_padded_or_fixed_dim_tag of BatchInfo{B}>
      dim_tag = <not found>
      input_data = <local> Data{'Transpose_output', [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}
      input_data.get_dim_tag = <local> <bound method Data.get_dim_tag of Data{'Transpose_output', [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>
      axis = <local> 1
      batch_major = <not found>
      input_data.batch_dim_axis = <local> 0
      keep_order = <local> True
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 2278, in BatchInfo.copy_extend_with_padded_or_fixed_dim_tag
    line: new_dim = self._make_padded_dim(dim_tag)
    locals:
      new_dim = <not found>
      self = <local> BatchInfo{B}
      self._make_padded_dim = <local> <bound method BatchInfo._make_padded_dim of BatchInfo{B}>
      dim_tag = <local> Dim{'Conv1d:conv:s0'[?]}
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 2182, in BatchInfo._make_padded_dim
    line: new_dim = BatchInfo.PaddedDim(dim_tag=dim_tag_base)
    locals:
      new_dim = <not found>
      BatchInfo = <global> <class 'returnn.tf.util.data.BatchInfo'>
      BatchInfo.PaddedDim = <global> <class 'returnn.tf.util.data.BatchInfo.PaddedDim'>
      dim_tag = <local> Dim{'Conv1d:conv:s0'[?]}
      dim_tag_base = <local> Dim{'Conv1d:conv:s0'[?]}
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 1860, in BatchInfo.PaddedDim.__init__
    line: super(BatchInfo.PaddedDim, self).__init__(size=dim_tag.get_dim_value())
    locals:
      super = <builtin> <class 'super'>
      BatchInfo = <global> <class 'returnn.tf.util.data.BatchInfo'>
      BatchInfo.PaddedDim = <global> <class 'returnn.tf.util.data.BatchInfo.PaddedDim'>
      self = <local> !AttributeError: 'PaddedDim' object has no attribute 'dim_tag'
      __init__ = <not found>
      size = <not found>
      dim_tag = <local> Dim{'Conv1d:conv:s0'[?]}
      dim_tag.get_dim_value = <local> <bound method Dim.get_dim_value of Dim{'Conv1d:conv:s0'[?]}>
  File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 954, in Dim.get_dim_value
    line: raise Exception('%s: need placeholder, self.dimension or self.dyn_size for dim value' % self)
    locals:
      Exception = <builtin> <class 'Exception'>
      self = <local> Dim{'Conv1d:conv:s0'[?]}
Exception: Dim{'Conv1d:conv:s0'[?]}: need placeholder, self.dimension or self.dyn_size for dim value
albertz commented 2 years ago

This is a bit too little information. What exactly fails, with what error? Does MergeDimsLayer fail? With what layer opts, and what inputs?

You should post a test case here, together with the error.

vieting commented 2 years ago

See #97. I copied the stack trace into the issue description above for reference.

albertz commented 2 years ago

See #97

Yea, it's helpful that you already started a draft PR with the test case, but this issue description should also contain a description of the actual problem, and the test case is actually most helpful to describe the problem.

albertz commented 2 years ago
    line: y = y.view(-1, fsz)  # (B*T',F')
    locals:
      y = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(7)] axes id>

Here the 'static_dim'(5) already looks wrong. It should not be a static dim. Maybe this is the actual problem?
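
For comparison, both tensor reprs taken from the trace above: the converter's input tensor still carries a proper dynamic time dim tag, while the conv output shows static dims:

in_returnn:          tensor:(B(3),F'feature:data'(5)(5),'time:data'[B](11))   -- dynamic 'time:data'[B] dim
y (after transpose): tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7))      -- wrongly static dims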

vieting commented 2 years ago

> Yea, it's helpful that you already started a draft PR with the test case, but this issue description should also contain a description of the actual problem, and the test case is actually most helpful to describe the problem.

Yes, I mean, I opened the issue, then opened the PR, and wanted to copy the stack trace here to show the issue. Your first comment came very fast, so that was not done yet. What else should I write in the description? Should I open the PR first and mention it directly in the description?

vieting commented 2 years ago
>     line: y = y.view(-1, fsz)  # (B*T',F')
>     locals:
>       y = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(7)] axes id>

> Here the 'static_dim'(5) already looks wrong. It should not be a static dim. Maybe this is the actual problem?

Yes, I noticed this as well, but it is not the actual problem. If the time dim is not modified (unlike here, where the convolution modifies it), this entry is equally wrong, but the test passes.
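
For illustration, a hypothetical variant of model_func where the convolution leaves the time dim unchanged (not part of #97): with kernel_size=1 and stride=1, T' == T, so no PaddedDim for a modified time dim is needed and the merge with the batch dim succeeds.

# Hypothetical contrast case, replacing the conv in model_func above:
conv = torch.nn.Conv1d(in_channels=n_in, out_channels=n_out, kernel_size=1, stride=1)
y = conv(inputs)  # (B,F',T), time dim not modified
y = y.transpose(1, 2).contiguous()  # (B,T,F')
y = y.view(-1, y.shape[-1])  # (B*T,F'), this merge works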

albertz commented 2 years ago
    line: input = modules.Flatten(start_dim=axis1, end_dim=a - 1).as_returnn_torch_functional()(input)
    locals:
      input = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>

It also looks strange that it is 'Conv1d:conv:s0'[?] now. It should be 'Conv1d:conv:s0'[B].

albertz commented 2 years ago

> > Yea, it's helpful that you already started a draft PR with the test case, but this issue description should also contain a description of the actual problem, and the test case is actually most helpful to describe the problem.

> Yes, I mean, I opened the issue, then opened the PR, and wanted to copy the stack trace here to show the issue. Your first comment came very fast, so that was not done yet. What else should I write in the description? Should I open the PR first and mention it directly in the description?

The description itself should ideally contain demo code (a test case) plus the error (including the stack trace).

albertz commented 2 years ago

When you debug-step through it, starting at y = y.view(-1, fsz), initially it shows 'Conv1d:conv:s0'[B] for y but at some point it becomes 'Conv1d:conv:s0'[?]. Where is that?

vieting commented 2 years ago

> When you debug-step through it, starting at y = y.view(-1, fsz), initially it shows 'Conv1d:conv:s0'[B] for y but at some point it becomes 'Conv1d:conv:s0'[?]. Where is that?

It's in BatchInfo._make_padded_dim() in the line dim_tag_base = dim_tag.get_same_base(). I have to check why dim_tag.same_as is 'Conv1d:conv:s0'[?].
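
For orientation, here is the relevant part of BatchInfo._make_padded_dim, reconstructed from the stack trace and the line named above (a sketch, not verbatim RETURNN source):

# Reconstructed sketch of BatchInfo._make_padded_dim (returnn/tf/util/data.py):
def _make_padded_dim(self, dim_tag):
  # get_same_base() returns the base dim tag; here that base is the
  # batch-unresolved Dim{'Conv1d:conv:s0'[?]} instead of 'Conv1d:conv:s0'[B]
  dim_tag_base = dim_tag.get_same_base()
  # PaddedDim.__init__ then calls get_dim_value() on the tag, which raises
  # because the unresolved tag has no placeholder, no static dimension and
  # no dyn_size (the exception at the bottom of the trace)
  new_dim = BatchInfo.PaddedDim(dim_tag=dim_tag_base)
  ...  # rest omitted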

albertz commented 2 years ago

> > When you debug-step through it, starting at y = y.view(-1, fsz), initially it shows 'Conv1d:conv:s0'[B] for y but at some point it becomes 'Conv1d:conv:s0'[?]. Where is that?

> It's in BatchInfo._make_padded_dim() in the line dim_tag_base = dim_tag.get_same_base().

Can you say more about the stack trace to get there?

> I have to check why dim_tag.same_as is 'Conv1d:conv:s0'[?].

Well, this should be ok. This is not a problem. But the Data instance should call Dim.get_for_batch_ctx via Data._adapt_batch_consistent_dim_tags and that should resolve it again.

Edit: Sorry, not the Data instance in this case, but BatchInfo._make_padded_dim or whatever else is using it.
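
A minimal sketch of the resolution idea described here, assuming the Dim.get_for_batch_ctx API named above (the exact call site and signature are RETURNN internals, and ctx is only a placeholder):

# Sketch only, not the actual fix: resolve the dim tag for this BatchInfo
# before constructing the PaddedDim, so that Dim{'Conv1d:conv:s0'[?]} becomes
# Dim{'Conv1d:conv:s0'[B]} again and get_dim_value() can use its dyn_size.
dim_tag = dim_tag.get_for_batch_ctx(batch=self, ctx=ctx)  # ctx: assumed control flow context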

albertz commented 2 years ago

It sounds like this is a RETURNN bug? Can you reproduce this with a pure RETURNN net dict?

vieting commented 2 years ago

> It sounds like this is a RETURNN bug? Can you reproduce this with a pure RETURNN net dict?

Yes, I did. See the corresponding issue (https://github.com/rwth-i6/returnn/issues/917) and the PR with a test case.
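
Such a pure-RETURNN net dict might look roughly like this (a sketch with assumed standard conv/merge_dims options; the actual test case is in the linked issue and PR):

network = {
    # strided conv without padding modifies the time dim (creates a new dim tag)
    "conv": {"class": "conv", "from": "data", "n_out": 7,
             "filter_size": (3,), "strides": (2,), "padding": "valid"},
    # merging B and T then requires a PaddedDim for the modified time dim
    "merge": {"class": "merge_dims", "from": "conv",
              "axes": ["B", "T"], "keep_order": True},
    "output": {"class": "copy", "from": "merge"},
}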

albertz commented 2 years ago

Fixed via https://github.com/rwth-i6/returnn/pull/916.