Closed (vieting closed this issue 2 years ago)
This is too little information. What exactly fails, and with what error? Does MergeDimsLayer fail? With which layer opts, and with which inputs?
You should post a test case here, together with the error.
See #97. I am copying the stack trace here for reference:
ERROR: test_layers.test_merge_batch_with_modified_time
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/.local/lib/python3.8/site-packages/nose/case.py", line 198, in TestBase.runTest
line: self.test(*self.arg)
locals:
self = <local> test_layers.test_merge_batch_with_modified_time
self.test = <local> <function test_merge_batch_with_modified_time at 0x7f3ba0138a60>
self.arg = <local> ()
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/tests/test_layers.py", line 1215, in test_merge_batch_with_modified_time
line: verify_torch_and_convert_to_returnn(model_func, inputs=x)
locals:
verify_torch_and_convert_to_returnn = <global> <function verify_torch_and_convert_to_returnn at 0x7f3ba013f5e0>
model_func = <local> <function test_merge_batch_with_modified_time.<locals>.model_func at 0x7f3b40211d30>
inputs = <not found>
x = <local> array([[[ 0.49671414, -0.1382643 , 0.64768857, 1.5230298 ,
-0.23415338, -0.23413695, 1.5792128 , 0.7674347 ,
-0.46947438, 0.54256004, -0.46341768],
[-0.46572974, 0.24196227, -1.9132802 , -1.7249179 ,
-0.5622875 , -1.0128311 , 0.31424734, -0.9080241 ,
..., len = 3, _[0]: {len = 5, _[0]: {len = 11}}
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/converter/converter.py", line 403, in verify_torch_and_convert_to_returnn
line: converter.run()
locals:
converter = <local> <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>
converter.run = <local> <bound method Converter.run of <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>>
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/converter/converter.py", line 139, in Converter.run
line: self._run_torch_returnn_drop_in()
locals:
self = <local> <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>
self._run_torch_returnn_drop_in = <local> <bound method Converter._run_torch_returnn_drop_in of <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>>
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/converter/converter.py", line 254, in Converter._run_torch_returnn_drop_in
line: out_returnn = self._model_func(wrapped_import_torch_returnn, in_returnn)
locals:
out_returnn = <not found>
self = <local> <pytorch_to_returnn.converter.converter.Converter object at 0x7f3b403a3e80>
self._model_func = <local> <function test_merge_batch_with_modified_time.<locals>.model_func at 0x7f3b40211d30>
wrapped_import_torch_returnn = <global> <function wrapped_import_torch_returnn at 0x7f3ba013f550>
in_returnn = <local> <Tensor name:? tensor:(B(3),F'feature:data'(5)(5),'time:data'[B](11)) returnn_data:'data' [B,F|F'feature:data'(5),T|'time:data'[B]] axes id>
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/tests/test_layers.py", line 1210, in test_merge_batch_with_modified_time.<locals>.model_func
line: y = y.view(-1, fsz) # (B*T',F')
locals:
y = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(7)] axes id>
y.view = <local> <bound method Tensor.view of <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>>
fsz = <local> 'static_dim'(7)(7)
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/tensor.py", line 117, in Tensor.view
line: return reshape(self, shape)
locals:
reshape = <local> <function reshape at 0x7f3bb4e19790>
self = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>
shape = <local> (-1, 'static_dim'(7)(7))
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/functional.py", line 287, in reshape
line: input = modules.Flatten(start_dim=axis1, end_dim=a - 1).as_returnn_torch_functional()(input)
locals:
input = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>
modules = <global> <module 'pytorch_to_returnn.torch.nn.modules' from '/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/__init__.py'>
modules.Flatten = <global> <class 'pytorch_to_returnn.torch.nn.modules.shape.Flatten'>
start_dim = <not found>
axis1 = <local> 0
end_dim = <not found>
a = <local> 2
as_returnn_torch_functional = <not found>
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/torch/nn/modules/module.py", line 439, in Module.__call__
line: res = call_entry.apply_call()
locals:
res = <not found>
call_entry = <local> <CallEntry 'Flatten' <ModuleEntry <Flatten>> (depth 0)>
call_entry.apply_call = <local> <bound method CallEntry.apply_call of <CallEntry 'Flatten' <ModuleEntry <Flatten>> (depth 0)>>
File "/home/runner/work/pytorch-to-returnn/pytorch-to-returnn/pytorch_to_returnn/naming/call.py", line 137, in CallEntry.apply_call
line: layer = returnn_net.construct_layer(net_dict={layer_name: layer_dict}, name=layer_name)
locals:
layer = <not found>
returnn_net = <local> <TFNetwork 'root' train=False>
returnn_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False>>
net_dict = <not found>
layer_name = <local> 'Flatten', len = 7
layer_dict = <local> {'class': 'merge_dims', 'from': 'Transpose', 'axes': ['B', 'T'], 'keep_order': True}
name = <not found>
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/network.py", line 942, in TFNetwork.construct_layer
line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
locals:
add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=False>>
name = <local> 'Flatten', len = 7
name_with_prefix = <local> 'Flatten', len = 7
layer_class = <local> <class 'returnn.tf.layers.basic.MergeDimsLayer'>
layer_desc = <local> {'axes': ['B', 'T'], 'keep_order': True, '_network': <TFNetwork 'root' train=False>, '_name': 'Flatten', 'sources': [<CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>]}
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/network.py", line 1089, in TFNetwork.add_layer
line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
locals:
layer = <not found>
self = <local> <TFNetwork 'root' train=False>
self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=False>>
name = <local> 'Flatten', len = 7
layer_class = <local> <class 'returnn.tf.layers.basic.MergeDimsLayer'>
layer_desc = <local> {'axes': ['B', 'T'], 'keep_order': True, '_network': <TFNetwork 'root' train=False>, '_name': 'Flatten', 'sources': [<CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>]}
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/network.py", line 991, in TFNetwork._create_layer
line: layer_desc["output"] = layer_class.get_out_data_from_opts(**layer_desc)
locals:
layer_desc = <local> {'axes': ['B', 'T'], 'keep_order': True, '_network': <TFNetwork 'root' train=False>, '_name': 'Flatten', 'sources': [<CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>], 'name': 'Flatten', 'network': <TFNetwork 'root' train=False>}, len = 7
layer_class = <local> <class 'returnn.tf.layers.basic.MergeDimsLayer'>
layer_class.get_out_data_from_opts = <local> <bound method MergeDimsLayer.get_out_data_from_opts of <class 'returnn.tf.layers.basic.MergeDimsLayer'>>
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/layers/basic.py", line 3210, in MergeDimsLayer.get_out_data_from_opts
line: data.batch = data.batch.copy_extend_with_padded_or_fixed_dim_tag(
dim_tag=input_data.get_dim_tag(axis),
batch_major=(axis > input_data.batch_dim_axis) if keep_order else True)
locals:
data = <local> Data{'Flatten_output', [B,F|F'Conv1d:channel'(7)]}
data.batch = <local> BatchInfo{B}
data.batch.copy_extend_with_padded_or_fixed_dim_tag = <local> <bound method BatchInfo.copy_extend_with_padded_or_fixed_dim_tag of BatchInfo{B}>
dim_tag = <not found>
input_data = <local> Data{'Transpose_output', [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}
input_data.get_dim_tag = <local> <bound method Data.get_dim_tag of Data{'Transpose_output', [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)]}>
axis = <local> 1
batch_major = <not found>
input_data.batch_dim_axis = <local> 0
keep_order = <local> True
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 2278, in BatchInfo.copy_extend_with_padded_or_fixed_dim_tag
line: new_dim = self._make_padded_dim(dim_tag)
locals:
new_dim = <not found>
self = <local> BatchInfo{B}
self._make_padded_dim = <local> <bound method BatchInfo._make_padded_dim of BatchInfo{B}>
dim_tag = <local> Dim{'Conv1d:conv:s0'[?]}
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 2182, in BatchInfo._make_padded_dim
line: new_dim = BatchInfo.PaddedDim(dim_tag=dim_tag_base)
locals:
new_dim = <not found>
BatchInfo = <global> <class 'returnn.tf.util.data.BatchInfo'>
BatchInfo.PaddedDim = <global> <class 'returnn.tf.util.data.BatchInfo.PaddedDim'>
dim_tag = <local> Dim{'Conv1d:conv:s0'[?]}
dim_tag_base = <local> Dim{'Conv1d:conv:s0'[?]}
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 1860, in BatchInfo.PaddedDim.__init__
line: super(BatchInfo.PaddedDim, self).__init__(size=dim_tag.get_dim_value())
locals:
super = <builtin> <class 'super'>
BatchInfo = <global> <class 'returnn.tf.util.data.BatchInfo'>
BatchInfo.PaddedDim = <global> <class 'returnn.tf.util.data.BatchInfo.PaddedDim'>
self = <local> !AttributeError: 'PaddedDim' object has no attribute 'dim_tag'
__init__ = <not found>
size = <not found>
dim_tag = <local> Dim{'Conv1d:conv:s0'[?]}
dim_tag.get_dim_value = <local> <bound method Dim.get_dim_value of Dim{'Conv1d:conv:s0'[?]}>
File "/home/runner/.local/lib/python3.8/site-packages/returnn/tf/util/data.py", line 954, in Dim.get_dim_value
line: raise Exception('%s: need placeholder, self.dimension or self.dyn_size for dim value' % self)
locals:
Exception = <builtin> <class 'Exception'>
self = <local> Dim{'Conv1d:conv:s0'[?]}
Exception: Dim{'Conv1d:conv:s0'[?]}: need placeholder, self.dimension or self.dyn_size for dim value
See #97
Yes, it's helpful that you already started a draft PR with the test case, but the issue description should also describe the actual problem, and the test case is actually the most helpful way to describe it.
line: y = y.view(-1, fsz) # (B*T',F')
locals:
y = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(7)] axes id>
Here the static_dim(5) already looks wrong. It should not be a static dim. Maybe this is the actual problem?
> Yes, it's helpful that you already started a draft PR with the test case, but the issue description should also describe the actual problem, and the test case is actually the most helpful way to describe it.
Yes, I mean I opened the issue, then opened the PR, and wanted to copy the stack trace to show the issue. Your first comment was very fast, so that was not done yet. What else should I write in the description? Should I open the PR first and mention it directly in the description?
> line: y = y.view(-1, fsz) # (B*T',F')
> locals:
> y = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(7)] axes id>
> Here the static_dim(5) already looks wrong. It should not be a static dim. Maybe this is the actual problem?
Yes, I noticed this as well, but this is not the actual problem. If the time dim is not modified (as it is by the convolution here), this entry is also wrong, but the test passes.
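For reference, the shapes involved can be checked by hand. The sketch below is plain Python and not code from the test: the kernel size 3 and stride 2 are assumptions, chosen only because they map the 11 input frames to the 5 output frames seen in the trace, and conv1d_out_len and infer_view_shape are hypothetical helper names.

```python
import math


def conv1d_out_len(in_len, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (the usual Conv1d length formula)."""
    return (in_len + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def infer_view_shape(shape, target):
    """Resolve a single -1 entry in target, the way a view/reshape call does."""
    total = math.prod(shape)
    known = math.prod(d for d in target if d != -1)
    assert target.count(-1) <= 1 and total % known == 0
    return tuple(total // known if d == -1 else d for d in target)


# Input is (B, F, T) = (3, 5, 11); kernel size and stride are assumed values.
t_out = conv1d_out_len(11, kernel_size=3, stride=2)
# After the transpose, y is (B, T', F') = (3, t_out, 7).
merged = infer_view_shape((3, t_out, 7), (-1, 7))
print(t_out, merged)
```

So y.view(-1, fsz) has to merge the batch dim with the modified time dim T', which is exactly the merge_dims over ['B', 'T'] that shows up in the trace.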
line: input = modules.Flatten(start_dim=axis1, end_dim=a - 1).as_returnn_torch_functional()(input)
locals:
input = <local> <Tensor name:? tensor:(B(3),'static_dim'(5)(5),'static_dim'(7)(7)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(7)] axes id>
It also looks strange that it is now 'Conv1d:conv:s0'[?]. It should be 'Conv1d:conv:s0'[B].
> > Yes, it's helpful that you already started a draft PR with the test case, but the issue description should also describe the actual problem, and the test case is actually the most helpful way to describe it.
> Yes, I mean I opened the issue, then opened the PR, and wanted to copy the stack trace to show the issue. Your first comment was very fast, so that was not done yet. What else should I write in the description? Should I open the PR first and mention it directly in the description?
The description itself should ideally contain demo code (a test case) plus the error (including the stack trace).
When you debug-step through it, starting at y = y.view(-1, fsz), initially it shows 'Conv1d:conv:s0'[B] for y, but at some point it becomes 'Conv1d:conv:s0'[?]. Where is that?
> When you debug-step through it, starting at y = y.view(-1, fsz), initially it shows 'Conv1d:conv:s0'[B] for y, but at some point it becomes 'Conv1d:conv:s0'[?]. Where is that?
It's in BatchInfo._make_padded_dim(), in the line dim_tag_base = dim_tag.get_same_base(). I have to check why dim_tag.same_as is 'Conv1d:conv:s0'[?].
> > When you debug-step through it, starting at y = y.view(-1, fsz), initially it shows 'Conv1d:conv:s0'[B] for y, but at some point it becomes 'Conv1d:conv:s0'[?]. Where is that?
> It's in BatchInfo._make_padded_dim(), in the line dim_tag_base = dim_tag.get_same_base().
Can you say more about the stack trace to get there?
> I have to check why dim_tag.same_as is 'Conv1d:conv:s0'[?].
Well, this should be ok. This is not a problem. But the Data instance should call Dim.get_for_batch_ctx via Data._adapt_batch_consistent_dim_tags, and that should resolve it again.
Edit: Sorry, not the Data instance in this case, but BatchInfo._make_padded_dim or whatever else is using it.
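To make that idea concrete, here is a toy model of it. This is only a sketch and not the RETURNN implementation: ToyDim is a hypothetical class, and although its method names mirror get_same_base, get_for_batch_ctx and get_dim_value from the discussion above, their behaviour here is heavily simplified and assumed.

```python
class ToyDim:
    """Toy stand-in for a dim tag; NOT the RETURNN Dim class."""

    def __init__(self, name, dyn_size=None, same_as=None):
        self.name = name
        self.dyn_size = dyn_size  # per-sequence lengths, or None if unknown
        self.same_as = same_as  # base tag this one was derived from
        self.per_batch = {}  # batch context -> tag that knows its sizes

    def get_same_base(self):
        base = self
        while base.same_as is not None:
            base = base.same_as
        return base

    def get_for_batch_ctx(self, batch):
        # Look up the variant registered for this batch on the base tag.
        return self.get_same_base().per_batch.get(batch, self)

    def get_dim_value(self):
        if self.dyn_size is None:
            raise Exception("%s: need dyn_size for dim value" % self.name)
        return max(self.dyn_size)


base = ToyDim("conv:s0[?]")
with_batch = ToyDim("conv:s0[B]", dyn_size=[5, 5, 5], same_as=base)
base.per_batch["B"] = with_batch

tag = with_batch.get_same_base()  # the base tag has no sizes of its own
try:
    tag.get_dim_value()
    failed = False
except Exception:
    failed = True

resolved = tag.get_for_batch_ctx("B")  # re-resolve before asking for the value
value = resolved.get_dim_value()
print(failed, value)
```

In this toy version, asking the base tag for its value fails in the same way as in the trace, while resolving it for the batch context first gives a tag that knows its sizes.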
It sounds like this is a RETURNN bug? Can you reproduce a pure RETURNN net dict which has this problem?
> It sounds like this is a RETURNN bug? Can you reproduce a pure RETURNN net dict which has this problem?
Yes, I did. See the corresponding issue (https://github.com/rwth-i6/returnn/issues/917) and PR with test case.
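For readers who do not want to follow the links, a net dict of roughly this shape would exercise the same path. It is only a sketch under assumptions and not the net dict from the linked issue: the merge_dims options are copied from the layer_dict in the trace above, while the layer names and the conv options (filter size 3, stride 2, 7 output channels) are assumed, and undefined_sources is a hypothetical helper.

```python
# Hypothetical RETURNN-style net dict; layer names and conv options are assumptions.
net_dict = {
    "conv": {
        "class": "conv", "from": "data",
        "filter_size": (3,), "strides": 2, "padding": "valid", "n_out": 7,
    },
    # Same options as the layer_dict shown in the stack trace.
    "merge": {
        "class": "merge_dims", "from": "conv",
        "axes": ["B", "T"], "keep_order": True,
    },
    "output": {"class": "copy", "from": "merge"},
}


def undefined_sources(net):
    """Return (layer, source) pairs whose source is neither 'data' nor a layer."""
    return [
        (name, opts["from"])
        for name, opts in net.items()
        if opts["from"] != "data" and opts["from"] not in net
    ]


print(undefined_sources(net_dict))
```

The conv layer shortens the time dim, and the merge_dims layer then tries to merge that modified time dim into the batch dim.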
Fixed via https://github.com/rwth-i6/returnn/pull/916.
In a case where the time dim is modified (e.g. due to downsampling/padding in a convolution) and we want to merge it with the batch dim, we currently face the problem that the creation of a PaddedDim in the MergeDimsLayer does not work, because it is not possible to get the dim value for the modified time dim.
A demo test case which reproduces the issue looks like this and can also be found in #97:
The stack trace looks like this (via):