Closed: LabChameleon closed this issue 2 years ago
Can you post a bit more about the error? At least some parts of the stack trace, so we can see where this error occurs.
Is this a RETURNN issue or a pytorch-to-returnn issue? Can you create a RETURNN config with the same problem? If so, then you should also open an issue on the RETURNN side.
(Note that I will not have time to look into this in the near future. So it would be good if you could debug and fix that yourself.)
Ok, I understand. If you have any hint about what might be going wrong and where to look, that would of course be super helpful.
Here is the output of the converter:
Executing: test_unsqueeze_residual
Running with standard reference imports...
Running with wrapped imports, wrapping original PyTorch...
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.from_numpy(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch._jit_internal._copy_to_script_wrapper(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch._jit_internal._overload_method(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.init.kaiming_uniform_(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.init._calculate_fan_in_and_fan_out(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.init.uniform_(...)
*** torch module call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.modules.conv.Conv1d(...)(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.functional.conv1d(...)
*** torch module call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.modules.padding.ConstantPad1d(...)(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.functional.pad(...)
Module naming hierarchy:
.tmp_root: (hidden, empty)
Conv1d: <ModuleEntry Conv1d(1, 768, kernel_size=(10,), stride=(5,))> -> ...
ConstantPad1d: <ModuleEntry ConstantPad1d(padding=(15, 15), value=0)> -> ...
Conv1d_1: <ModuleEntry Conv1d(768, 768, kernel_size=(31,), stride=(1,), groups=768)> -> ...
Root module calls:
{
'Conv1d': <CallEntry 'Conv1d' <ModuleEntry Conv1d(1, 768, kernel_size=(10,), stride=(5,))> (depth 0)>,
'ConstantPad1d': <CallEntry 'ConstantPad1d' <ModuleEntry ConstantPad1d(padding=(15, 15), value=0)> (depth 0)>,
'Conv1d_1': <CallEntry 'Conv1d_1' <ModuleEntry Conv1d(768, 768, kernel_size=(31,), stride=(1,), groups=768)> (depth 0)>
}
Modules with params:
{
'Conv1d': Conv1d(1, 768, kernel_size=(10,), stride=(5,)),
'Conv1d_1': Conv1d(768, 768, kernel_size=(31,), stride=(1,), groups=768)
}
Looks good!
Running with wrapped Torch import, wrapping replacement for PyTorch...
RETURNN input: Data{'data', [B,T|'time:data'[B]]}
*** root/'Unflatten_Length' layer dict: {'class': 'length', 'axis': 'T', 'from': 'data'}
layer root/'data': [B,T|'time:data'[B]] float32
layer root/'Unflatten_Length': [B] int32
*** root/'Unflatten_Length' LengthLayer output: <TensorEntry name:? tensor:(B(3),) returnn_data:'time:data:dyn_size' [B] axes id>
*** root/'Unflatten_Reduce' layer dict: {'class': 'reduce', 'mode': 'max', 'axes': ['B'], 'from': 'Unflatten_Length'}
layer root/'Unflatten_Reduce': [] int32
*** root/'Unflatten_Reduce' ReduceLayer output: <TensorEntry name:? tensor:() returnn_data:'Unflatten_Reduce_output' [] axes id>
*** root/'Unflatten' layer dict: {'class': 'split_dims', 'from': 'data', 'axis': 'T', 'dims': [1, -1]}
layer root/'Unflatten': [B,F|'Unflatten_split_dims0'(1),T|'time:data'[B]] float32
*** root/'Unflatten' SplitDimsLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(1)(1),'time:data'[B](1000)) returnn_data:'Unflatten_output' [B,F|'Unflatten_split_dims0'(1),T|'time:data'[B]] axes id>
*** root/'Conv1d' layer dict: {'class': 'conv', 'from': 'Unflatten', 'activation': None, 'with_bias': True, 'n_out': 768, 'filter_size': (10,), 'padding': 'valid', 'in_spatial_dims': ['T'], 'strides': (5,)}
layer root/'Conv1d': [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(768)] float32
*** root/'Conv1d' ConvLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Conv1d' ConvLayer importing params ['weight', 'bias'] ...
*** root/'Conv1d' ConvLayer check RETURNN inputs/outputs given Torch inputs/outputs ...
**** validate: add network input tensor <TensorEntry name:? tensor:(B(3),'time:data'[B](1000)) returnn_data:'data' [B,T|'time:data'[B]] axes id>
**** validate: add call <CallEntry 'Conv1d' <ModuleEntry <Conv1d>> (depth 0)> input tensor <TensorEntry name:? tensor:(B(3),'static_dim'(1)(1),'time:data'[B](1000)) returnn_data:'Unflatten_output' [B,F|'Unflatten_split_dims0'(1),T|'time:data'[B]] axes id>
**** validate: add call <CallEntry 'Conv1d' <ModuleEntry <Conv1d>> (depth 0)> output tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Transpose' layer dict: {'class': 'copy', 'from': 'Conv1d'}
layer root/'Transpose': [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] float32
*** root/'Transpose' CopyLayer output: <TensorEntry name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes id>
*** root/'Transpose_1' layer dict: {'class': 'copy', 'from': 'Transpose'}
layer root/'Transpose_1': [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] float32
*** root/'Transpose_1' CopyLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Transpose_1_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'ConstantPad1d' layer dict: {'class': 'pad', 'mode': 'constant', 'axes': ['T'], 'padding': [(15, 15)], 'from': 'Transpose_1', 'value': 0}
layer root/'ConstantPad1d': [B,T|'15+Conv1d:conv:s0+15'[?],F|F'Conv1d:channel'(768)] float32
*** root/'ConstantPad1d' PadLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+31'[?](229)) returnn_data:'ConstantPad1d_output' [B,T|'15+Conv1d:conv:s0+15'[B],F|F'Conv1d:channel'(768)] axes {0:0,1:2,2:1}>
*** root/'ConstantPad1d' PadLayer check RETURNN inputs/outputs given Torch inputs/outputs ...
**** validate: add call <CallEntry 'ConstantPad1d' <ModuleEntry <ConstantPad1d>> (depth 0)> input tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Transpose_1_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
**** validate: add call <CallEntry 'ConstantPad1d' <ModuleEntry <ConstantPad1d>> (depth 0)> output tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+31'[?](229)) returnn_data:'ConstantPad1d_output' [B,T|'15+Conv1d:conv:s0+15'[B],F|F'Conv1d:channel'(768)] axes {0:0,1:2,2:1}>
*** root/'Conv1d_1' layer dict: {'class': 'conv', 'from': 'ConstantPad1d', 'activation': None, 'with_bias': True, 'n_out': 768, 'filter_size': (31,), 'padding': 'valid', 'in_spatial_dims': ['T'], 'groups': 768}
layer root/'Conv1d_1': [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] float32
*** root/'Conv1d_1' ConvLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_1_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Conv1d_1' ConvLayer importing params ['weight', 'bias'] ...
*** root/'Conv1d_1' ConvLayer check RETURNN inputs/outputs given Torch inputs/outputs ...
**** validate: add call <CallEntry 'Conv1d_1' <ModuleEntry <Conv1d>> (depth 0)> input tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+31'[?](229)) returnn_data:'ConstantPad1d_output' [B,T|'15+Conv1d:conv:s0+15'[B],F|F'Conv1d:channel'(768)] axes {0:0,1:2,2:1}>
**** validate: add call <CallEntry 'Conv1d_1' <ModuleEntry <Conv1d>> (depth 0)> output tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_1_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Transpose_2' layer dict: {'class': 'copy', 'from': 'Conv1d_1'}
layer root/'Transpose_2': [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] float32
*** root/'Transpose_2' CopyLayer output: <TensorEntry name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_2_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes id>
*** root/'add' layer dict: {'class': 'combine', 'kind': 'add', 'from': ['Transpose_2', 'Transpose']}
Exception creating layer root/'add' of class CombineLayer with opts:
{'_name': 'add',
'_network': <TFNetwork 'root' train=False>,
'kind': 'add',
'name': 'add',
'network': <TFNetwork 'root' train=False>,
'sources': [<CopyLayer 'Transpose_2' out_type=Data{[B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)]}>,
<CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d:channel'(768)]}>]}
I think the important parts of the error's stack trace are the following:
File "test_layers.py", line 1202, in test_unsqueeze_residual.<locals>.model_func
line: x = x + residual
locals:
x = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_2_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes id>
residual = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d:channel'(768)] axes id>
File "/u/dierkes/forks/pytorch-to-returnn/pytorch_to_returnn/torch/tensor.py", line 302, in Tensor.__add__
line: return add(self, other)
locals:
add = <local> <function add at 0x7fc0e028d280>
self = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_2_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes id>
other = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d:channel'(768)] axes id>
and then:
File "/u/dierkes/forks/pytorch-to-returnn/tests/returnn/returnn/tf/util/data.py", line 1018, in Dim.get_all_dimension_tags
line: replace_existing = existing_tag.undefined and not tag.undefined and tag.dimension == existing_tag.dimension
locals:
replace_existing = <local> False
existing_tag = <local> Dim{'Conv1d_1:conv:s0'[B]}
existing_tag.undefined = <local> !AssertionError:
tag = <local> Dim{'Conv1d_1:conv:s0'[B]}
tag.undefined = <local> !AssertionError:
tag.dimension = <local> None
existing_tag.dimension = <local> None
File "/u/dierkes/forks/pytorch-to-returnn/tests/returnn/returnn/tf/util/data.py", line 870, in Dim.undefined
line: assert id(base) not in visited # should not have cycles. normally this should never be triggered
locals:
id = <builtin> <built-in function id>
base = <local> Dim{'Conv1d_1:conv:s0'[B]}
visited = <local> {140464343272080: Dim{'Conv1d_1:conv:s0'[B]}, 140464343271360: Dim{'Conv1d_1:conv:s0'[B]}, 140464343200048: Dim{'15+Conv1d:conv:s0+15'[B]}, 140464343201632: Dim{'15+Conv1d:conv:s0+15'[?]}, 140464344299840: Dim{'Conv1d_1:conv:s0'[B]}}
AssertionError
The full stack trace can be found in my PR #123. So the problem seems to occur when "closing" the residual connection (line 1202 in test_layers.py). However, the problem is introduced by the unsqueeze: if I feed the data directly with the correct dimensions and do not use unsqueeze, it works fine.
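For context, this is roughly the model structure of the test, reconstructed from the converter log above (a plain-PyTorch sketch; the actual test_unsqueeze_residual in test_layers.py is wrapped in the converter's test harness and may differ in details):

```python
import torch

# Reconstructed from the converter log above; details may differ from the real test.
conv1 = torch.nn.Conv1d(1, 768, kernel_size=10, stride=5)
pad = torch.nn.ConstantPad1d((15, 15), 0)
conv2 = torch.nn.Conv1d(768, 768, kernel_size=31, groups=768)

x = torch.randn(3, 1000)      # (B, T), matching the RETURNN input Data{'data', [B,T]}
x = x.unsqueeze(1)            # (B, 1, T) -- the unsqueeze that introduces the problem
x = conv1(x)                  # (B, 768, T') with T' = (T - 10) // 5 + 1
residual = x.transpose(1, 2)  # (B, T', 768), kept for the residual connection
x = residual.transpose(1, 2)  # back to (B, 768, T')
x = pad(x)                    # (B, 768, T' + 30)
x = conv2(x)                  # depthwise conv, length back to T'
x = x.transpose(1, 2)         # (B, T', 768)
out = x + residual            # this add is where the conversion fails
```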
I have not yet checked whether I can reproduce the problem using only RETURNN, without the converter. I will try to do that.
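If it helps, the layer dicts from the log above should correspond to roughly this standalone RETURNN network dict (untested sketch; the "output" layer is added here for completeness, and the Unflatten_Length/Unflatten_Reduce helper layers are left out since no other layer reads from them):

```python
# Untested sketch assembled from the layer dicts printed by the converter above.
network = {
    "Unflatten": {"class": "split_dims", "from": "data", "axis": "T", "dims": [1, -1]},
    "Conv1d": {
        "class": "conv", "from": "Unflatten", "activation": None, "with_bias": True,
        "n_out": 768, "filter_size": (10,), "padding": "valid",
        "in_spatial_dims": ["T"], "strides": (5,)},
    "Transpose": {"class": "copy", "from": "Conv1d"},
    "Transpose_1": {"class": "copy", "from": "Transpose"},
    "ConstantPad1d": {
        "class": "pad", "mode": "constant", "axes": ["T"],
        "padding": [(15, 15)], "from": "Transpose_1", "value": 0},
    "Conv1d_1": {
        "class": "conv", "from": "ConstantPad1d", "activation": None, "with_bias": True,
        "n_out": 768, "filter_size": (31,), "padding": "valid",
        "in_spatial_dims": ["T"], "groups": 768},
    "Transpose_2": {"class": "copy", "from": "Conv1d_1"},
    "add": {"class": "combine", "kind": "add", "from": ["Transpose_2", "Transpose"]},
    "output": {"class": "copy", "from": "add"},  # added here, not part of the log
}
```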
When trying to convert the fairseq wav2vec2 conformer implementation, I run into the following assertion error:
line: assert id(base) not in visited  # should not have cycles. normally this should never be triggered
locals:
id = <builtin> <built-in function id>
base = <local> Dim{'Conv1d_1:conv:s0'[B]}
visited = <local> {139958005967408: Dim{'Conv1d_1:conv:s0'[B]}, 139958005966784: Dim{'Conv1d_1:conv:s0'[B]}, 139958005899520: Dim{'15+Conv1d:conv:s0+15'[B]}, 139958005901104: Dim{'15+Conv1d:conv:s0+15'[?]}, 139958016194832: Dim{'Conv1d_1:conv:s0'[B]}}
AssertionError
A small example reproducing the problem can be found in #123. The problem seems to be caused by the unsqueeze operation at the beginning. However, I am not sure what exactly is going wrong here.