rwth-i6 / pytorch-to-returnn

Make PyTorch code runnable within RETURNN

Unsqueeze followed by residual connection problem #124

Closed LabChameleon closed 2 years ago

LabChameleon commented 2 years ago

When trying to convert the fairseq wav2vec2 conformer implementation, I run into an assertion error:

line: assert id(base) not in visited  # should not have cycles. normally this should never be triggered
locals:
  id = <builtin> <built-in function id>
  base = <local> Dim{'Conv1d_1:conv:s0'[B]}
  visited = <local> {139958005967408: Dim{'Conv1d_1:conv:s0'[B]}, 139958005966784: Dim{'Conv1d_1:conv:s0'[B]}, 139958005899520: Dim{'15+Conv1d:conv:s0+15'[B]}, 139958005901104: Dim{'15+Conv1d:conv:s0+15'[?]}, 139958016194832: Dim{'Conv1d_1:conv:s0'[B]}}
AssertionError

A small example reproducing the problem can be found in #123. The problem seems to be caused by the unsqueeze operation at the beginning. However, I am not sure what exactly is going wrong here.
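Roughly, the example does the following (this is only my sketch of the relevant ops, reconstructed from the converter trace posted below; module and variable names are mine, not the exact test code from #123):

import torch
from torch import nn

class UnsqueezeResidualModel(nn.Module):
  """Sketch of the failing pattern: unsqueeze -> conv -> depthwise conv -> residual add."""
  def __init__(self):
    super().__init__()
    self.conv_in = nn.Conv1d(1, 768, kernel_size=10, stride=5)
    self.pad = nn.ConstantPad1d((15, 15), 0)
    self.conv_depthwise = nn.Conv1d(768, 768, kernel_size=31, groups=768)

  def forward(self, x):            # x: [B, T]
    x = x.unsqueeze(1)             # [B, 1, T], the unsqueeze in question
    x = self.conv_in(x)            # [B, 768, T']
    x = x.transpose(1, 2)          # [B, T', 768]
    residual = x
    x = x.transpose(1, 2)          # [B, 768, T']
    x = self.pad(x)                # [B, 768, T' + 30]
    x = self.conv_depthwise(x)     # [B, 768, T']
    x = x.transpose(1, 2)          # [B, T', 768]
    return x + residual            # closing the residual connection is where the converter fails

model = UnsqueezeResidualModel()
out = model(torch.randn(3, 1000))  # fine in plain PyTorch, out has shape [3, 199, 768]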

albertz commented 2 years ago

(Note that I will not have time to look into this in the near future. So it would be good if you could debug and fix that yourself.)

Can you post a bit more about the error? Some parts of the stack trace, so we can at least see where this error occurs.

Is this a RETURNN issue or a pytorch-to-returnn issue? Can you create a RETURNN config with the same problem? If so, then you should also open an issue on the RETURNN side.

LabChameleon commented 2 years ago

(Note that I will not have time to look into this in the near future. So it would be good if you could debug and fix that yourself.)

Ok, I understand. If you have any hint as to what might go wrong and where to look, that would of course be super helpful.

Here is the output of the converter:

Executing: test_unsqueeze_residual
Running with standard reference imports...

Running with wrapped imports, wrapping original PyTorch...
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.from_numpy(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch._jit_internal._copy_to_script_wrapper(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch._jit_internal._overload_method(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.init.kaiming_uniform_(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.init._calculate_fan_in_and_fan_out(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.init.uniform_(...)
*** torch module call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.modules.conv.Conv1d(...)(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.functional.conv1d(...)
*** torch module call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.modules.padding.ConstantPad1d(...)(...)
*** func call pytorch_to_returnn.import_wrapper._torch_traced.torch.nn.functional.pad(...)
Module naming hierarchy:
.tmp_root: (hidden, empty)
Conv1d: <ModuleEntry Conv1d(1, 768, kernel_size=(10,), stride=(5,))> -> ...
ConstantPad1d: <ModuleEntry ConstantPad1d(padding=(15, 15), value=0)> -> ...
Conv1d_1: <ModuleEntry Conv1d(768, 768, kernel_size=(31,), stride=(1,), groups=768)> -> ...
Root module calls:
{
  'Conv1d': <CallEntry 'Conv1d' <ModuleEntry Conv1d(1, 768, kernel_size=(10,), stride=(5,))> (depth 0)>,
  'ConstantPad1d': <CallEntry 'ConstantPad1d' <ModuleEntry ConstantPad1d(padding=(15, 15), value=0)> (depth 0)>,
  'Conv1d_1': <CallEntry 'Conv1d_1' <ModuleEntry Conv1d(768, 768, kernel_size=(31,), stride=(1,), groups=768)> (depth 0)>
}
Modules with params:
{
  'Conv1d': Conv1d(1, 768, kernel_size=(10,), stride=(5,)),
  'Conv1d_1': Conv1d(768, 768, kernel_size=(31,), stride=(1,), groups=768)
}
Looks good!

Running with wrapped Torch import, wrapping replacement for PyTorch...
RETURNN input: Data{'data', [B,T|'time:data'[B]]}
*** root/'Unflatten_Length' layer dict: {'class': 'length', 'axis': 'T', 'from': 'data'}
layer root/'data': [B,T|'time:data'[B]] float32
layer root/'Unflatten_Length': [B] int32
*** root/'Unflatten_Length' LengthLayer output: <TensorEntry name:? tensor:(B(3),) returnn_data:'time:data:dyn_size' [B] axes id>
*** root/'Unflatten_Reduce' layer dict: {'class': 'reduce', 'mode': 'max', 'axes': ['B'], 'from': 'Unflatten_Length'}
layer root/'Unflatten_Reduce': [] int32
*** root/'Unflatten_Reduce' ReduceLayer output: <TensorEntry name:? tensor:() returnn_data:'Unflatten_Reduce_output' [] axes id>
*** root/'Unflatten' layer dict: {'class': 'split_dims', 'from': 'data', 'axis': 'T', 'dims': [1, -1]}
layer root/'Unflatten': [B,F|'Unflatten_split_dims0'(1),T|'time:data'[B]] float32
*** root/'Unflatten' SplitDimsLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(1)(1),'time:data'[B](1000)) returnn_data:'Unflatten_output' [B,F|'Unflatten_split_dims0'(1),T|'time:data'[B]] axes id>
*** root/'Conv1d' layer dict: {'class': 'conv', 'from': 'Unflatten', 'activation': None, 'with_bias': True, 'n_out': 768, 'filter_size': (10,), 'padding': 'valid', 'in_spatial_dims': ['T'], 'strides': (5,)}
layer root/'Conv1d': [B,T|'Conv1d:conv:s0'[?],F|F'Conv1d:channel'(768)] float32
*** root/'Conv1d' ConvLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Conv1d' ConvLayer importing params ['weight', 'bias'] ...
*** root/'Conv1d' ConvLayer check RETURNN inputs/outputs given Torch inputs/outputs ...
**** validate: add network input tensor <TensorEntry name:? tensor:(B(3),'time:data'[B](1000)) returnn_data:'data' [B,T|'time:data'[B]] axes id>
**** validate: add call <CallEntry 'Conv1d' <ModuleEntry <Conv1d>> (depth 0)> input tensor <TensorEntry name:? tensor:(B(3),'static_dim'(1)(1),'time:data'[B](1000)) returnn_data:'Unflatten_output' [B,F|'Unflatten_split_dims0'(1),T|'time:data'[B]] axes id>
**** validate: add call <CallEntry 'Conv1d' <ModuleEntry <Conv1d>> (depth 0)> output tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Transpose' layer dict: {'class': 'copy', 'from': 'Conv1d'}
layer root/'Transpose': [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] float32
*** root/'Transpose' CopyLayer output: <TensorEntry name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes id>
*** root/'Transpose_1' layer dict: {'class': 'copy', 'from': 'Transpose'}
layer root/'Transpose_1': [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] float32
*** root/'Transpose_1' CopyLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Transpose_1_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'ConstantPad1d' layer dict: {'class': 'pad', 'mode': 'constant', 'axes': ['T'], 'padding': [(15, 15)], 'from': 'Transpose_1', 'value': 0}
layer root/'ConstantPad1d': [B,T|'15+Conv1d:conv:s0+15'[?],F|F'Conv1d:channel'(768)] float32
*** root/'ConstantPad1d' PadLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+31'[?](229)) returnn_data:'ConstantPad1d_output' [B,T|'15+Conv1d:conv:s0+15'[B],F|F'Conv1d:channel'(768)] axes {0:0,1:2,2:1}>
*** root/'ConstantPad1d' PadLayer check RETURNN inputs/outputs given Torch inputs/outputs ...
**** validate: add call <CallEntry 'ConstantPad1d' <ModuleEntry <ConstantPad1d>> (depth 0)> input tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Transpose_1_output' [B,T|'Conv1d:conv:s0'[B],F|F'Conv1d:channel'(768)] axes {0:0,2:1,1:2}>
**** validate: add call <CallEntry 'ConstantPad1d' <ModuleEntry <ConstantPad1d>> (depth 0)> output tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+31'[?](229)) returnn_data:'ConstantPad1d_output' [B,T|'15+Conv1d:conv:s0+15'[B],F|F'Conv1d:channel'(768)] axes {0:0,1:2,2:1}>
*** root/'Conv1d_1' layer dict: {'class': 'conv', 'from': 'ConstantPad1d', 'activation': None, 'with_bias': True, 'n_out': 768, 'filter_size': (31,), 'padding': 'valid', 'in_spatial_dims': ['T'], 'groups': 768}
layer root/'Conv1d_1': [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] float32
*** root/'Conv1d_1' ConvLayer output: <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_1_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Conv1d_1' ConvLayer importing params ['weight', 'bias'] ...
*** root/'Conv1d_1' ConvLayer check RETURNN inputs/outputs given Torch inputs/outputs ...
**** validate: add call <CallEntry 'Conv1d_1' <ModuleEntry <Conv1d>> (depth 0)> input tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+31'[?](229)) returnn_data:'ConstantPad1d_output' [B,T|'15+Conv1d:conv:s0+15'[B],F|F'Conv1d:channel'(768)] axes {0:0,1:2,2:1}>
**** validate: add call <CallEntry 'Conv1d_1' <ModuleEntry <Conv1d>> (depth 0)> output tensor <TensorEntry name:? tensor:(B(3),'static_dim'(768)(768),'((time:data+-10)//5)+1'[?](199)) returnn_data:'Conv1d_1_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes {0:0,2:1,1:2}>
*** root/'Transpose_2' layer dict: {'class': 'copy', 'from': 'Conv1d_1'}
layer root/'Transpose_2': [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] float32
*** root/'Transpose_2' CopyLayer output: <TensorEntry name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_2_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes id>
*** root/'add' layer dict: {'class': 'combine', 'kind': 'add', 'from': ['Transpose_2', 'Transpose']}
Exception creating layer root/'add' of class CombineLayer with opts:
{'_name': 'add',
 '_network': <TFNetwork 'root' train=False>,
 'kind': 'add',
 'name': 'add',
 'network': <TFNetwork 'root' train=False>,
 'sources': [<CopyLayer 'Transpose_2' out_type=Data{[B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)]}>,
             <CopyLayer 'Transpose' out_type=Data{[B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d:channel'(768)]}>]}

I think the important parts of the stack trace of the error are the following:

File "test_layers.py", line 1202, in test_unsqueeze_residual.<locals>.model_func
    line: x = x + residual
    locals:
      x = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_2_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes id>
      residual = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d:channel'(768)] axes id>
  File "/u/dierkes/forks/pytorch-to-returnn/pytorch_to_returnn/torch/tensor.py", line 302, in Tensor.__add__
    line: return add(self, other)
    locals:
      add = <local> <function add at 0x7fc0e028d280>
      self = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_2_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d_1:channel'(768)] axes id>
      other = <local> <Tensor name:? tensor:(B(3),'((time:data+-10)//5)+1'[?](199),'static_dim'(768)(768)) returnn_data:'Transpose_output' [B,T|'Conv1d_1:conv:s0'[B],F|F'Conv1d:channel'(768)] axes id>

and then:

  File "/u/dierkes/forks/pytorch-to-returnn/tests/returnn/returnn/tf/util/data.py", line 1018, in Dim.get_all_dimension_tags
    line: replace_existing = existing_tag.undefined and not tag.undefined and tag.dimension == existing_tag.dimension
    locals:
      replace_existing = <local> False
      existing_tag = <local> Dim{'Conv1d_1:conv:s0'[B]}
      existing_tag.undefined = <local> !AssertionError: 
      tag = <local> Dim{'Conv1d_1:conv:s0'[B]}
      tag.undefined = <local> !AssertionError: 
      tag.dimension = <local> None
      existing_tag.dimension = <local> None
  File "/u/dierkes/forks/pytorch-to-returnn/tests/returnn/returnn/tf/util/data.py", line 870, in Dim.undefined
    line: assert id(base) not in visited  # should not have cycles. normally this should never be triggered
    locals:
      id = <builtin> <built-in function id>
      base = <local> Dim{'Conv1d_1:conv:s0'[B]}
      visited = <local> {140464343272080: Dim{'Conv1d_1:conv:s0'[B]}, 140464343271360: Dim{'Conv1d_1:conv:s0'[B]}, 140464343200048: Dim{'15+Conv1d:conv:s0+15'[B]}, 140464343201632: Dim{'15+Conv1d:conv:s0+15'[?]}, 140464344299840: Dim{'Conv1d_1:conv:s0'[B]}}
AssertionError

The full stack trace can be found in my PR #123. So the problem seems to occur when "closing" the residual connection (line 1202 in test_layers.py). However, the problem is introduced by the unsqueeze: if I feed the data directly in the correct dimensions and do not use unsqueeze, it works fine.

I have not checked yet whether I can reproduce the problem using only RETURNN without the converter. I will try to do that.
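For reference, I assume the network part of such a standalone RETURNN config could be assembled more or less directly from the layer dicts printed above (untested sketch; extern_data and the final "output" layer are my guesses, and I left out the auxiliary Unflatten_Length/Unflatten_Reduce layers since they are not on the failing path):

# Hypothetical standalone RETURNN config, pieced together from the layer dicts above.
extern_data = {"data": {"shape": (None,), "dtype": "float32"}}  # [B, T], as in the trace

network = {
  "Unflatten": {"class": "split_dims", "from": "data", "axis": "T", "dims": [1, -1]},
  "Conv1d": {"class": "conv", "from": "Unflatten", "activation": None, "with_bias": True,
             "n_out": 768, "filter_size": (10,), "padding": "valid",
             "in_spatial_dims": ["T"], "strides": (5,)},
  "Transpose": {"class": "copy", "from": "Conv1d"},
  "Transpose_1": {"class": "copy", "from": "Transpose"},
  "ConstantPad1d": {"class": "pad", "mode": "constant", "axes": ["T"],
                    "padding": [(15, 15)], "from": "Transpose_1", "value": 0},
  "Conv1d_1": {"class": "conv", "from": "ConstantPad1d", "activation": None, "with_bias": True,
               "n_out": 768, "filter_size": (31,), "padding": "valid",
               "in_spatial_dims": ["T"], "groups": 768},
  "Transpose_2": {"class": "copy", "from": "Conv1d_1"},
  "add": {"class": "combine", "kind": "add", "from": ["Transpose_2", "Transpose"]},
  "output": {"class": "copy", "from": "add"},  # my addition, just to terminate the network
}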