`rf.RelPosCausalSelfAttention` fails with `single_step_dim`

LucaG1 commented 1 month ago

Hi, I'm having a problem with rf.RelPosCausalSelfAttention when using it in a transformer decoder. It fails because it wants to remove single_step_dim from a tensor that does not have it in the function _rel_pos_enc_shift here: https://github.com/rwth-i6/returnn/blob/23d666ccf3ac9e748fce4e0d27afe353133eca48/returnn/frontend/attention.py#L412

https://github.com/rwth-i6/returnn/blob/23d666ccf3ac9e748fce4e0d27afe353133eca48/returnn/frontend/attention.py#L533

The input: matrix_bd looks like this: Tensor{'dot', ['initial-beam'(1),B?,'num_heads'(8),'self_att_expand_dim_init+1'(1)]}

The error i get looks like this.

    line: matrix_bd = _rel_pos_enc_shift(matrix_bd, axis, pos_emb_spatial_dim, hist_dim)
    locals:
      matrix_bd = <local> Tensor{'dot', ['initial-beam'(1),B?,'num_heads'(8),'self_att_expand_dim_init+1'(1)]}
      _rel_pos_enc_shift = <global> <function _rel_pos_enc_shift at 0x7f78c8937ac0>
      axis = <local> Dim{'single-step'!}
      pos_emb_spatial_dim = <local> Dim{'self_att_expand_dim_init+1'(1)}
      hist_dim = <local> Dim{'self_att_expand_dim_init+1'(1)}
  File "returnn/returnn/frontend/attention.py", line 412, in _rel_pos_enc_shift
    line: batch_dims = x.remaining_dims((axis, pos_emb_spatial_dim))
    locals:
      batch_dims = <not found>
      x = <local> Tensor{'dot', ['initial-beam'(1),B?,'num_heads'(8),'self_att_expand_dim_init+1'(1)]}
      x.remaining_dims = <local> <bound method _TensorMixin.remaining_dims of Tensor{'dot', ['initial-beam'(1),B?,'num_heads'(8),'self_att_expand_dim_init+1'(1)]}>
      axis = <local> Dim{'single-step'!}
      pos_emb_spatial_dim = <local> Dim{'self_att_expand_dim_init+1'(1)}
  File "returnn/returnn/tensor/_tensor_extra.py", line 1849, in _TensorMixin.remaining_dims
    line: batch_dims.remove(remove_)
    locals:
      batch_dims = <local> [Dim{'initial-beam'(1)}, Dim{B}, Dim{'num_heads'(8)}, Dim{'self_att_expand_dim_init+1'(1)}]
      batch_dims.remove = <local> <built-in method remove of list object at 0x7f7811ab6900>
      remove_ = <local> Dim{'single-step'!}
ValueError: list.remove(x): x not in list

I don't have an easy setup yet for you to reproduce this. However I think it should be easily reproducible when using rf.RelPosCausalSelfAttention with single_step_dim.

I also need to look deeper into the functionality behind this in order to understand what the correct behaviour would be.

If I have any new information on this I will post it here.