Closed vieting closed 2 years ago
I don't see from this description why this is a problem?
Yes sorry, that was not clear. The tensor is unsqueezed to `(B, 1, T)`; however, the RETURNN data of the output is `(B, T, F)`, so `F` and `T` are assigned to the wrong axes. I'll update the description.
But why is this axis assignment a problem?
I have a `ConvLayer` afterwards, and we now explicitly specify `in_spatial_dims` since #93. In the case here, I get `'in_spatial_dims': ['F']` and `AssertionError: invalid in_spatial_dims [Dim{'time:data'[B]}]`, because feature dim axes are not allowed in `in_spatial_dims`.
The `SplitDimsLayer` is called with `{'class': 'split_dims', 'from': 'data', 'axis': 'T', 'dims': [1, -1]}`. I think it's wrong/unintended that in `SplitDimsLayer.get_out_data_from_opts`, shortly before the end, `out.time_dim_axis` is set to 1 instead of 2. This corresponds to the split dim `1` instead of `-1`, which would be the original time dim axis.
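To make the axis bookkeeping concrete, here is a minimal pure-Python sketch. It is not RETURNN code: `split_dims_shape` is a hypothetical helper, and the batch size and number of samples are made-up values. It only illustrates which output axis holds the remainder (`-1`) entry of `dims`, i.e. the axis that still carries the original time dimension.

```python
def split_dims_shape(shape, axis, dims):
    """Hypothetical helper (not part of RETURNN).

    Splits `shape[axis]` into `dims`, where at most one entry may be -1
    (the remainder). Returns the new shape and the index of the axis
    that received the remainder, or None if there is no -1 entry.
    """
    assert dims.count(-1) <= 1, "at most one remainder dim is allowed"
    size = shape[axis]

    # Product of all explicitly given dims.
    known = 1
    for d in dims:
        if d != -1:
            known *= d

    # Resolve the -1 entry from the size of the axis being split.
    resolved = [size // known if d == -1 else d for d in dims]
    total = 1
    for d in resolved:
        total *= d
    assert total == size, "dims do not factor the axis size"

    new_shape = list(shape[:axis]) + resolved + list(shape[axis + 1:])
    rem_axis = axis + dims.index(-1) if -1 in dims else None
    return new_shape, rem_axis


# Assumed example sizes: (B, T) = (3, 16000), splitting axis 1 (T) into [1, -1].
new_shape, time_axis = split_dims_shape([3, 16000], axis=1, dims=[1, -1])
print(new_shape, time_axis)  # [3, 1, 16000] 2
```

Under these assumptions the original time axis ends up at index 2, and index 1 is the newly added axis of size 1, which is what I would expect `out.time_dim_axis` to reflect.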
What is `data` in this case?
But even if this behavior is maybe unexpected (and maybe should be fixed), I still don't understand why it is a problem.
> because feature dim axes are not allowed in `in_spatial_dims`.
Why do you think so? No, this should not be the case.
I changed the behavior now to make it more consistent (https://github.com/rwth-i6/returnn/pull/914). However, I still think that it should also work without this change, and whatever problem you encountered is something else.
Thanks for the fix in https://github.com/rwth-i6/returnn/pull/914! This indeed fixes my problem as well. I updated #95 to better reflect the issue I was facing, but with the current RETURNN master, all tests pass now.
I have a case with a `(B, T)` tensor which is unsqueezed to `(B, 1, T)` (adding a feature dim for raw waveform); however, the RETURNN data of the output is `(B, T, F)`, so `F` and `T` are assigned to the wrong axes. It seems that the behavior changed and the RETURNN axes are not handled correctly now (or this just did not become obvious before).