Open vieting opened 1 year ago
You can explicitly specify in the ConvLayer which spatial axes it should use for the convolution, and which axis it should use for the channels. That way, you can do anything you need.
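As a rough illustration, such an explicit specification could look like the net-dict fragment below. This is a hedged sketch, not taken from the issue: the option names `in_spatial_dims` and `in_dim` are RETURNN ConvLayer options, while `time_dim` and `feat_dim` are placeholder dim tags assumed to exist in the surrounding config.

```python
# Sketch (assumed config, not from this issue): pin the conv axes explicitly.
# time_dim / feat_dim are placeholder dim tags defined elsewhere in the config.
network = {
    "conv": {
        "class": "conv",
        "from": "data",
        "filter_size": (3,),
        "padding": "same",
        "n_out": 32,
        "in_spatial_dims": [time_dim],  # convolve over the time axis
        "in_dim": feat_dim,             # treat this dim as the channel axis
    },
}
```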
Right, setting `in_dim` actually solves this. However, it is now problematic for `_get_output_shape_from_returnn`, because there the new feature dim is mapped to the old feature dim, and as a result the remaining dims are also mapped incorrectly. Do you have a suggestion to solve this without too much overhead?
I have a case where the convolution is done over a dim that RETURNN considers the feature dim. In PyTorch, however, a new dim is created before the convolution, and that new dim is supposed to be the feature dim. Of course, RETURNN cannot directly know this. But in the case where a convolution is done over the feature dim and another static dim exists, I think we can argue that we should treat that other static dim as the feature dim and do the convolution. What do you think, @albertz?
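The proposed heuristic could be sketched roughly as follows. This is a hypothetical, simplified model (dims as `(name, static_size_or_None)` tuples, helper name invented for illustration), not the actual RETURNN dim-tag API: if the conv runs over the current feature dim and another static dim exists, that other dim takes over the feature role.

```python
# Hypothetical sketch of the heuristic discussed above; dims are modeled as
# (name, size) tuples, where size=None marks a dynamic (non-static) dim.

def pick_new_feature_dim(dims, conv_dim, old_feature_dim):
    """Return the dim that should be treated as the feature dim.

    dims: list of (name, static_size_or_None) tuples.
    conv_dim: the dim the convolution runs over.
    old_feature_dim: the dim RETURNN currently considers the feature dim.
    """
    if conv_dim != old_feature_dim:
        # Conv does not touch the feature dim, nothing to reassign.
        return old_feature_dim
    # Conv runs over the feature dim: look for another static dim to take over.
    for d in dims:
        if d != conv_dim and d[1] is not None and d[0] != "batch":
            return d
    return None  # no suitable static dim found


# Example: batch, a new static dim of size 1 (created in PyTorch before the
# conv), and a feature dim F=80 that the convolution runs over.
dims = [("batch", None), ("new", 1), ("feature", 80)]
print(pick_new_feature_dim(dims, ("feature", 80), ("feature", 80)))  # → ("new", 1)
```

In this model, the new static dim of size 1 becomes the feature dim, matching the PyTorch side's intent.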
A test to show the error:
Error message: