rwth-i6 / returnn

The RWTH extensible training framework for universal recurrent neural networks
http://returnn.readthedocs.io/

`DotLayer`, use single `reduce` argument #636

Closed albertz closed 2 years ago

albertz commented 3 years ago

Related to #629 and #627: it can be redundant to specify both `red1` and `red2` as arguments. These axes really need to be the same, and also share the same dynamic lengths.

Via unique dim tags (#632), we can use a single `reduce` argument.
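Just to sketch the redundancy (this is a hypothetical, shape-only illustration, not the RETURNN `DotLayer` API; the function names and the `(tag, size)` shape encoding are made up for this example):

```python
# Shape-only sketch: a "shape" is a tuple of (dim_tag, size) pairs,
# with unique dim tags standing in for #632.

def dot_red1_red2(shape_a, shape_b, red1, red2):
    """Current style: the reduce axis is named separately per input."""
    size_a = dict(shape_a)[red1]
    size_b = dict(shape_b)[red2]
    # The two arguments are redundant: the axes must match anyway.
    assert size_a == size_b, "red1 and red2 must describe the same dim"
    out = [(tag, s) for tag, s in shape_a if tag != red1]
    out += [(tag, s) for tag, s in shape_b if tag != red2]
    return tuple(out)

def dot_reduce(shape_a, shape_b, reduce_dim):
    """Proposed style: one reduce dim tag, looked up in both inputs."""
    assert dict(shape_a)[reduce_dim] == dict(shape_b)[reduce_dim]
    out = [(tag, s) for tag, s in shape_a if tag != reduce_dim]
    out += [(tag, s) for tag, s in shape_b if tag != reduce_dim]
    return tuple(out)

a = (("batch", 8), ("time", 20), ("feat", 32))
b = (("feat", 32), ("classes", 10))
# Both spellings yield the same output shape.
assert dot_red1_red2(a, b, "feat", "feat") == dot_reduce(a, b, "feat")
```

With unique dim tags, the single `reduce_dim` already identifies the axis unambiguously in both inputs, so naming it twice adds nothing.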

Originally posted by @Zettelkasten in https://github.com/rwth-i6/returnn/issues/629#issuecomment-913050945

albertz commented 3 years ago

An alternative to this proposal is to simply use `out_dims`, as proposed in #597 (here). By specifying the output dims / shape, we can infer all the remaining information.

Mesh TensorFlow's `mtf.einsum(inputs, output_shape)` follows exactly this logic.
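The einsum-style inference can be sketched like this (a hypothetical helper, not RETURNN or Mesh TensorFlow code; dims are plain string tags here): any dim that occurs in some input but not in `out_dims` must be the one that gets reduced.

```python
def infer_reduce_dims(in_shapes, out_dims):
    """Return the dim tags to reduce: present in inputs, absent from output."""
    in_dims = {tag for shape in in_shapes for tag in shape}
    return sorted(in_dims - set(out_dims))

a = ("batch", "time", "feat")
b = ("feat", "classes")
print(infer_reduce_dims([a, b], out_dims=("batch", "time", "classes")))
# -> ['feat']
```

So the output shape alone is enough; no explicit `reduce` argument is strictly needed.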

For rec optimization, this is also easy to handle (as we already do now, via #628).

We do not have to choose one or the other; we could also support both. Having `reduce` explicit might help readability.

albertz commented 2 years ago

@Zettelkasten To clarify, you also propose to make `var1`/`var2` optional, as you stated here? Although this is somewhat orthogonal to using a single `reduce` argument. Edit: separate issue on this: #738
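For context on why `var1`/`var2` could be optional, a hypothetical sketch (made-up helper, not the RETURNN API): given unique dim tags and the reduce dims, the var dims of each input follow automatically; they are whatever is neither shared with the other input nor reduced.

```python
def infer_var_dims(shape, other_shape, reduce_dims):
    """Var dims: tags unique to this input and not reduced.
    Shared non-reduced tags (e.g. batch) are the common dims."""
    other = set(other_shape)
    reduced = set(reduce_dims)
    return [tag for tag in shape if tag not in other and tag not in reduced]

a = ("batch", "time", "feat")
b = ("batch", "feat", "classes")
print(infer_var_dims(a, b, ["feat"]))  # -> ['time']
print(infer_var_dims(b, a, ["feat"]))  # -> ['classes']
```

This is why making `var1`/`var2` optional is orthogonal to the single-`reduce` change: each inference works on its own.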