Closed: albertz closed this issue 2 years ago.
An alternative to this proposal is to simply use `out_dims`, as proposed in #597. If we specify the output dims / shape, we can infer all the other information from it.
Mesh TensorFlow's `mtf.einsum(inputs, output_shape)` follows exactly this logic.
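To illustrate how the output dims alone can determine the roles of all axes, here is a minimal sketch. It assumes dims are represented as plain strings standing in for unique dim tags, and `infer_dot_axes` is a hypothetical helper, not a RETURNN API. Shared dims that are dropped from the output are summed over, and shared dims that are kept act as batch dims. Unlike plain einsum, which would also sum over a dim that occurs in only one input, this sketch rejects that case as a likely mistake.

```python
from typing import List, Sequence, Tuple


def infer_dot_axes(
    a_dims: Sequence[str], b_dims: Sequence[str], out_dims: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Infer (reduce_dims, batch_dims) of a dot product from the output dims alone.

    Dims are plain strings here, a hypothetical stand-in for unique dim tags.
    """
    out = set(out_dims)
    # Every output dim must come from at least one input.
    for d in out_dims:
        if d not in a_dims and d not in b_dims:
            raise ValueError(f"output dim {d!r} is not in any input")
    common = [d for d in a_dims if d in b_dims]
    # Stricter than einsum: a dim that occurs in only one input must be kept.
    for d in list(a_dims) + list(b_dims):
        if d not in common and d not in out:
            raise ValueError(f"dim {d!r} occurs in only one input but is missing from out_dims")
    reduce_dims = [d for d in common if d not in out]  # shared and dropped: summed over
    batch_dims = [d for d in common if d in out]  # shared and kept: batch dims
    return reduce_dims, batch_dims
```

For example, `infer_dot_axes(("B", "T", "F"), ("B", "F", "K"), ("B", "T", "K"))` returns `(["F"], ["B"])`: `F` is reduced and `B` is a batch dim, without any explicit `red1`/`red2`.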
For rec optimization, this is also easy to handle (as we already do now, via #628).
We do not have to choose one or the other, though; we can also support both. Having `reduce` explicit might help readability.
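Supporting both could look roughly like the following sketch. `resolve_reduce` and its parameters are hypothetical names chosen for illustration, and dims are again plain strings standing in for dim tags. Either argument alone is enough; when both are given, the explicit `reduce` serves as documentation and is checked against what `out_dims` implies.

```python
from typing import List, Optional, Sequence


def resolve_reduce(
    a_dims: Sequence[str],
    b_dims: Sequence[str],
    reduce: Optional[Sequence[str]] = None,
    out_dims: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the dims to sum over, given an explicit `reduce`, `out_dims`, or both."""
    common = [d for d in a_dims if d in b_dims]
    if reduce is None and out_dims is None:
        raise ValueError("need reduce or out_dims")
    if reduce is not None:
        for d in reduce:
            if d not in common:
                raise ValueError(f"reduce dim {d!r} must occur in both inputs")
    if out_dims is None:
        # Only reduce was given (not None here, checked above).
        return list(reduce)
    inferred = [d for d in common if d not in out_dims]
    # When both are given, the explicit reduce must agree with out_dims.
    if reduce is not None and set(reduce) != set(inferred):
        raise ValueError(f"reduce={list(reduce)} disagrees with out_dims (implies {inferred})")
    return inferred
```

With this design the redundancy is harmless: a user may spell out `reduce` for readability, and an inconsistent combination fails early instead of silently picking one of the two.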
Related to #629 and #627: it can be redundant to specify both `red1` and `red2` as arguments. These axes really need to be the same, and they must also share the same dyn lengths. Via unique dim tags (#632), we can use a single `reduce` argument instead.

Originally posted by @Zettelkasten in https://github.com/rwth-i6/returnn/issues/629#issuecomment-913050945
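A minimal sketch of why a single `reduce` argument suffices once dim tags are unique. `Dim` and `dot_axes` are hypothetical stand-ins, not RETURNN's actual classes: each `Dim` object is compared by identity, so one tag unambiguously names the axis in both inputs, and the old `red1`/`red2` axis pair can be derived from it. Because the tag object also carries the dyn lengths, both inputs automatically agree on them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(eq=False)  # eq=False: equality and hashing are by object identity
class Dim:
    """Hypothetical stand-in for a unique dim tag, which also carries the dyn lengths."""

    name: str
    dyn_size: Optional[Tuple[int, ...]] = None  # per-sequence lengths, None if static


def dot_axes(a_dims: Sequence[Dim], b_dims: Sequence[Dim], reduce: Dim) -> Tuple[int, int]:
    """Derive the (red1, red2) axis pair from a single `reduce` dim tag."""
    # One identity-based tag names the same axis in both inputs.
    if reduce not in a_dims or reduce not in b_dims:
        raise ValueError(f"reduce dim {reduce.name!r} must occur in both inputs")
    return list(a_dims).index(reduce), list(b_dims).index(reduce)
```

For example, with `feat = Dim("feat")`, `dot_axes((batch, time, feat), (feat, out), feat)` yields `(2, 0)`, while a separately created `Dim("feat")` with the same name is a different tag and is rejected.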