Closed (vieting closed this 3 years ago)
It should match the PyTorch behavior, obviously.
I assume this is the same as the TF behavior? I.e. if you have x + y, and one of them has fewer dimensions than the other, it would add dummy dimensions in front. E.g. [T,B,F] + [F] would expand the [F] to [1,1,F]. And then the normal broadcasting rules apply.
Yes, exactly. I have now added this expansion in _unify_tensor_axes_returnn_meta.
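For reference, the shape rule discussed above can be sketched in plain Python. This is only an illustration of the broadcasting rule on shapes, not the actual code in _unify_tensor_axes_returnn_meta; the function names pad_shape_sketch and broadcast_shape_sketch and the concrete sizes for T, B, F are made up for the example.

```python
def pad_shape_sketch(shape, ndim):
    """Hypothetical helper: prepend size-1 dummy dims until len(shape) == ndim."""
    return (1,) * (ndim - len(shape)) + tuple(shape)


def broadcast_shape_sketch(a, b):
    """Hypothetical helper: result shape of `x + y` for shapes a and b.

    Step 1: the shorter shape gets dummy dims added in front.
    Step 2: per axis, sizes must be equal, or one of them must be 1.
    """
    ndim = max(len(a), len(b))
    a, b = pad_shape_sketch(a, ndim), pad_shape_sketch(b, ndim)
    out = []
    for da, db in zip(a, b):
        if da != db and 1 not in (da, db):
            raise ValueError(f"cannot broadcast {a} with {b}")
        # A size-1 axis is stretched to match the other operand.
        out.append(db if da == 1 else da)
    return tuple(out)


# Assumed example sizes, chosen only for illustration.
T, B, F = 7, 3, 5
bias_shape = pad_shape_sketch((F,), 3)             # [F] -> [1,1,F]
sum_shape = broadcast_shape_sketch((T, B, F), (F,))  # [T,B,F] + [F] -> [T,B,F]
```

So a bias with only a feature dim lines up with the last axis of a [T,B,F] tensor, and the two dummy axes in front are broadcast over T and B.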
Hi @albertz, in the context of #28 I figured out that adding a bias that has only a feature dim via broadcasting is not yet supported, see test case. How do you think this should be done?