Yo Austin - if the spec is unclear, then yeah, it should be made so. Before my thoughts, let's first break down broadcasting into its three parts:
1. A rank coercion (`reshape`) of multiple inputs to the rank of the biggest input, filling in the new shape with 1's for the newly inserted dimensions. These are typically inserted on the leading edge, with existing dimensions right-aligned (e.g. 2D to 4D, [3,4] -> [1,1,3,4]), but more generically you can find rare occurrences of left alignment and even middle axis alignment, such as with instanceNormalization's scale and bias in the decomposition algorithm (e.g. [7] -> [1,7,1,1]). For elementwise ops though (`add`, `mul`, `div`...), right-aligned ranks are the norm.
2. A determination of the broadcast output shape, where each pair of corresponding dimensions must be compatible - bidirectionally (the dimensions match or one of them is 1) or unidirectionally (the input dimension is 1 or matches the given output dimension).
3. An element repetition (`expand`), which is the real work - a repetition of elements across the new dimension sizes.

For XLA, step 1 does not happen because it expects the ranks to already match. Step 2 uses bidirectional broadcasting for the elementwise operators, while XLA's BroadcastInDim uses unidirectional broadcasting of the input shape against the expected output shape, even if the docs don't use that name.

For NumPy, all 3 steps happen. Step 1 is right-aligned (though there are likely cases of middle-aligned broadcasts too given axes, at least internally for processing), and step 2 is bidirectional, except in the case of its own broadcasting operator `broadcast_to`, which uses unidirectional broadcasting even if the docs don't use that name (e.g. `numpy.broadcast_to([1, 2, 3], (3, 3))` works, while `numpy.broadcast_to([1, 2, 3], (3, 1))` fails because the input shape is not unidirectionally broadcastable to the output shape).
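For reference, here's a minimal NumPy sketch of the right-aligned rank coercion and of `broadcast_to`'s unidirectional behavior described above (the shapes are just illustrative):

```python
import numpy as np

# Step 1 (rank coercion): NumPy right-aligns ranks implicitly, so a [3,4]
# tensor combined with a [2,1,3,4] tensor behaves as if reshaped to [1,1,3,4].
a = np.ones((3, 4))
b = np.ones((2, 1, 3, 4))
print((a + b).shape)  # (2, 1, 3, 4)

# broadcast_to is unidirectional: the input must expand to exactly the
# requested output shape, never the other way around.
print(np.broadcast_to([1, 2, 3], (3, 3)).shape)  # works: [3] -> [1,3] -> [3,3]
try:
    np.broadcast_to([1, 2, 3], (3, 1))  # fails: the trailing 3 can't become 1
except ValueError as e:
    print("not unidirectionally broadcastable:", e)
```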
And my thinking so far:
- For the elementwise operators, keep implicit bidirectional broadcasting, so the caller can simply write `mul(a3RowVector, b3ColVector)` rather than `mul(expand(reshape(a3RowVector, [3,1]), [3,3]), expand(reshape(b3ColVector, [1,3]), [3,3]))`.
- The rank coercion (step 1) could be left to the caller (so e.g. inputs to `add` with different ranks would need to be trivially `reshape`d to the same rank first by the client). Note at least a few backends already require consistent ranks on inputs, including at least a few XNNPack operators and nearly all DirectML ops; and if Apple BNNS and MPS also have this consistent-same-rank requirement, then it could ease backend adoption (plus reduce testing complexity and spec'ese) if the caller took care of the right-alignment/left-alignment/middle-alignment rank coercion before calling WebNN. All backends should treat reshapes as light adjustments of the tensor description without copies of the actual tensor data, making reshapes basically free. Though, here's another case of balancing backend complexity vs front-end caller complexity ⚖. I'd like more info on BNNS and MPS behavior first (maybe Phillis knows?).
- Regarding the operators that currently use unidirectional broadcasting (`expand`, GEMM's C tensor, and `prelu`): as for `expand`, it's functionally very similar to XLA's BroadcastInDim ("...expanding existing dimensions with size 1...The dimensions of operand must have size 1 or be the same size as the dimension in the output shape they are mapped to"), except that BroadcastInDim combines both a `reshape` and an `expand` into a single operator (`BroadcastInDim` -> `expand(reshape(input, coercedRankShape), expandedShape)`; see the sketch at the end of this comment). Btw, we used to have more reshape-family operators proposed for WebNN (`squeeze`, `unsqueeze`, `flattenTo2D`), until realizing that (a) they were all just little variations of `reshape` which the client can resolve as higher layer policy, (b) there may be other reshaping variants we don't even know about, and (c) increasing the API surface here didn't actually bring any hardware benefit, because WebNN's real benefit is about the accelerated backends (the "real" operators, moreso than just fiddling with dimensions in a tensor description).

So, I'm partial to an option 4:

@huningxin?
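Here's a rough Python sketch of the `BroadcastInDim -> expand(reshape(...))` equivalence mentioned in the last bullet above. The function name and `broadcast_dimensions` parameter follow the XLA/StableHLO convention; this is purely illustrative, not proposed WebNN API:

```python
import numpy as np

def broadcast_in_dim(x, output_shape, broadcast_dimensions):
    """Illustrative BroadcastInDim-style semantics as expand(reshape(...)).

    broadcast_dimensions[i] is the output axis that input axis i maps to;
    every other output axis becomes a newly inserted size-1 axis to expand.
    """
    coerced_rank_shape = [1] * len(output_shape)
    for input_axis, output_axis in enumerate(broadcast_dimensions):
        dim = x.shape[input_axis]
        if dim != 1 and dim != output_shape[output_axis]:
            raise ValueError("input dim must be 1 or match the output dim")
        coerced_rank_shape[output_axis] = dim
    reshaped = x.reshape(coerced_rank_shape)        # the "reshape" half
    return np.broadcast_to(reshaped, output_shape)  # the "expand" half

# Middle-axis alignment, e.g. a [7] scale mapped to channel axis 1 of NCHW:
scale = np.arange(7.0)
print(broadcast_in_dim(scale, (2, 7, 4, 4), broadcast_dimensions=[1]).shape)  # (2, 7, 4, 4)
```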
Thank you @fdwr for the very thorough response! I think your Option 4 proposal makes sense, with one addendum
My primary motivations for filing this issue were:
It seems that I've been successful in that latter point :)
> here's another case of balancing backend complexity vs front-end caller complexity ⚖
In the spirit of https://github.com/extensibleweb/manifesto I'm generally in favor of pushing complexity to callers (e.g. "This leads to better performance with less implementation effort"). In this case, I didn't expect that we'd actually adopt XLA's broadcasting rules for WebNN, though I figured it was worth calling it out as the option furthest towards the "caller complexity" end of things :P
As for the follow-up question... Regarding:
> any WebNN operators that use broadcasting should be clear that they do, rather than something that implicitly happens for any operator
I agree! Related to that:
> you can find rare occurrences of left alignment and even middle axis alignment, such as with instanceNormalization's scale and bias in the decomposition algorithm (e.g. [7] -> [1,7,1,1])
Is this middle axis alignment perhaps only relevant when using NCHW layout? If we were using NHWC layout, would [7] broadcast to [1, 1, 1, 7]?
Regardless, the spec of instanceNormalization doesn't say anything about broadcasting. Let's add a fourth action item to Option 4?
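To make the layout question above concrete, here's a hypothetical NumPy sketch of an instanceNormalization-style per-channel scale (the shapes and framing are illustrative assumptions, not taken from the spec):

```python
import numpy as np

scale = np.arange(7.0)           # per-channel scale, shape [7]
x_nchw = np.ones((1, 7, 4, 4))   # channels-first input
x_nhwc = np.ones((1, 4, 4, 7))   # channels-last input

# NCHW: the channel axis is in the middle, so the decomposition reshapes the
# scale to [1, 7, 1, 1] before the elementwise multiply (middle alignment).
print((x_nchw * scale.reshape(1, 7, 1, 1)).shape)  # (1, 7, 4, 4)

# NHWC: the channel axis is trailing, so ordinary right-aligned broadcasting
# already treats [7] as [1, 1, 1, 7] -- no special alignment needed.
print((x_nhwc * scale).shape)  # (1, 4, 4, 7)
```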
Reviving a discussion from #534, which defined shape broadcasting but didn't touch on the question of what WebNN's shape broadcasting rules should be.
WebNN currently specifies two kinds of broadcasting rules: unidirectional and bidirectional.
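For concreteness, here's a rough Python sketch of the two rules (the helper names are made up for illustration and are not the spec's algorithms):

```python
def unidirectionally_broadcastable(from_shape, to_shape):
    """True if from_shape can be right-aligned and expanded to exactly to_shape."""
    if len(from_shape) > len(to_shape):
        return False
    padded = (1,) * (len(to_shape) - len(from_shape)) + tuple(from_shape)
    return all(f == t or f == 1 for f, t in zip(padded, to_shape))

def bidirectional_broadcast_shape(a_shape, b_shape):
    """NumPy-style broadcast: either side may stretch; returns the output shape."""
    rank = max(len(a_shape), len(b_shape))
    a = (1,) * (rank - len(a_shape)) + tuple(a_shape)
    b = (1,) * (rank - len(b_shape)) + tuple(b_shape)
    out = []
    for x, y in zip(a, b):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"incompatible dims {x} and {y}")
        out.append(max(x, y))
    return tuple(out)

print(unidirectionally_broadcastable((1, 3), (3, 3)))  # True
print(unidirectionally_broadcastable((3, 1), (1, 3)))  # False
print(bidirectional_broadcast_shape((3, 1), (1, 3)))   # (3, 3)
```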
Of the popular ML frameworks, ONNX (which WebNN is largely based on) appears to be an outlier in making a distinction between "unidirectional" and "multidirectional" broadcasting. This distinction is not made by:
The "unidirectional broadcastable" constraint of some ONNX ops (e.g.
prelu()
) requires workarounds when exporting from other formats to ONNX - like in this example of using TVM to export PyTorch to ONNX: https://github.com/pytorch/pytorch/issues/70570#issuecomment-1034379620.What should we do?
Option 1: Adopt NumPy's broadcasting rules
Rationale: NumPy's broadcasting rules are a standard across the industry. It seems reasonable for them to be what we expose to the web.
Outcome: "bidirectional broadcasting" will be the only type of broadcasting exposed to the web. The user agent must ensure that the constraints of the underlying framework - such as unidirectional broadcasting for ONNX (@fdwr has suggested that this is trivial), and the lack of inferred broadcasting specifications for XLA (more on that below) - are satisfied.
Option 2: Adopt XLA's broadcasting rules
Rationale: The XLA Principles apply to WebNN, too:
Outcome: Both "unidirectional broadcasting" and "bidirectional broadcasting" concepts would be removed from the WebNN spec. To facilitate explicit broadcasts, something like StableHLO's `broadcast_in_dim` op would need to be added to WebNN.

Option 3: Keep the status quo
Rationale: It's the status quo
Outcome: No action needed regarding the current spec. However, all models ported to WebNN will need to abide by this "unidirectionally broadcastable" constraint, which is specific to ONNX.