🤔 Reviewing so many code reviews over the past year, these fields...

- `MLConv2dOptions::activation`
- `MLConvTranspose2dOptions::activation`
- `MLBatchNormalizationOptions::activation`

...have indeed proven problematic in the following respects:
- `MLActivation`s are only valid with some operators. For example, `clamp` was added as a stand-alone `MLActivation` usable with convolution but is invalid to use with `lstm`/`lstmCell`/`gru`/`gruCell`, which leads to user surprises and documentation that isn't straightforward.
- Conversely, some other operators are actually fusible in the backend platform (like GEMM+softmax in DML) but are not expressible at the WebNN layer. Mingming can attest to the bugs and interesting considerations this has caused so far.
- Trying to come up with a complete "activation set" that satisfies this across all platforms is untenable, given the divergences.
- There's also semantic awkwardness around what an "activation" function really is, as operators don't fall into clean buckets - e.g. `softmax` is a normalization operator and `clamp` is an elementwise math operator, yet both can also serve as activation functions (and that list will grow in the future - something each backend can keep up with, relative to its underlying platform, better than the WebNN API can satisfy broadly).

This added complexity exists for performance, and that performance is worthwhile. However, of the currently implemented backends, the one that benefits most from these fusions is the DirectML one in Chromium, and that backend already does a one-operator look-ahead for simple adjacent fusions (like Conv+Relu). That means any WebNN caller can emit either {conv, relu} or {conv+relu} and get the same performance benefit - no fusion logic required from the caller at the WebNN API level.

So, I support removing the `activation` fields from `MLConv2dOptions`, `MLConvTranspose2dOptions`, and `MLBatchNormalizationOptions`, as the intended benefit still happens, and it eases the implementation burden of front-end/back-end/conformance.
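For illustration, here's roughly what this looks like from the caller's side (a minimal sketch, assuming `builder`, `input`, and `filter` are an existing `MLGraphBuilder` and `MLOperand`s):

```js
// Previously (roughly), with the fusion hint expressed at the WebNN layer:
//   const out = builder.conv2d(input, filter, { activation: builder.relu() });

// With the activation field removed, the caller simply chains the two ops;
// a backend that fuses adjacent ops (like the DML backend's one-op look-ahead)
// can still emit a single fused Conv+Relu kernel:
const conv = builder.conv2d(input, filter);
const out = builder.relu(conv);
```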
FYI @mingmingtasd.
(I also want to check with Chai and Rafael for their opinions...)
Thanks for the support, @fdwr. I've marked #664 as ready for review!
Thanks for the proposal @a-sully and discussion @fdwr!

I think it is good to leave activation function fusion to the underlying graph optimizer (either in the Chromium backends, such as the DirectML backend, or in native ML frameworks, such as XNNPACK's `xnn_subgraph_fusion()`). That would make the WebNN API clearer.

Previously, when WebNN was used as a target backend for frameworks such as ONNXRuntime Web and the TFLite Wasm runtime, WebNN was likely handed a convolution with a fused activation function thanks to the framework's graph optimizer. A WebNN backend could leverage that and map directly to a native convolution operator that supports a fused activation function. Now the WebNN backend needs to check whether the native framework can fuse activation functions; if it doesn't, the WebNN backend should add an optimization pass and fuse by itself in order to avoid a performance regression.
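As a rough illustration of what such a pass might look like (a hedged sketch only, not any particular backend's code; the node shape and helper here are made up):

```js
// Merge adjacent conv2d -> relu pairs (a simple one-node look-ahead) in a
// topologically sorted node list, assuming nodes look like
// { op, inputs: [operandIds], output: operandId }.
function fuseConvRelu(nodes) {
  const fused = [];
  for (let i = 0; i < nodes.length; i++) {
    const node = nodes[i];
    const next = nodes[i + 1];
    const fusible =
      node.op === 'conv2d' &&
      next && next.op === 'relu' &&
      next.inputs[0] === node.output &&
      consumersOf(nodes, node.output).length === 1; // relu is the only reader
    if (fusible) {
      // Emit one fused node and skip the now-absorbed relu.
      fused.push({ ...node, fusedActivation: 'relu', output: next.output });
      i++;
    } else {
      fused.push(node);
    }
  }
  return fused;
}

// All nodes that read a given operand.
function consumersOf(nodes, operand) {
  return nodes.filter((n) => n.inputs.includes(operand));
}
```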
Would it be worth adding an implementation note?
Thanks, @huningxin. I didn't mention anything about op fusion in #664 because there's already this text in the spec:
> Before the execution, the computation graph that is used to compute one or more specified outputs needs to be compiled and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion.
If you have more suggestions, please comment on that PR! :)
@a-sully
> I didn't mention anything about op fusion in #664 because there's already this text in the spec:
>
> > Before the execution, the computation graph that is used to compute one or more specified outputs needs to be compiled and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion.
That one is good, but it is more about the graph-compilation-time optimization that is usually done by the native ML APIs. I think we won't want to miss the graph-construction-time optimization opportunity when it is natively supported, as the spec text mentioned for the previous `MLActivation` interface:

> These activations function types are used to create other operations. One such use of this interface is for when an activation function is fused into another operation such as conv2d() or batchNormalization() during a graph construction session. Such fused activation functions can provide a significant performance improvement when supported natively by the underlying implementation. This is intended as an optimization opportunity for implementers.
The DirectML backend in Chromium exploits this op-fusion opportunity when constructing the DirectML graph.
> If you have more suggestions, please comment on that PR! :)
Will do!
> - Spec validation issues - some operators (like softmax with the axis parameter) cannot be validated at creation time, because it's not until they are combined with a base operator that the input shape is known.

@fdwr Correct! Take softmax with `axis`: if the base operator's output shape is unknown, we can't fuse them.
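To make that validation gap concrete, here's a hedged sketch (whether the activation overload took an `axis` option, and the shapes used, are assumptions for illustration):

```js
// As a stand-alone MLActivation, softmax has no input operand yet, so an
// axis can't be checked against any rank when it is created:
//   const act = builder.softmax({ axis: 3 });                        // rank unknown here
//   const out = builder.conv2d(input, filter, { activation: act });  // fusion point
// Only once conv2d's output shape is known (say [1, 64, 56, 56], rank 4) could
// axis = 3 be validated - and if that output shape isn't known yet, it can't be
// validated (or fused) at all.
```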
`MLActivation` currently has two uses:

1. `batchNormalization` and `conv2d`, as a hint that this activation may be fusable with the given operator
2. `lstm` and `gru`, as activations applied to gates within the given recurrent unit

I have some thoughts on (2), but I'll leave that to another issue. Let's focus on (1) here :)
My claim is that there's no benefit to passing `MLActivation`s as parameters to `MLOperand`s for the purpose of op fusion. Here's why:

- Not every (operator, activation) combination is supported everywhere, so an implementation has to be able to un-fuse `MLActivation`s from their respective `MLOperand` if the combo is not supported for the given version of DML
- Conversely, just because an operator is not an `MLActivation` in the WebNN spec, it may in fact be fusible with its input or output - e.g. `DML_OPERATOR_ELEMENT_WISE_ADD1` and `DML_OPERATOR_GEMM`, which ~map to WebNN's `add` and `gemm` operators

What this means in practice is:
- A backend which doesn't support a given fusion has to decompose the `MLOperand` and its `MLActivation` into what's effectively just two `MLOperand`s connected to each other (as Chromium's CoreML backend currently does)
- A backend which supports more fusions may fuse operator combinations which are not expressible as `MLActivation` parameters in WebNN (as Chromium's DML backend currently does)

Whether a given operator can be fused with a given activation is a very backend-specific quirk. Presumably we don't want to plumb through a new `MLActivation` operator to the web for every operator which any backend decides it can now fuse with some activation! This seems best left as an implementation detail, handled either by the user agent (as described above) or by the framework (who knows how much op fusion is happening in Core ML under the hood?! 🤔)

Thoughts?
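For the "decompose" case mentioned above, a hedged sketch of what that amounts to (illustrative node shapes only, not Chromium code; `supportsFusion` is a made-up helper):

```js
// Split a node carrying a fusion hint back into two plain nodes when the
// target backend can't fuse that (op, activation) combination.
function unfuseIfUnsupported(node, supportsFusion) {
  if (!node.activation || supportsFusion(node.op, node.activation)) {
    return [node];
  }
  const intermediate = `${node.output}_prefused`;
  return [
    { ...node, activation: undefined, output: intermediate },
    { op: node.activation, inputs: [intermediate], output: node.output },
  ];
}
```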