@a-sully
gru.weight: Not used
gru.recurrentWeight: Not used
gru.bias: Not used
gru.recurrentBias: Not used
FYI, these operands (constants only) are used by NSNet2 (noise supression) example: https://github.com/webmachinelearning/webnn-samples/blob/master/nsnet2/nsnet2.js#L51
My apologies for the error. I've updated the respective rows (they're also all "Constant only")
Turns out XNNPACK also requires some tensors to be constants - for example prelu.slope. The Chromium implementation currently fails if the slope operand was not created as a constant().
It would be nice to make progress here, as @a-sully notes "it's much easier to relax these restrictions than to try to impose them later"
I'm a fan of the MLConstantOperand approach - it's very straightforward to document (e.g. MDN), specify (just WebIDL, no special-case text needed), and implement (bindings code should make it "just work").
I see a :+1: from @huningxin above; @fdwr have you had a chance to think about this?
Sketched out a PR for MLConstantOperand in #747.
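To illustrate what that could mean for callers (a sketch only; the sizes, descriptor fields, and the weightData/recurrentData buffers below are placeholders, and the exact IDL is whatever #747 lands on): builder.constant() would return an MLConstantOperand, constant-only parameters would be declared with that type, and the WebIDL bindings would reject a plain MLOperand automatically.

```js
// Sketch of the caller-visible behavior under the MLConstantOperand proposal.
// Sizes and the weightData/recurrentData buffers are illustrative placeholders.
const steps = 1, batchSize = 1, inputSize = 129, hiddenSize = 64;
const x = builder.input('x', {dataType: 'float32', shape: [steps, batchSize, inputSize]});

const weightDesc = {dataType: 'float32', shape: [1, 3 * hiddenSize, inputSize]};
const recurrentDesc = {dataType: 'float32', shape: [1, 3 * hiddenSize, hiddenSize]};
const weight = builder.constant(weightDesc, weightData);                // MLConstantOperand
const recurrentWeight = builder.constant(recurrentDesc, recurrentData); // MLConstantOperand

builder.gru(x, weight, recurrentWeight, steps, hiddenSize);             // OK

const dynamicWeight = builder.input('weight', weightDesc);              // plain MLOperand
builder.gru(x, dynamicWeight, recurrentWeight, steps, hiddenSize);      // TypeError from the bindings
```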
There are a number of compile-time optimizations a backend may perform if it knows that some data is constant.
Indeed, and I like the concept of an operand having a "constness" property, whether it be via a separate class MLConstantOperand, or just an inspectable property on the ordinary MLOperand (e.g. boolean MLOperand::isConstant()) that is set true when returned from constant, and false when generated from another operator. A constness property enables backends to make more informed decisions (not limited to CoreML), but I'm reluctant to require operators like batchNorm to accept only constants.
It seems that we have the following options to support each of these operators on CoreML:
(Changing my mind below - I think we should decompose in such cases in the backend.) Can this be an additional property on the opSupportLimits, besides rank and dataType? My preference is your option 3, except add constant to opSupportLimits, expose an MLOperand::isConstant property, and fail synchronously before build if the backend requires constness.
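For illustration, the probing flow might look something like the following sketch. The constant entry on opSupportLimits and the MLOperand.isConstant property are hypothetical extensions proposed above, not part of the spec today.

```js
// Hypothetical probing flow: check whether this backend requires conv2d's bias
// to be constant, and fail (or decompose) synchronously before build().
const limits = context.opSupportLimits();
if (limits.conv2d?.bias?.constant && !bias.isConstant) {
  throw new TypeError('This backend requires conv2d bias to be a constant operand');
}
const output = builder.conv2d(input, filter, {bias});
```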
if their non-constness is simply a limitation in the framework and there are no useful reasons to pass non-const tensors
Or is the reverse true, that mandatory constness is a limitation in a framework, and there are useful reasons to pass dynamic tensors such as for custom weights or as outputs from other operators? DirectML has a means (via DML_GRAPH_NODE_TYPE_CONSTANT) to specify that a tensor is a constant, which can indeed afford optimization opportunities, but it's not required. Accepting dynamic inputs can be useful for cases like convTranspose weights being inflated from a dequantized tensor at runtime (saving significant model space). It sounds like I should scan through all the models on my hard drive to find any cases when dynamic tensors are fed into an operator which CoreML requires to be constant. ⏳
Skimming through the MIL ops, some decisions are unclear to me. (Update - Ningxin points out iOS17 supports dynamic weights, showing that iOS may relax more of these constraints over time.) Why is conv's weight dynamic while requiring bias to be constant, when there are more interesting preprocessing advantages to knowing that weights are constant? And why is conv's weight tensor dynamic, but closely related conv_transpose requires the filter weight to be constant?
Note we're not necessarily blocked for all the ops above when inputs have dynamic values, as multiple operators like normalization.batch_norm can be decomposed into operators that do not require constness, like {add, mul, div, sqrt}. When those input operands are constant, then the built-in CoreML normalization.batch_norm can be used directly, but if non-constant, then we can still achieve dynamic behavior via op emulation. How many other operators can this apply to? conv/convTranspose/gru/lstm seem the toughest ones 🤔.
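For reference, a minimal sketch of that decomposition expressed in WebNN builder terms (the helper name is illustrative, and the 1D mean/variance/scale/bias operands are assumed to be already reshaped so they broadcast against the input, e.g. to [1, C, 1, 1] for NCHW):

```js
// batchNorm(x) = (x - mean) / sqrt(variance + epsilon) * scale + bias,
// expressed with ops that have no constness requirement.
function emulatedBatchNorm(builder, input, mean, variance, scale, bias, epsilon = 1e-5) {
  const eps = builder.constant({dataType: 'float32', shape: [1]}, new Float32Array([epsilon]));
  let output = builder.div(builder.sub(input, mean),
                           builder.sqrt(builder.add(variance, eps)));
  if (scale) output = builder.mul(output, scale);
  if (bias) output = builder.add(output, bias);
  return output;
}
```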
Can this be an additional property on the opSupportLimits, besides rank and dataType? My preference is your option 3, except add constant to opSupportLimits, expose an MLOperand::isConstant property, and fail synchronously before build if the backend requires constness.
I was initially skeptical of this approach, as it would seem that developers would be likely to develop and test on one platform (e.g. Chromium/Windows/DML) and only discover incompatibilities when users with different systems ran into failures. But if the vast majority of models that we've seen use constant weights anyway then this will be rare in practice.
How smoothly would probing opSupportLimits here and/or responding to a sync build failure integrate into ORT? Is this an expected limitation of EPs already, or would we just rely on errors to propagate up from build?
Accepting dynamic inputs can be useful for cases like convTranspose weights being inflated from a dequantized tensor at runtime (saving significant model space).
This seems useful. Do all of the use cases that need non-constant weights derive these weights from actual constants? Or do we have use cases that derive non-constant weights from inputs?
If all dynamic weights are derived from constants, we can solve this by doing a pre-processing subgraph to generate these constants.
The pre-processing step can be within WebNN logic, in which case we would still need to expose the requirement that these weights need to be constant or derived from constants. WebNN would detect when a weight is not constant and run a preprocessing subgraph to generate the constants.
The pre-processing step can also be done outside of WebNN, in which case we can expose the constness requirement through opSupportLimits and have ORT (or a JavaScript library) implement the logic of preparing the const weights as a preprocessing step. The pros of doing it outside of WebNN are:
Cons:
Breaking this into a couple of sub-problems.
This one we seem to have consensus on: it's useful to express such a concept.
I'd like to propose that a constant operand could be either the output of builder.constant or an MLOperand generated from operations whose inputs are all constant operands.
This allows us to apply constant folding for use cases like inflating from dequantized weights, or doing any kind of reshape/split/transpose on the weights when there is a format incompatibility. The constant folding pass would be done (as a separate processing step, using tflite) to generate derived constants before passing the graph to the backends that have constness requirements/preferences (tflite and coreml). Thoughts?
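A rough sketch of that "constant-derived" rule, assuming a hypothetical internal representation where each operand records its kind and the inputs of the operation that produced it (kind and producerInputs are not part of the WebNN API):

```js
// An operand is "constant" if it came straight from builder.constant(), or if it was
// produced by an operation whose inputs are all constant (reshape/split/transpose of
// weights, dequantization of quantized weights, etc.).
function isConstantDerived(operand) {
  if (operand.kind === 'constant') return true;
  if (operand.kind === 'input') return false;    // graph inputs are dynamic by definition
  return operand.producerInputs.every(isConstantDerived);
}
```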
I think the MLConstantOperand is more explicit for the spec. If we want to declare that some weights need to be constant, using MLConstantOperand will be apparent in the function signature, but using MLOperand::isConstant will bury this in the validation steps. WDYT?
If we define MLConstantOperand to be either straight from builder.constant or constant-derived, then I think it should work for almost all cases. As @a-sully mentioned above, "it's much easier to relax these restrictions than to try to impose them later".
We can add such constraints for weights & bias to the function signatures, as that provides much more clarity and consistency than the dynamically probed opSupportLimits. And we can always relax them when, let's say, we find a use case for lstm to need dynamic weights.
WDYT?
@fdwr
Why is conv's weight tensor dynamic, but closely related conv_transpose requires the filter weight to be constant?
I noticed that iOS17 conv_transpose allows dynamic filter weight: https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS17.conv.conv_transpose
conv/convTranspose/gru/lstm seem the toughest ones
gru and lstm can also be emulated if non-constant weights are provided.
If we assume conv/convTranspose weights can be dynamic, then the non-constant bias can be emulated by an element-wise add operation too.
There are a number of compile-time optimizations a backend may perform if it knows that some data is constant.
Could we leave constant tensor optimization to the implementation rather than making it an API requirement?
Adding emulation when inputs are not constants for these kinds of operations:
- Adds a lot more implementation complexity.
- Is less efficient, but developers wouldn't know the cause.
So if emulation is necessary, it would be better to be done on the client side IMO? If we don't have use cases to support dynamic weights, I would prefer to expose such a requirement to drive users to deploy models that are more reliably performant?
I noticed that iOS17 conv_transpose allows dynamic filter weight: https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS17.conv.conv_transpose
@huningxin : Indeed, good to see that iOS relaxed that constness requirement later. Mingxian Yang mentions here that some stylegan models needed dynamic weights (for conv at least, not convTranspose). I still owe you all an analysis scanning through my local model collection to find any dynamic cases (hopefully before tomorrow's meeting).
So I told Phillis and Reilly in the last meeting that I would scan my model cache for any of the above operators with dynamic inputs (I actually did right afterward, but it took a while longer to collate the results... 😓), and I found enough cases that I request the Chromium CoreML backend support these cases via decomposition: conv2d.bias, convTranspose2d.bias, batchNormalization.*, instanceNormalization.*, layerNormalization.*, prelu.slope.
(Some of these inputs were fed by a Constant operator, plus many more weights/biases that, although unlikely to actually be overridden in practice, can technically be overridden via dynamic graph inputs.) They can be organized into these categories:
(1) Interestingly some weights were upcast to float32 for compute but stored as float16 in the weights (perhaps to save model space and weight upload time):
Similarly, weights were quantized to save space, but inflated at runtime:
(2) This conv bias is technically overridable/bindable at runtime. I could see some other models having similar dynamic overrides at runtime for fancier LoRA-like behavior:
(3) There were several silly cases with pointless Identity's (into the Conv bias in this case), but I'm not too worried about these because a WebNN backend could forward the tensor such that if the identity's input.isConstant was true, then the identity's output.isConstant property would also be true (a basic degree of constant folding like Phillis mentions above):
(4) Then there were more complex cases which weren't trivial identity elimination opportunities, and maybe such cases could be resolved into something simpler and even a direct final tensor, but merely seeing that these cases exist, I have no confidence that there aren't other patterns I haven't seen:
For CoreML Prelu's alpha slope tensor, "The length of alpha must match the second dimension of x (channel dimension)", but there are already plenty of cases where the slope does not fit that constraint (like pictured below) and is merely broadcast-compatible. So we need to emulate prelu there anyway, and using that existing emulation path for !isConstant too is little more work.
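For completeness, the emulation path referred to here is roughly the following, written in WebNN builder terms for brevity (the backend would use the equivalent CoreML ops; the helper name is illustrative and a float32 input is assumed):

```js
// prelu(x, slope) = max(x, 0) + slope * min(x, 0); works for any broadcast-compatible
// slope, constant or not.
function emulatedPrelu(builder, input, slope) {
  const zero = builder.constant({dataType: 'float32', shape: [1]}, new Float32Array([0]));
  return builder.add(builder.max(input, zero),
                     builder.mul(slope, builder.min(input, zero)));
}
```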
(DirectML has a related mechanism, DML_CONSTANT_DATA_GRAPH_NODE_DESC, which affords some optimizations for certain operators; it's not required - the tensor input can always be dynamic - but it must be CPU-side data for the preprocessing.)
Adds a lot more implementation complexity.
Except for GRU and LSTM (which do add complexity), it should add only a little more complexity? e.g. for layernorm and conv, it's an if/else:
if layerNormalizationOptions.scale.isConstant && layerNormalizationOptions.bias.isConstant
    return iOS15.normalization.layer_norm(input, axes, scale, bias, epsilon)
else
    // The "gamma" (scale!) and "beta" (bias!) tensors are optional and default 1 and 0.
    return add(mul(iOS15.normalization.layer_norm(input, axes, null, null, epsilon), scale), bias)
endif

if conv2dOptions.bias.isConstant
    return iOS15.conv.conv(input, filter, ..., bias)
else
    // The dynamic bias is omitted in conv but added explicitly afterward.
    return add(iOS15.conv.conv(input, filter, ..., null), bias)
endif
Is less efficient, but developers wouldn't know the cause.
Isn't it only less efficient for the uncommon case, while the common case executes at full efficiency? Higher layer developers won't be able to implement the dynamic case as efficiently as the backend because (a) the backend operates closer to the metal with fewer layers, (b) if a future CoreML relaxes constant constraints, then the backend can adopt that directly, whereas callers would not know and would still continue to artificially emulate, and (c) callers would unnecessarily decompose on backends that already support dynamic inputs directly if the input was defined as MLConstantOperand, because they wouldn't know it actually wasn't required to be a constant.
So if emulation is necessary, it would be better to be done on the client side IMO?
Supporting in WebNN (rather than by clients) benefits all callers, and callers do not need to repeat code that they potentially get wrong. Only the backend knows whether the underlying API truly requires decomposition and is best poised to make that decision.
I think the MLConstantOperand is more explicit for the spec.
Well, it certainly is more in-your-face by being visible in the code definition, but then other tensor requirements like data types and rank are not baked into the type name. Joshua's pending operator table for the spec could include a 3rd row for constness (beyond data type and rank). Though, if we can handle dynamic cases in the backend fully (defaulting to the direct operator when constant and decomposing when dynamic), then maybe we don't even need this type/property externally (only internally for fast-path testing).
When deciding between properties vs polymorphism, notice properties honor orthogonality better than polymorphism. Right now, we're just talking about MLConstantOperand, but imagine we have a future MLAddressSwizzledOperand (a theoretical optimized operand in a tiled memory layout). Will we then need every permutation of them? (MLOperand, MLConstantOperand, MLAddressSwizzledOperand, MLConstantSwizzledTiledOperand...). Similarly, if we had (we don't, but if we did) baked data typeness into operands (MLFloat16Operand, MLFloat32Operand, MLInt8Operand ...) then it would become unwieldy quite quickly (MLFloat16Operand, MLConstantFloat16Operand, MLFloat32Operand, MLConstantFloat32Operand ...). Then if you wanted to extract orthogonal properties (given those 4 distinct classes above: MLOperand, MLConstantOperand, MLAddressSwizzledOperand, MLConstantSwizzledTiledOperand), would you need to test all 4 cases to extract the 2 properties?
Properties enable concrete types in function signatures. It's common to have debugging utility functions (like I wrote to print all the MLOperand attributes to debug Stable Diffusion and other early operators), and it feels cleaner to pass a concrete MLOperand to a TypeScript function rather than using "is a" dynamic type checks; or, since WebIDL can technically be called from C++ too (a strongly typed language) without even JS in the picture, it's sensible to pass a concrete type and avoid RTTI dynamic_casts to check a trait.
Closing this issue according to the discussion at TPAC here: https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#08b6
Since all CoreML operators which require const parameters are decomposable into lower-level operators which do not have const requirements, the user agent can decompose when necessary.
The constant() operator is special. Constants are effectively inputs that are known at the time of graph compilation (i.e. build()).

You may ask, why do we need a separate method when we could just pass this constant data as an input()?

Most discussions I've seen so far about constant() (https://github.com/webmachinelearning/webnn/issues/614#issuecomment-2019116877, this thread about the since-abandoned "fill sequence" overload, etc) have been concerned with optimizing performance. There are a number of compile-time optimizations a backend may perform if it knows that some data is constant. Our experience with CoreML has given us another reason:
The backend requires that a parameter to an operator must be constant
Take conv2d(), for example. It's defined for CoreML here. The bias parameter is defined as:

bias: const tensor<[C_out],T>

Meanwhile, WebNN allows this to be any MLOperand, as long as it's of the appropriate shape and dtype. This appears to be a straightforward plumbing-through of DML's interface, which does not require the BiasTensor to be a constant. Neither does the corresponding operator for TFLite. From what I can tell, this seems to be because these frameworks don't have a way to express that some input tensor must be const. The options are either to pass the parameter as the framework's generic representation of a Tensor - which would in practice always(?) be created from a constant() - or to pass the parameter as a 1D array directly. If the parameters may be large (and perhaps unbounded), the former is the more appropriate choice.

To get a sense for whether this is a reasonable hypothesis, I've inspected all† uses of the affected operators in the WebNN Samples repo:
operator.param:
- batchNormalization.mean
- batchNormalization.variance
- batchNormalization.scale
- batchNormalization.bias
- conv2d.bias
- convTranspose2d.bias
- gru.weight
- gru.recurrentWeight
- gru.bias
- gru.recurrentBias
- instanceNormalization.scale
- instanceNormalization.bias
- layerNormalization.scale
- layerNormalization.bias
- lstm.weight
- lstm.recurrentWeight
- lstm.bias
- lstm.recurrentBias
- lstm.peepholeWeight
- prelu.slope
†This list only includes WebNN operators which trivially map to CoreML operators. WebNN operators which need to be expressed in terms of other CoreML operators will be subject to the restrictions of those respective CoreML operators. For example, CoreML doesn't have operators for gruCell or lstmCell, so these operators will need to be implemented in terms of gru and lstm, respectively. These operators will in turn need many of their parameters to be const, as well.

††One caller passes the result of a reshape... but that's only because the sample was written before constant() took an MLOperandDescriptor. The reshape is just assigning dimensions to a constant(). Nowadays we'd just pass the constant() directly.

Remarkably, in every single instance where one of these params is used in the WebNN Samples, it was created from a constant(). Cool!

Of course, this is not close to a comprehensive list of all the models we hope to run with WebNN. That being said, if there are no significant known use cases for passing any of these parameters as non-constant tensors - if their non-constness is simply a limitation in the framework and there are no useful reasons to pass non-const tensors - I think there's a reasonable argument that WebNN should require these parameters to be constants. @fdwr could you perhaps provide some more color here? :)
It seems that we have the following options to support each of these operators on CoreML:

1. Specify that operator.param must be a constant MLOperand (my tentative preference)
   - Mint an MLConstantOperand interface which extends MLOperand, and specify that param takes an MLConstantOperand
   - Alternatively, specify that MLOperand has a "kind", as the Chromium implementation already does, and throw a TypeError if not a "constant" kind. This may be confusing to developers
2. Make operator.param a sequence<MLNumber>
3. Fail if operator.param is not a constant, only on CoreML
   - This would happen at build() on Chromium, though we could conceivably make this a synchronous check on the respective builder method, especially if we have defined procedures for querying for backend-specific support (see #463)

Thoughts?