webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Constant sequential filling operation needs output shape parameter #492

Closed fdwr closed 5 months ago

fdwr commented 9 months ago

See constant(start, end, step, type) at https://www.w3.org/TR/webnn/#dom-mlgraphbuilder-constant-start-end-step-type

Problem

Computing the output shape indirectly from a floating-point calculation is brittle: depending on its equation and rounding, the caller can compute a different size than WebNN does, which causes graph execution to fail due to mismatched tensors. Such incompatible off-by-one differences can arise from differing calculation precisions (e.g. the caller computes the output shape in float64 whereas Chromium computes it in float32) and from differing handling of the inclusive/exclusive end value (see a non-hypothetical example below). So it's prudent for the library to pass WebNN the shape explicitly, and we've increasingly been moving toward more explicitness anyway (e.g. removing the special null values in Reshape). The output shape was present in the earlier fillSequence operator but got dropped, and this occurred to me again while reviewing Bin's Chromium implementation.
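
To make the brittleness concrete, here's a minimal sketch (not taken from any implementation) of how the same (start, end, step) values can yield off-by-one sizes depending on whether the ceil((end - start)/step) math is done in float64 or float32; the values are borrowed from a later comment in this thread:

import math
import numpy as np

# Values reused from the tf.range / torch.arange comparison later in this thread.
start, end, step = 0.3908395899830157, 1.9541979499150786, 0.3908395899830157

# Size computed in float64 (Python floats) vs. float32 (numpy):
size_f64 = max(0, math.ceil((end - start) / step))
size_f32 = max(0, int(np.ceil((np.float32(end) - np.float32(start)) / np.float32(step))))

# Rounding can land the two quotients on opposite sides of a whole number,
# producing off-by-one sizes (likely 4 vs 5 here, matching the PT/TF shapes shown below).
print(size_f64, size_f32)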

Proposal

- constant(start, end, step, type) method steps are:
+ constant(start, step, outputShape, type) method steps are:
...
- Let size be max(0, ceil((end - start)/step)).
- Set descriptor.[dimensions](https://www.w3.org/TR/webnn/#dom-mloperanddescriptor-dimensions) to [size].
+ Set descriptor.[dimensions](https://www.w3.org/TR/webnn/#dom-mloperanddescriptor-dimensions) to [outputShape].
+ Let outputElementCount be the product of all elements in descriptor.[dimensions](https://www.w3.org/TR/webnn/#dom-mloperanddescriptor-dimensions).
...
- [For each](https://infra.spec.whatwg.org/#map-iterate) index in [the range](https://infra.spec.whatwg.org/#the-range) 0 to size, exclusive:
+ [For each](https://infra.spec.whatwg.org/#map-iterate) index in [the range](https://infra.spec.whatwg.org/#the-range) 0 to outputElementCount, exclusive:

* Note the end is now redundant, since it was only used for size calculation, not needed for the core start + (index * step) formula.
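
For clarity, here's a rough, non-normative sketch of the proposed fill semantics with the explicit shape (fill_sequence is just an illustrative name, not a proposed API):

import math

def fill_sequence(start, step, output_shape):
    # outputElementCount is the product of all dimensions in the descriptor.
    output_element_count = math.prod(output_shape)
    # Each element is start + (index * step); an end value is never needed.
    return [start + index * step for index in range(output_element_count)]

print(fill_sequence(0, 1, [3, 3]))   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(fill_sequence(9, -1, [3, 3]))  # [9, 8, 7, 6, 5, 4, 3, 2, 1]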

Problematic Library differences

We see differences between TF and PT with the exact same inputs, due to how they handle the end value (inclusive vs exclusive), which emphasizes why WebNN should be clearly informed by the caller of the expected shape:

import tensorflow
result = tensorflow.range(5, 5, 2)
print("value:", result)
# output = [], shape= (0,)

import torch
result = torch.range(5, 5, 2)
print("value:", result)
# output = [5], shape= (1,)

Usage Examples

... = constant(0, 1, [3,3], "float32"); // Fill with an increasing sequence.
// output = [[0,1,2],[3,4,5],[6,7,8]]

... = constant(9, -1, [3,3], "float32"); // Fill with a decreasing sequence.
// output = [[9,8,7],[6,5,4],[3,2,1]]

... = constant(1, 0, [3,3], "float16"); // Quickly fill large tensor with a constant value, like numpy.ones(outputShape).
// output = [[1,1,1],[1,1,1],[1,1,1]]  // Note this is currently awkward otherwise for float16, because you have to
                                       // pass 0x3C00 (1.0) to fill a Uint16Array.
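
For reference, a quick numpy check (not part of the proposal) that 0x3C00 is indeed the float16 bit pattern for 1.0, i.e. what a caller currently has to write into a Uint16Array by hand:

import numpy as np

bits = np.array([0x3C00], dtype=np.uint16)  # raw float16 bit pattern
print(bits.view(np.float16))                # [1.]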

@wchao1115 @huningxin

huningxin commented 9 months ago

constant(start, step, outputShape, type)

Could the parameters outputShape and type be encapsulated by MLOperandDescriptor?

miaobin commented 8 months ago

constant(start, step, outputShape, type)

So the final proposal is like this: constant(start, step, operand_descriptor), right? (I guess the order of parameters is like this.) @huningxin @fdwr WDYT?

fdwr commented 8 months ago

constant(start, step, outputShape, type)

Could the parameters outputShape and type be encapsulated by MLOperandDescriptor?

I'm fine with it either way. If you as a caller already have shape and data type variables, then requiring a temporary tensor desc is a little more bothersome than just passing them directly as parameters, but if you already have a tensor desc, then passing that directly is slightly more convenient. 🤷‍♂️ Since the internal implementation complexity doesn't really change either way, maybe asking someone from the WebNN API caller point of view (like Wanming from ORT) would be a nice tie breaker?

Whatever we do, we should use the same pattern in the future with similar operators. So if we introduce, for example, a diagonal matrix operator that fills a new tensor with 1's along the diagonal and 0's everywhere else, then it should likewise take either a tensor desc or two separate parameters. I don't see that we have much existing precedent though. Any ops that take a shape currently (like reshape or expand) take the shape as a separate parameter, and ops that take a data type take it as a separate parameter (like cast), but we don't have any others that take both shape and data type.

Honry commented 8 months ago

Could the parameters outputShape and type be encapsulated by MLOperandDescriptor?

Not much impact to the WebNN EP; I am fine with it either way as well.

zolkis commented 8 months ago

In Web APIs they usually prefer passing options in dictionaries. Semantically important parameters make sense to be passed as args, if there are not many of them.

fdwr commented 8 months ago

In Web APIs they usually prefer passing options in dictionaries. Semantically important parameters make sense to be passed as args, if there are not many of them.

Well these are essential parameters, not options, but it would be more consistent with the existing constant(descriptor, bufferView) overload to accept a tensor description MLOperandDescriptor. So, seeing that precedent, I more clearly favor making it take MLOperandDescriptor too. Then we would have these two (or three) overloads:

1. constant(descriptor, bufferView);
2. constant(descriptor, start, step = 0);
3. constant(value, type) // I recall Ningxin proposed removing this one elsewhere due to implementation complexity.

Usage:

const x = builder.constant({dataType: 'uint32', dimensions: [4]}, new Uint32Array([1,3,5,7]));
const y = builder.constant({dataType: 'uint32', dimensions: [4]}, 1.0, 2.0); // 1,3,5,7

I guess the order of parameters is like this.

If it causes no ambiguity in the IDL, then I'd like both overload parameters to go in the same order, rather than inconsistently reversed between them. Also if you make step optional (defaulting to 0), then that satisfies the reasonably likely use case of wanting to fill with a solid value (const y = builder.constant({dataType: 'uint32', dimensions: [4]}, 1.0)). Conveniently, I think that also obviates the constant(value, type) overload?

wacky6 commented 8 months ago

In Web APIs they usually prefer passing options in dictionaries. Semantically important parameters make sense to be passed as args, if there are not many of them.

I think marking fields as required in an IDL dictionary is fine (it's not uncommon). So the proposal in https://github.com/webmachinelearning/webnn/issues/492#issuecomment-1885944180 looks reasonable to me.


The following is also fine IMO.

dictionary FillSequenceDescriptor {
  required double start;
  double step = 0;
};

constant(MLOperandDescriptor desc, FillSequenceDescriptor fillSequence);

The type we use for start and step might be tricky. I think we'll at least need a double to cover int32 and float32 (fp32 can't accurately represent a large int32 start value).

How to support int64 remains an open question :)

fdwr commented 8 months ago

I think we'll at least need a double to cover int32 and float32

One of the three current constant overloads already takes a double...

MLOperand constant(double value, optional MLOperandDataType type = "float32");

...and so there's precedent for passing at least that much precision. I agree that float32 is not sufficient to cover the accuracy needed (it cannot even represent odd int32 values >= 16777216 accurately).
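
A quick numpy check of that claim (2**24 + 1 being the first positive integer float32 cannot represent exactly):

import numpy as np

# float32 has a 24-bit significand, so odd integers above 2**24 round to an even neighbor.
print(np.float32(16777217) == np.float32(16777216))  # True: 16777217 rounds to 16777216
print(np.float64(16777217) == np.float64(16777216))  # False: float64 represents both exactly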

How to support int64 remains an open question

One operator (split) already accepts parameters of differing types:

partial interface MLGraphBuilder {
    sequence<MLOperand> split(
        MLOperand input,
        (unsigned long or sequence<unsigned long>) splits, <----------------
        optional MLSplitOptions options = {}
    );
};

So (barring any IDL issues I'm not seeing?) we could accept (double or BigInt) to accurately cover the full floating point and integer range:

MLOperand constant(MLOperandDescriptor descriptor, (double or BigInt) start, (double or BigInt) step = 0.0);

Curiously I see a BigUint64Array, but I don't see a matching BigUint type? 🙃 I suppose it doesn't matter because a BigInt can hold unsigned values too, with the precise type uint64 being clarified by the operand desc's dataType (and I see that BigInt.asUintN(64, (2n ** 64n) - 1n) runs fine, printing 18446744073709551615n as expected).
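
And a small check of why double alone can't cover int64 (Python's arbitrary-precision int standing in for bigint here):

# float64 has a 53-bit significand, so it cannot represent every int64 exactly,
# which is why a bigint-capable path is wanted for int64/uint64 start and step values.
max_int64 = 2**63 - 1
print(float(max_int64) == max_int64)  # False: the nearest double is 2**63
print(int(float(max_int64)))          # 9223372036854775808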

miaobin commented 8 months ago

So (barring any IDL issues I'm not seeing?) we could accept (double or BigInt) to accurately cover the full floating point and integer range:

MLOperand constant(MLOperandDescriptor descriptor, (double or BigInt) start, (double or BigInt) step = 0.0);

I thought of the same method as you, and I am trying to add bigint support to verify the feasibility of the implementation.

miaobin commented 8 months ago

Updates to the constant sequential filling operation:

  1. CL5091540 (extends the build method of constant) and CL5243778 (defines mojom and DirectML node) are developed according to the IDL: MLOperand constant(MLOperandDescriptor desc, (double or bigint) start, (double or bigint) step);

  2. We ran into a problem using the keyword or in IDL to create a union type containing double and bigint. In the generated file "v8_union_bigint_double.cc", the binding only checks whether the incoming value is a JS number and never checks whether it is a BigInt, so an error is reported when BigInt data is passed in from JS code (for example, passing start=123n when calling the constant op). @fdwr Maybe we should file a Chromium issue for this problem?

On one hand, we can continue to review these two CLs to find potential problems and verify feasibility. On the other hand, can we first create an updated specification PR which matches what is being implemented, and get folks with more WebIDL and JavaScript API design knowledge to take a look at it to make sure it will work the way we intend, as Reilly said? WDYT @huningxin @fdwr @wacky6

inexorabletash commented 7 months ago

Drive-by comment: Use of bigint is extremely limited in Chromium at the moment, so I'm not surprised if the binding code generation can't handle a union with it yet. Note also this in WebIDL: https://webidl.spec.whatwg.org/#limit-bigint-numeric-unions - while WebIDL support for a union of bigint and another numeric type is desired, there is a request to file an issue so that the problem can be discussed. And... from that bug tracker, it doesn't look like anyone has actually done it yet.

fdwr commented 7 months ago

https://webidl.spec.whatwg.org/#limit-bigint-numeric-unions

It would not be appropriate to accept such a union, only to then convert values of the numeric type to a bigint for further processing, as this runs the risk of introducing precision errors. Please file an issue before using this feature.

Joshua: I see. Well the point in this case is to preserve values all the way through and avoid precision errors. I see one existing issue here https://github.com/whatwg/webidl/issues/959 (but oddly empty and already closed).

inexorabletash commented 7 months ago

Yeah, I think the concern (from the WebIDL editors) is that they want to ensure that it's going to be used correctly (as proposed here) with handling at the C++ (etc) layer that preserves full precision either as a float64 or bigint. The WebIDL editors just don't want to do extra work unless it's really needed, and don't want anyone writing such a union unless the precision is actually needed. It absolutely is here... this may just be the first spec where that's actually the case! 🤯

inexorabletash commented 7 months ago

2. In the generated file "v8_union_bigint_double.cc", the binding only checks whether the incoming value is a JS number and never checks whether it is a BigInt

Comparing the blink bindings implementation for unions and the spec for union conversions, it looks like the cases for bigint are simply missing, i.e. not implemented yet.

fdwr commented 5 months ago

We have some parameter validation discussions happening in the fill-sequence constant overload CR. I'm content to restrict this sequential fill overload to reject infinity as an input (to simplify parameter validation), so long as we have some other means of creating tensors filled with such values. I propose we repurpose the existing constant(value, type) overload as constant(descriptor, MLNumber), which accepts any possible value for that data type from the caller (including infinity and NaN, which are valid representations for floating-point numbers). Thoughts?

fdwr commented 5 months ago

We're deciding between (a) vs (b):

(a) 3 separate functions, one consuming buffers, one for constant fills, one for sequence fills:

graphBuilder.constant(descriptor, buffer);    // initialize via buffer
graphBuilder.constant(descriptor, 13.0);      // fill with 13's
graphBuilder.constant(descriptor, 42.0, 1.0); // fill sequence 42,43,44,45,...

➕ more direct

(b) 2 functions, one consuming buffers, and one that takes an options object with optional params:

dictionary MLConstantOptions {
  MLNumber start = 0;           // MLNumber is unrestricted (thus does permit inf/NaN)
  (bigint or double) step = 0;  // not MLNumber since inf/NaN are not allowed
  // bikeshed naming, maybe `start` -> `value`.
};

graphBuilder.constant(descriptor, buffer);                 // initialize via buffer
graphBuilder.constant(descriptor, {start: 13.0});          // fill with 13's
graphBuilder.constant(descriptor, {start: 42.0, step: 1}); // fill sequence 42,43,44,45,...

Credit @a-sully

➕ pretty immediately clear to the reader, without needing to look up whether parameter 2 or 3 is the step.

wacky6 commented 5 months ago

2 functions, one consuming buffers and one that takes an options object with optional params

This seems nice from the caller's perspective.

But I think we'll need hand-written parsing logic.

I don't think the WebIDL binding generator is smart enough to tell different kinds of dictionaries apart.

dictionary FillConstant {
  MLNumber fill;  // or `start`
};

dictionary FillWithSequence {
  MLNumber start;
  MLNumber step;
};

a-sully commented 5 months ago

➕ pretty immediately clear to the reader, without needing to look up whether parameter 2 or 3 is the step.

Yup! That recommendation is based on guidance here: https://w3ctag.github.io/design-principles/#prefer-dictionaries

You should also consider accepting mandatory parameters through a dictionary, if it would make the API more readable, especially when they are of primitive types.

And my suggestion for combining the "fill with a given value" and "fill with this value + a step" methods into one is based on the guidance here: https://w3ctag.github.io/design-principles/#optional-parameters

If the dictionary argument itself is optional, then just this is sufficient:

graphBuilder.constant(descriptor); // Fill a tensor of the given dimensions with zeros

I don't think the WebIDL binding generator is smart enough to tell different kinds of dictionaries apart.

I'm not sure, though as @fdwr mentioned above I think if we use value instead of start that might be sufficient for both use cases?

a-sully commented 5 months ago

Taking a step back and looking at the problem statement described in the issue description, I'd like to poke at this claim:

  • Note the end is now redundant, since it was only used for size calculation, not needed for the core start + (index * step) formula.

Let's first look at the Problematic Library differences:

We see differences between TF and PT with the exact same inputs, due to how they handle the end value (inclusive vs exclusive), which emphasizes why WebNN should be clearly informed by the caller of the expected shape:

import tensorflow
result = tensorflow.range(5, 5, 2)
print("value:", result)
# output = [], shape= (0,)

import torch
result = torch.range(5, 5, 2)
print("value:", result)
# output = [5], shape= (1,)

PyTorch seems to have resolved this by deprecating range in favor of arange, which behaves the same as TF, Python's builtin range(), numpy's arange, and also Core ML's equivalent. Each of these operators takes start, end, and step arguments.

So there's no problematic library difference, right?...

...What about DirectML?!

DirectML does not require an explicit end parameter, instead inferring the endpoint from the dimensions (as this issue proposes to do for WebNN). But this has a few implications:

Infinite end

@fdwr threw out the following scenario over on the Chromium code review:

We also have to consider what happens for the endpoint, because even though the delta can't be infinite, it can be close to infinite, and starting at a large value could yield infinity for the endpoint (which would be okay if the start accepts infinity too, but it would be inconsistent to reject start being infinity whereas the end could result in infinity):

graphBuilder.constant(descriptor, {start: 3.4e+38, step: 3.4e+38}); // inf end

This makes sense in DML, which (according to what @fdwr tells me :P ) will gladly saturate floats to inf once they exceed the max value.

This does not make sense for the other frameworks, which require a finite end value:

torch.arange(start=5, end=float("inf"), step=2)
# RuntimeError: unsupported range: 5 -> inf

This means that even if we decide that an end value does not need to be specified via WebNN - since we can infer it from start + step * size - we must be able to compute it to a finite value.
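
As a hypothetical illustration (numpy standing in for the float32 arithmetic), the implied end overflows to infinity even though start and step are both finite:

import numpy as np

# Finite start and step from the scenario above, whose implied (exclusive) end value,
# start + step * size, overflows float32 to infinity.
start, step, size = np.float32(3.4e38), np.float32(3.4e38), 1

with np.errstate(over="ignore"):
    implied_end = start + step * np.float32(size)

print(implied_end)               # inf
print(np.isfinite(implied_end))  # False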

But even if it's finite, can we be sure the values are filled as we expect?

Quirks of floating point math

Say you want to fill a tensor of shape (10,) with evenly-spaced float16 values from 0 to 1. In the original IDL, this might look like:

constant(start=0., end=1., step=0.1, type="float16")

This maps closely to the (non-DML) APIs described above. For example:

np.arange(start=0., stop=1., step=0.1, dtype=np.dtype(np.float16))
# array([0.    , 0.1   , 0.2   , 0.2998, 0.4   , 0.5   , 0.5996, 0.6997,
#        0.8   , 0.9   ], dtype=float16)

DML, on the other hand, (according to their docs) performs the following pseudocode:

result = numpy.zeros(shape=[10,], dtype=np.float16)
for i in range(1, 10):
  result[i] = result[i-1] + numpy.float16(0.1)
# array([0.    , 0.1   , 0.2   , 0.2998, 0.4   , 0.5   , 0.6   , 0.7   ,
#        0.8003, 0.9004], dtype=float16)

The values are different!

The numpy docs have an explanation:

The actual step value used to populate the array is dtype(start + step) - dtype(start) and not step

and also

the length of the result is ceil((stop - start)/step). Because of floating point overflow, this rule may result in the last element of out being greater than stop

A quick check of TFLite and PyTorch shows the same behavior. Core ML's range is implemented using np.arange. And Python's built-in range function sidesteps this problem by only supporting integers.

Another thing to consider is the calculation needed by the caller itself. In the case above, the caller has start, end, and dimensions. If the WebNN API surface takes start, step, and dimensions, then the caller has to compute step manually, and this computation itself can introduce unwanted imprecision.


So what should we do?

In order to ensure we get the same values across platforms, I propose we specify the behavior of np.arange - concretely, that the effective step used to populate the output is internalStep = dtype(start + step) - dtype(start), per the numpy docs quoted above.

Chromium's DML implementation can then use start and internalStep - rather than the step value passed from JS. I believe that should give us the same behavior across all platforms 🤞
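
As a rough, non-normative sketch of that idea: internalStep follows the numpy rule quoted above, and this sketch assumes the per-element multiply happens at higher precision before casting to the output dtype, which should reproduce the np.arange output shown earlier:

import numpy as np

start, stop, step, dtype = 0.0, 1.0, 0.1, np.float16

# numpy's documented rule: the step actually used is dtype(start + step) - dtype(start).
internal_step = float(dtype(start + step) - dtype(start))

size = int(np.ceil((stop - start) / step))  # 10
filled = np.array([start + i * internal_step for i in range(size)], dtype=dtype)

print(np.arange(start, stop, step, dtype=dtype))
print(filled)  # expected to match the arange output element-for-element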

Meanwhile, requiring a finite end value sidesteps the whole "infinite end" problem

If we go down this route, then I think three overloads probably makes sense? :)

a-sully commented 5 months ago

Taking an even bigger step back, I was curious what the rationale for adding this operator (which I'll refer to as fillSequence below) was in the first place, since this functionality can be achieved by filling an ArrayBuffer (or MLBuffer, soon https://github.com/webmachinelearning/webnn/issues/614#issuecomment-2025710958) from JavaScript and passing it as a constant()... From what I can tell based on digging through old issues and PRs, the rationale for adding this operator can be traced back to this comment by @wchao1115 https://github.com/webmachinelearning/webnn/pull/478#discussion_r1391468552 (copied below)

Yes, [user code can generate the data and create a constant operand] but it'll cause additional copy on the array buffer. This overload makes sure that there is just 1 copy of the constant data created for the resulting operand.

In light of the recent discussions on #614 about achieving minimal buffer copies by using MLBuffer, I wonder whether the optimization promised by fillSequence is still worthwhile? Recalling the conclusions from #614 (see https://github.com/webmachinelearning/webnn/issues/614#issuecomment-2021547166):

  • passing an MLBuffer as an input() will allow the same data to be passed to multiple MLGraph instances (no additional copies). This data cannot be optimized during graph initialization because it's passed as an input rather than a constant
  • passing an MLBuffer as a constant() will ensure there's only one copy of the unoptimized constant data. Each MLGraph which this buffer is passed to may create its own optimized copy of the constant data during graph initialization.

Passing an MLBuffer as an input() may achieve the same "single copy of the buffer" behavior that fillSequence aims to provide. Additionally, this input buffer would only need to be allocated and have its contents computed once, whereas use of the fillSequence operator would require re-allocating the buffer and re-computing the sequence each time compute() is called on the MLGraph. My understanding of the situation is as follows:

                          fillSequence         input(MLBuffer)    constant(MLBuffer)
Copies                    one per inference    one†               one† + one per subgraph
Optimized constant data   ???                  No                 Yes
Times contents computed   once per inference   once, up-front     once, up-front

Questions for the group:

  1. Is the table above accurate? If not, please help me correct it!
  2. Can someone help fill in the ???? Are there examples of frameworks which may internally optimize this operator - e.g. by cleverly avoiding realizing the full output tensor?
  3. Do we have target use cases in mind for fillSequence()? If so, what are the expected usage patterns?

†Hopefully 🤞 Pending further MLBuffer prototyping @bbernhar

fdwr commented 5 months ago

My biggest concern is deducing tensor shapes from wobbly floating-point calculations, especially considering that the caller (maybe JavaScript, maybe compiled WebAssembly, maybe TF with one rounding behavior, maybe PT with another rounding behavior...) and the browser (with its own FPU details) can differ. It would be bad if the browser computed its range subtraction and delta division in float64 whereas the caller used float32, or vice versa; a mismatch in the output shape can result in the whole graph execution breaking due to incompatible shapes when combined with other operators.

High-level frameworks like TF Lite and PyTorch can do what they want and include such helpful functions based solely on FP parameters, but low-level APIs like WebNN should take more explicit tensor shapes.

PyTorch seems to have resolved this by deprecating range in favor of arange, which behaves the same as TF

🤔 Seems not so. I picked nice integer numbers for the example above, but more generically, if you pass the same values into tf.range and torch.arange, you get different values and, worse, different shapes:

import torch
import tensorflow
import random

ptRange = torch.arange(    0.3908395899830157, 1.9541979499150786, 0.3908395899830157, dtype=torch.float32)
tfRange = tensorflow.range(0.3908395899830157, 1.9541979499150786, 0.3908395899830157, dtype=tensorflow.float32)

print("---------------------------------")
print("PT shape: ", ptRange.shape)
print("   dtype: ", ptRange.dtype)
print("   value: ", ptRange)
print("TF shape: ", tfRange.shape)
print("   dtype: ", tfRange.dtype)
print("   value: ", tfRange)

# PT shape:  torch.Size([4])
#    dtype:  torch.float32
#    value:  tensor([0.3908, 0.7817, 1.1725, 1.5634])
# TF shape:  (5,)
#    dtype:  <dtype: 'float32'>
#    value:  tf.Tensor([0.39083958 0.78167915 1.1725187  1.5633583  1.9541979 ], shape=(5,), dtype=float32)

I'd sooner drop this op altogether (and force the caller to fill in the Float32Array) than include it with FP deduced shape.

DML, on the other hand, (according to their docs) performs the following pseudocode

(update 20240419T0320) I opened a doc fix for the pseudocode. The actual DML output is [0. 0.1 0.2 0.2998 0.4 0.5 0.5996 0.6997 0.8 0.9 ]. 📖🐞

a-sully commented 5 months ago

Thanks again for the deep dive @fdwr :)

Just to clarify a couple points...

...and the browser (with its own FPU details) can differ (it would be bad if the browser computed its range subtraction and delta division via float64 whereas the caller used float32, or vice versa)

If we went down this route we would clearly specify the behavior of the browser - say, to always be float64. So if frameworks are doing their parallel calculations in some other data type, that's a bug in the framework and not something we (as builders of the web platform) should be too worried about.

Of course we should strive to design the API such that callers fall into the pit of success. But there's a difference between an API which is unintuitive to use (say, because the parameters are ordered differently from the analogous function you're used to in Python) and one which cannot reliably be used correctly (say, if the browser sometimes computes shapes in float32 and other times in float64, and you have no way to know which will be used).

I think the "FP deduced shape" API we're discussing here is the former. But it sounds like (based on comments like the one below) you're suggesting that that may not be realistic, given the discrepancies in the underlying frameworks?

...and a mismatch between the output shape can result in the whole graph execution breaking due to incompatible shapes when combined with other operators

If that's true, I concur:

I'd sooner drop this op altogether (and force the caller to fill in the Float32Array) than include it with FP deduced shape.

Recalling the questions I posed to the group earlier...

  1. Is the table above accurate? If not, please help me correct it!
  2. Can someone help fill in the ???? Are there examples of frameworks which may internally optimize this operator - e.g. by cleverly avoiding realizing the full output tensor?
  3. Do we have target use cases in mind for fillSequence()? If so, what are the expected usage patterns?

Unless there are very compelling answers to 2 and 3, I'm in favor of removing this operator. WDYT?

fdwr commented 5 months ago

unless there are very compelling answers to 2 and 3, I'm in favor of removing this operator. WDYT?

@a-sully : I had to analyze some models more deeply first before I could reply. Looking through ~700 models in my mini-collection, with 19 having Range in them, all the cases (unless I accidentally missed opening one) turn out to either:

If we encounter a more compelling case later where a large fill directly on the GPU is worthwhile, then we can consider re-adding fillSequence (with explicit shape parameter, and only computing once rather than each compute call). Until then, letting the caller just fill in a little array with numbers (with their framework idiosyncrasies intact) sounds fine, and we can remove this operator. I'll talk with Miao and Ningxin in a few minutes for their take... ⏱️.

a-sully commented 5 months ago

Thanks again for the investigations @fdwr! I just put up #656 :)

a-sully commented 5 months ago

The above PR merged. Let's close this issue?