webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Improve "underlying platform" references in spec #462

Open inexorabletash opened 1 year ago

inexorabletash commented 1 year ago

The spec currently uses algorithms of the following form (just examples copy/pasted from different places):

  1. Make a request to the underlying platform to:
    1. Let opImpl be an implementation-defined platform operator for the slice operation, given starts and sizes.
    2. ...
    3. Create an implementation-defined platform operand outputImpl to represent the output, given output and opImpl.
    4. ...
  2. Connect input.[[operand]] as input to opImpl.
  3. Connect output.[[operand]] as output to opImpl.

Ideally this would all be defined in the spec. Here's one way to approach it, imagining a much simpler interface:

First we define the abstract notion of an op:

An op represents a mathematical operation as part of a graph. An op has zero or more inputs (ops) and one output (an op), and an algorithm which, when invoked, takes a corresponding number of input values and produces an output value.
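Read as a concrete record (a hypothetical shape for illustration, not proposed spec text), an op might look like:

// Hypothetical record backing the abstract "op" notion: upstream ops,
// the consuming op, and the value-level algorithm.
const additionOp = {
  inputs: [],                  // zero or more ops feeding this one
  output: null,                // the op that consumes this one's value
  algorithm: (a, b) => a + b,  // maps input values to an output value
};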

In a dedicated section of the spec, we'd rigorously define each op's algorithm:

An addition op is an op with two inputs (both numbers) and the following algorithm:

  1. Let a be the first input value.
  2. Let b be the second input value.
  3. Return a + b.

An average op is an op with one input (a list) and the following algorithm:

  1. Let a be the input.
  2. Let size be a's size.
  3. Assert: size is greater than 0.
  4. Let sum be 0.
  5. For each value of a:
    1. Let sum be sum + value.
  6. Return sum / size.
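For concreteness, here is how those two toy algorithms might read as plain JavaScript, in the style webnn-baseline uses (illustrative only):

// Addition op: two number inputs, one number output.
function additionOp(a, b) {
  return a + b;
}

// Average op: one list input, which must be non-empty.
function averageOp(a) {
  const size = a.length;
  console.assert(size > 0, 'input must be non-empty');
  let sum = 0;
  for (const value of a) {
    sum += value;
  }
  return sum / size;
}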

Obviously these are overly simple, but right now the spec's more complex ops are defined only by links. I'd tackle this incrementally, but the definitions should live in the spec itself! See Web Audio for examples, e.g. https://webaudio.github.io/web-audio-api/#fft-windowing-and-smoothing-over-time. These could sit inline with the builder methods or in a separate section of the spec.

And then in a builder method for add() it'd be something like:

  1. Let output be the result of creating an MLOperand given this and desc.
  2. Let opImpl be a new addition op.
  3. Set opImpl's inputs to inputs.
  4. Set opImpl's outputs to output.[[operand]].
  5. Set output.[[operator]] to opImpl.

That's probably not quite correct but hopefully the intention is clear.
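As a rough JS analogue of those steps (createOperand and the plain property names stand in for the spec's internal slots; all names here are hypothetical):

// Hypothetical stand-in for "creating an MLOperand given this and desc".
function createOperand(desc) {
  return { descriptor: desc, operand: {}, operator: null };
}

// Sketch of what add() might do internally, step by step.
function add(inputA, inputB, desc) {
  const output = createOperand(desc);                            // step 1
  const opImpl = { name: 'addition', inputs: [], output: null }; // step 2
  opImpl.inputs = [inputA.operand, inputB.operand];              // step 3
  opImpl.output = output.operand;                                // step 4
  output.operator = opImpl;                                      // step 5
  return output;
}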

More notes:

  1. Let opImpl be a new addition op. If this fails for implementation-defined reasons, throw an "OperationError" DOMException.

Okay, that's a lot of rambling and as noted I may be missing important details, so I'll stop here. Let's iterate (here or a PR?) until we come up with a good approach that balances readability, maintainability, and precision.

cc: @huningxin

zolkis commented 1 year ago

Thanks @inexorabletash for formalizing these suggestions. I've also been thinking about what we can factor out of the quite repetitive text in the algorithms. The intent was to first finish the initial pass over the algorithms for each op separately, and then do exactly what you are suggesting now. It's a good time for it.

The challenge is that in most algorithms we have similar-looking prose, but the underlying implementation can be quite different from one case to the other. It might be misleading to unify at the syntactic level and miss the semantic differences.

A first target could be formalizing, in a common way, the sequence of steps specified for ops. For operators, the text could be reduced by defining a create MLActivation algorithm and then referring to it from e.g. the elu() steps:

  1. Let op be the result of creating an MLActivation given this, "elu" and options.
  2. Return op.

For the ops themselves, however, the steps refer to the underlying platform behavior specific to the elu() implementation. The question there is how to parametrize these steps by the op name, while binding the implementation's op-specific internal slots for operator and output, and turn that into a "macro" or meta-algorithm. It should be doable, and we can attempt it in a way similar to what you described above.

I was also thinking about defining a map from op algorithm name to implementation internal slot and using it from the prose, but that would be quite verbose and probably unnecessary: it should be possible to just say it in prose, taking the op algorithm name as a macro input (expressed in prose).
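One way to picture that macro/meta-algorithm, sketched in JS with the op name as the parameter (all names hypothetical; the spec would express this in prose):

// Hypothetical shared "create MLActivation" steps, parameterized by op name.
function createActivation(builder, opName, options = {}) {
  return { builder, name: opName, options };
}

// Each activation builder method then reduces to a one-line delegation.
function elu(builder, options) {
  return createActivation(builder, 'elu', options);
}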

huningxin commented 1 year ago

@inexorabletash , thanks for initiating this thread! I'd agree this is an area that needs to be improved.

In the current spec, the "underlying platform" / "platform op" is an abstraction over the native APIs used by implementations, such as XNNPACK and DirectML. We assume the "platform op" carries the WebNN op semantics which, as you mentioned, are usually defined by links rather than by algorithm steps. In some cases that causes ambiguity, for example issue 358: Please clarify interpolation algorithm for resample2d.

IIUC, the op abstraction sounds like implementing WebNN ops without relying on any (native) ML APIs. I'd agree it would make the spec more precise and "self-contained". It reminds me of the webnn-baseline project, a pure JavaScript implementation used for WPT test data generation. It doesn't rely on any ML libraries either (e.g., TensorFlow.js, which, BTW, webnn-polyfill uses for performance reasons). All ops have a JavaScript implementation, for example: https://github.com/webmachinelearning/webnn-baseline/blob/main/src/binary.js

// Element-wise binary op: broadcast both inputs to a common shape,
// then apply binaryFunc to each pair of elements.
function binary(inputA, inputB, binaryFunc) {
  const outputShape = getBroadcastShape(inputA.shape, inputB.shape);
  const inputABroadcast = broadcast(inputA, outputShape);
  const inputBBroadcast = broadcast(inputB, outputShape);
  const outputSize = sizeOfShape(outputShape);
  const output = new Tensor(outputShape);
  for (let i = 0; i < outputSize; ++i) {
    const a = inputABroadcast.getValueByIndex(i);
    const b = inputBBroadcast.getValueByIndex(i);
    const c = binaryFunc(a, b);
    output.setValueByIndex(i, c);
  }
  return output;
}
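In this style, an element-wise add becomes a one-line specialization (a sketch; the actual webnn-baseline export may differ slightly):

function add(inputA, inputB) {
  return binary(inputA, inputB, (a, b) => a + b);
}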

This would add a few more things to your addition op algorithm to make it an element-wise addition. Some initial thoughts:

  1. We may need to support tensors (multi-dimensional arrays with a uniform type); the element-wise addition op should operate on tensors instead of numbers. In the current spec, the MLOperand interface already carries the dimensions and data type in its [[descriptor]] internal slot; we could extend it with an [[array]] internal slot that holds the actual tensor data. With that, the implementation-defined platform [[operand]] internal slot would no longer be required. We may also need to define additional algorithms for accessing tensor data, such as the getValueByIndex, setValueByIndex, getValueByLocation and setValueByLocation used by the element-wise binary and other op implementations above. For example: https://github.com/webmachinelearning/webnn-baseline/blob/main/src/lib/tensor.js
  2. We may need to define the getBroadcastShape and broadcast algorithms instead of referring to the NumPy docs; see the sketch after this list. For example: https://github.com/webmachinelearning/webnn-baseline/blob/main/src/lib/broadcast.js
  3. binaryFunc is a mathematical operation on numbers; for element-wise addition it could invoke the addition op algorithm.
  4. The algorithms of "high-level" ops may be written by composing the algorithms of "low-level" ops, for example: https://github.com/webmachinelearning/webnn-baseline/blob/main/src/batch_normalization.js. That would also align with the decomposition pseudocode already in the spec, such as the one for batchNormalization.
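For item 2, the NumPy-style shape-broadcasting rule is compact enough to spell out in full. A minimal sketch (a hypothetical helper, not the webnn-baseline code verbatim):

// Compute the broadcast shape of two shapes per NumPy's rules:
// align trailing dimensions; each aligned pair must be equal or include a 1.
function getBroadcastShape(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const out = new Array(rank);
  for (let i = 0; i < rank; ++i) {
    const a = shapeA[shapeA.length - 1 - i] ?? 1;
    const b = shapeB[shapeB.length - 1 - i] ?? 1;
    if (a !== b && a !== 1 && b !== 1) {
      throw new Error(`shapes are not broadcastable: ${a} vs ${b}`);
    }
    out[rank - 1 - i] = Math.max(a, b);
  }
  return out;
}

// getBroadcastShape([256, 256, 3], [3])      -> [256, 256, 3]
// getBroadcastShape([8, 1, 6, 1], [7, 1, 5]) -> [8, 7, 6, 5]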

Looping in @wchao1115 for comments.

inexorabletash commented 1 year ago

> IIUC, the op abstraction sounds like implementing WebNN ops without relying on any (native) ML APIs.

Correct. We would expect real implementations to optimize heavily using platform APIs, parallel instructions, hardware accelerators, etc. Using the JS code to help craft the algorithm implementations seems like a great idea!

> This would add a few more things to your addition op algorithm to make it an element-wise addition,

To be clear, I was really just showing the style I'd follow rather than giving a concrete example for inclusion in the spec. Tensors etc. definitely need to be used rather than just primitive inputs.

Speaking of style, I wrote:

  1. Let a be the first input value.
  2. Let b be the second input value.

But we could simplify this by using a style that introduces the inputs along with the algorithm, e.g. "takes two inputs (a and b, both numbers)".

inexorabletash commented 3 months ago

Restating this issue:

  1. The processing model for graph execution should be made more rigorous. At an extreme, it would look like pseudocode for an implementation of compute(); see the sketch at the end of this comment.

  2. The behavior of each operator should be explained rigorously; even for basic math ops there are cases to consider around overflow/underflow. I don't think we want to go as far as pseudocode for each, but in some cases that might be appropriate. Some example issues that fall into this second category:

And commentary: although spec purity is great, I don't think this is particularly high priority; tackling the second category of issues on a case-by-case basis as needed to resolve interop issues seems fine.
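For the first category, that "extreme" version might read like the following topological-evaluation sketch, written in JS rather than spec prose. It is purely illustrative: op, inputs, algorithm, and value are the hypothetical op-record fields from earlier in this thread, and a real implementation would operate on tensors and dispatch to platform kernels.

// Evaluate an op graph by recursively computing each op's inputs,
// memoizing results so shared subgraphs are computed only once.
// Graph inputs are ops with no upstream ops and a preset `value`.
function computeOp(op, cache = new Map()) {
  if (cache.has(op)) return cache.get(op);
  let result;
  if (op.inputs.length === 0) {
    result = op.value;
  } else {
    const inputValues = op.inputs.map((input) => computeOp(input, cache));
    result = op.algorithm(...inputValues);
  }
  cache.set(op, result);
  return result;
}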