Open inexorabletash opened 1 year ago
Thanks @inexorabletash for formalizing these suggestions. I've also been thinking about what we can factor out of the quite repetitive text in the algorithms. The intent was to first finish the first pass on the algorithms separately for each op, and then do exactly what you are suggesting now. It's a good time for it.
The challenge is that in most algorithms we have similar-looking prose, but the underlying implementation can be quite different from one case to the other. It might be misleading to unify at the syntactic level and miss the semantic differences.
A first target could be formalizing, in a common way, the sequence of steps specified for ops. For activation operators, the repetition could be reduced by defining a create MLActivation algorithm, then referring to it from e.g. the elu() steps:
- Let op be the result of creating an MLActivation given this, "elu" and options.
- Return op.
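To illustrate the factoring (JS-flavored sketch only; the spec would express this as prose algorithm steps, and all names here are hypothetical):

```javascript
// Hypothetical sketch: a single factory that activation builder methods
// such as elu() could delegate to, instead of repeating the same steps.
function createActivation(builder, opName, options = {}) {
  return {
    builder,       // the MLGraphBuilder ("this" in the spec steps)
    name: opName,  // e.g. "elu", "relu", "sigmoid"
    options,       // op-specific options bag
  };
}

// elu() then reduces to the two steps above:
function elu(builder, options = {}) {
  return createActivation(builder, "elu", options);
}
```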
For the ops themselves, on the other hand, the prose refers to the platform-specific underlying implementation of elu(). There the question is how to parametrize these steps given the _opname_, while binding the impl's op-specific internal slots for operator and output, and create a "macro", i.e. a meta-algorithm, for that. It should be doable, and we can attempt it in a way similar to what you described above.
I was thinking about defining a map from op algorithm name to impl internal slot, and using that from the prose. But it would be quite verbose and perhaps futile: it should also be possible to just say that in prose, passing the op algorithm name as a macro input (expressed in prose).
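For concreteness, such a map would be little more than a lookup table keyed by the op algorithm name (a hypothetical sketch, which also illustrates why it feels verbose):

```javascript
// Hypothetical sketch: map from op algorithm name to the name of the
// impl's op-specific internal slot it would bind.
const opSlotMap = new Map([
  ["elu",     "[[eluOperator]]"],
  ["relu",    "[[reluOperator]]"],
  ["sigmoid", "[[sigmoidOperator]]"],
]);

// The meta-algorithm would take the op name as its "macro input":
function slotForOp(opName) {
  return opSlotMap.get(opName);
}
```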
@inexorabletash, thanks for initiating this thread! I'd agree this is an area that needs to be improved.
In the current spec, the "underlying platform" / "platform op" is a kind of abstraction over the native APIs used in implementations, such as XNNPACK and DirectML. We assume the "platform op" would hold the WebNN op semantics which, as you mentioned, are usually defined by links rather than by algorithm steps. In some cases that causes ambiguity, for example issue 358: Please clarify interpolation algorithm for resample2d.
IIUC, the op abstraction sounds like implementing WebNN ops without relying on any (native) ML APIs. I'd agree it would make the spec more precise and "self-contained". It reminds me of the webnn-baseline project, which is a pure JavaScript implementation used for WPT test data generation. It doesn't rely on any ML libraries either (e.g., TensorFlow.js, which, BTW, is used by webnn-polyfill for performance reasons). All ops have a JavaScript implementation, for example: https://github.com/webmachinelearning/webnn-baseline/blob/main/src/binary.js.
```javascript
// Element-wise binary op: broadcast both inputs to a common output
// shape, then apply binaryFunc to each pair of elements.
function binary(inputA, inputB, binaryFunc) {
  const outputShape = getBroadcastShape(inputA.shape, inputB.shape);
  const inputABroadcast = broadcast(inputA, outputShape);
  const inputBBroadcast = broadcast(inputB, outputShape);
  const outputSize = sizeOfShape(outputShape);
  const output = new Tensor(outputShape);
  for (let i = 0; i < outputSize; ++i) {
    const a = inputABroadcast.getValueByIndex(i);
    const b = inputBBroadcast.getValueByIndex(i);
    const c = binaryFunc(a, b);
    output.setValueByIndex(i, c);
  }
  return output;
}
```
This would add a few more things to your addition op algorithm to make it an element-wise addition. Some initial thoughts: besides the [[descriptor]] internal slot, we may extend it with an [[array]] internal slot that holds the actual tensor data. With that, the implementation-defined platform [[operand]] internal slot won't be required anymore. We may also need to define additional algorithms for accessing the tensor data, such as getValueByIndex, setValueByIndex, getValueByLocation and setValueByLocation, used in the above element-wise binary and other op implementations. For example: https://github.com/webmachinelearning/webnn-baseline/blob/main/src/lib/tensor.js

Loop in @wchao1115 for comments.
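A minimal sketch of such a tensor abstraction with those accessors (modeled loosely on webnn-baseline's Tensor; names and details here are illustrative, not the actual API) could look like:

```javascript
// Sketch only: a dense row-major tensor with index- and location-based
// accessors. Loosely modeled on webnn-baseline's Tensor class.
class Tensor {
  constructor(shape, data) {
    this.shape = shape;
    const size = shape.reduce((a, b) => a * b, 1);
    this.data = data ?? new Float32Array(size);
  }
  getValueByIndex(i) { return this.data[i]; }
  setValueByIndex(i, v) { this.data[i] = v; }
  // Convert an N-D location to a flat row-major index.
  indexFromLocation(location) {
    let index = 0;
    for (let d = 0; d < this.shape.length; ++d) {
      index = index * this.shape[d] + location[d];
    }
    return index;
  }
  getValueByLocation(loc) {
    return this.getValueByIndex(this.indexFromLocation(loc));
  }
  setValueByLocation(loc, v) {
    this.setValueByIndex(this.indexFromLocation(loc), v);
  }
}
```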
IIUC, the op abstraction sounds like implementing WebNN ops without relying on any (native) ML APIs.
Correct. We would expect that real implementations would optimize heavily using platform APIs, parallel instructions, hardware accelerators, etc. Using the JS code to help craft the algorithm implementations seems like a great idea!
This would add a few more things into your addition op algorithm to make it element-wise addition,
To be clear, I was really just showing the style I'd follow rather than giving a concrete example for inclusion in the spec. Tensors etc. definitely need to be used rather than just primitive inputs.
Speaking of style, I wrote:
- Let a be the first input value.
- Let b be the second input value.
But we could use a style introducing the algorithm to simplify this, e.g. "takes two inputs (a and b, both numbers)".
Restating this issue:
The processing model for graph execution should be made more rigorous. At an extreme, it would look like pseudocode for an implementation of compute().
The behavior of each operator should be explained rigorously; even for basic math ops there are cases to consider around overflow/underflow. I don't think we want to go as far as pseudocode for each, but in some cases that might be appropriate. Some example issues that fall into this second category:
And commentary: although spec purity is great, I don't think this is particularly high priority; tackling the second category of issues on a case-by-case basis as needed to resolve interop issues seems fine.
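To make the overflow point above concrete: for float32 operands, an addition that is finite in float64 can still overflow, which is exactly the kind of edge case the spec would need to pin down. A small sketch, using JS's `Math.fround` to emulate float32 rounding:

```javascript
// Two values near the float32 maximum (~3.4028235e38).
const a = Math.fround(3.0e38);
const b = Math.fround(3.0e38);

// In float64 (plain JS numbers) the sum is finite...
const sum64 = a + b; // ~6e38, representable in float64

// ...but rounding the result to float32, as a float32 addition op
// would, overflows to Infinity.
const sum32 = Math.fround(sum64);

console.log(Number.isFinite(sum64)); // true
console.log(sum32);                  // Infinity
```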
The spec currently uses algorithms of the form (just examples copy/pasted from different places).
Ideally this would all be defined. Here's one way to approach it, imagining a much simpler interface:
First we define the abstract notion of an op:
An op represents a mathematical operation as part of a graph. An op has zero or more inputs (ops) and one output (an op), and an algorithm which, when invoked, takes a corresponding number of input values and produces an output value.
In a dedicated section of the spec, we'd rigorously define each op's algorithm:
An addition op is an op with two inputs (both numbers) and the following algorithm:
An average op is an op with one input (a list) and the following algorithm:
Obviously these are overly simple, but right now the more complex ops in the spec are only defined by links. I'd tackle this incrementally, but it should be in the spec itself! See Web Audio for examples, e.g. https://webaudio.github.io/web-audio-api/#fft-windowing-and-smoothing-over-time. These could be inline with the builder methods or in a separate section of the spec.
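As a JS-flavored sketch of what those two example op algorithms might pin down (illustrative only; the spec would express these as prose steps):

```javascript
// Sketches of the two example op algorithms, written as plain functions.

// Addition op: two inputs (both numbers), one number output.
function additionAlgorithm(a, b) {
  return a + b;
}

// Average op: one input (a list of numbers), one number output.
function averageAlgorithm(values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum / values.length;
}
```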
And then in a builder method for add() it'd be something like:

That's probably not quite correct, but hopefully the intention is clear.
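A rough sketch of what such builder steps could look like (my own illustration with hypothetical names, not actual spec text):

```javascript
// Hypothetical sketch of add() as a builder method: it validates its
// two input operands and returns a new operand whose op is the
// addition op defined in the dedicated algorithms section.
function add(builder, inputA, inputB) {
  // 1. Validate the inputs (dtype match, broadcast-compatibility, ...).
  if (inputA.dataType !== inputB.dataType) {
    throw new TypeError("add(): mismatched operand data types");
  }
  // 2. Create an op whose algorithm is the addition op's algorithm.
  const op = {
    name: "add",
    inputs: [inputA, inputB],
    algorithm: (a, b) => a + b, // per-element addition
  };
  // 3. Return an operand that holds the op.
  return { dataType: inputA.dataType, op };
}
```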
More notes:
Okay, that's a lot of rambling and as noted I may be missing important details, so I'll stop here. Let's iterate (here or a PR?) until we come up with a good approach that balances readability, maintainability, and precision.
cc: @huningxin