graph-building syntax simpler for web developers

webmachinelearning / webnn

🧠 Web Neural Network API

https://www.w3.org/TR/webnn/

Other

376 stars 46 forks source link

graph-building syntax simpler for web developers #16

Closed gregwhitworth closed 5 years ago

gregwhitworth commented 5 years ago

I think it is useful to make the (graph-building) syntax simpler for the user. In fact, I think we can make it look almost exactly like an API for executing the ops directly (and the "eager" API), at least for the non-control-flow ops, as explained below.

(1) In the proposal, operands are represented by integers, implicitly associated with the order in which "addOperand" is called. It would be cleaner to have "addOperand" return an object/value that represents the operand. So, this should allow us to replace

let tensor1 = operandIndex++;
model.addOperand(float32TensorType);

let tensor1 = model.addOperand(float32TensorType);

(2) Instead of separately creating operands to represent the outputs of an op and separately adding the op (connecting the inputs and outputs up), it is better to have addOperation create the operands representing the outputs and returning them. This would allow us to replace

// intermediateOutput0 is the output of the first ADD operation.
let intermediateOutput0 = operandIndex++;
model.addOperand(float32TensorType);
// Add the first ADD operation.
model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone], [intermediateOutput0]);

// Add the first ADD operation.
// intermediateOutput0 is the output of the first ADD operation.
let intermediateOutput0 = model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone]);

I omitted the type float32TensorType since it can be inferred from the operands. But where necessary, we can add the type as an extra parameter. This approach can still support operations that return multiple outputs.

(3) Furthermore, instead of representing operations by a constant like nn.ADD, it seems better to encapsulate them as a method ADD, allowing us to simplify

let intermediateOutput0 = model.addOperation(nn.ADD, [tensor0, tensor1, fusedActivationFuncNone]);

let intermediateOutput0 = model.ADD([tensor0, tensor1, fusedActivationFuncNone]);

(4) Similarly, for constants, instead of separately creating an operand and then setting its value, we can use a single method to create a constant that does both. This would allow us to simplify:

// Add the operand for the NONE activation function, and set its value to FUSED_NONE.
let fusedActivationFuncNone = operandIndex++;
model.addOperand(scalarInt32Type);
model.setOperandValue(fusedActivationFuncNone, newInt32Array([nn.FUSED_NONE]));

to:

// Add the operand for the NONE activation function, and set its value to FUSED_NONE.
let fusedActivationFuncNone = model.Constant(scalarInt32Type, newInt32Array([nn.FUSED_NONE]));

(As discussed earlier, the type can be omitted if it can be inferred.)

Originally posted by @gramalingam in https://github.com/webmachinelearning/webnn/pull/15#issuecomment-486310862

huningxin commented 5 years ago

@gramalingam wrote: (1) In the proposal, operands are represented by integers, implicitly associated with the order in which "addOperand" is called. It would be cleaner to have "addOperand" return an object/value that represents the operand.

+1. How about change addOperand as

long addOperand(OperandOptions options);

(2) Instead of separately creating operands to represent the outputs of an op and separately adding the op (connecting the inputs and outputs up), it is better to have addOperation create the operands representing the outputs and returning them.

I agree it can simplify the user JS code, however there are two aspects we may take care of:

I omitted the type float32TensorType since it can be inferred from the operands. But where necessary, we can add the type as an extra parameter.

Some native APIs require the output tensors' shape when add/create an operation. For example, ANeuralNetworksModel_addOperation of NNAPI, CreateOperator of DirectML and MPSCNNKernel.encodeToCommandBuffer of MPS. For those APIs, if JS code provides the output tensors description, browser can easily use them. Otherwise, it requires browser to implement output tensor shape inference code.

This approach can still support operations that return multiple outputs.

The outputs list would useful for optional output tensors of an operation, for example BatchNormalization of ONNX.

(3) Furthermore, instead of representing operations by a constant like nn.ADD, it seems better to encapsulate them as a method ADD

Besides the addOperation, operation type constant/enum may be useful for API that queries supported ops for a platform/backend. For custom ops use case, a framework may query the supported ops by WebNN and offload the sub-graph that only contains supported ops. If the operation types are represented by constants or enums, WebNN can easily return a list of enums for supported operations. Otherwise, WebNN may need to add isSupported API for each operation.

(4) Similarly, for constants, instead of separately creating an operand and then setting its value, we can use a single method to create a constant that does both.

+1. How about remove setOperandValue and add addConstant as:

long addConstant(OperandOptions options, ArrayBufferView data);

gramalingam commented 5 years ago

Let me expand on the point mentioned above about making the graph-building syntax similar to an eager API. From the other discussions, there seems to be an interest in exploring Eager API (and JIT) in the future. A similarity in the syntax (between graph-building and eager-mode) will make it relatively easier for a user to change code using one mode into the other. (Conceptually, the two modes can be seen as different implementations of (almost) the same interface.)

Thus, where an Eager API interface has a method like:

Tensor ADD(sequence<Tensor>)

the builder API has a method like:

Operand ADD(sequence<Operand>)

As a result, a line of code like:

const z = M.ADD([x, y])

executes either in the graph-builder mode or eager mode depending on the type of M.

gramalingam commented 5 years ago

Re. @huningxin 's reply about changing addOperand to:

long  addOperand(OperandOptions options);

Yes to adding a return-value. I think it would be helpful to replace the use of “long” to represent operands by an abstract type (interface), say “Operand”, and make “addOperand” return a value of this type. An operand has multiple attributes (type, shape, etc.) and will likely be implemented as an object of some type/class. Using a long in the interface would restrict implementation choices (e.g., forcing an array or something similar to map the index to an object) and could also lead to potential misuse if users assume some implementation details (such as operands being numbered from 0 consecutively etc.)

gramalingam commented 5 years ago

@huningxin : (a) Should WebML force a developer to specify the output shapes, when they can be inferred (in principle)? This is a design-goal question. My opinion is that it is inconvenient for developers to specify it (except in occasional cases where it is required). This is the approach we see in higher-level APIs that developers use (in frameworks like PyTorch or Tensorflow) and this is the approach in the ONNX design as well. (b) With regards to the "sequence operands" used in the API: looking at NNAPI, I assume that this sequence is, in fact, a mix of "tensor operands" created by "addOperand" as well as numeric constants (e.g., used to indicate stride/padding for convolution, etc.)? I think it is useful to be able to differentiate between the two, to avoid potential confusion/errors. (c) With respect to "sequence dimensions" in "OperandOptions": most frameworks allow a representation of "unknown dimension" (e.g., ONNX allows a string like "N" to represent an unknown symbolic dimension, while other frameworks allow the use of negative numbers to represent unknown dimensions). We may need this ability. (d) Re. "optional outputs": these are rare (and can actually be eliminated, since this is mostly an optimization issue: a backend implementation that knows the full computation graph can figure out if some outputs are unused and optimize them). If necessary, these can be supported via optional parameters, without burdening the common-case where the outputs are non-optional.

huningxin commented 5 years ago

@gramalingam

(a) Should WebML force a developer to specify the output shapes, when they can be inferred (in principle)? (c) With respect to "sequence dimensions" in "OperandOptions": most frameworks allow a representation of "unknown dimension"

(I group point (a) and (c) together as they seem to be closely related to me.)

IMO, WebNN API should not force developers to specify the output shapes when they can be inferred. For "unknown dimension", developers could create a OperandOptions without specifying the dimensions when adding an operand by addOperand.

(b) With regards to the "sequence operands" used in the API: looking at NNAPI, I assume that this sequence is, in fact, a mix of "tensor operands" created by "addOperand" as well as numeric constants (e.g., used to indicate stride/padding for convolution, etc.)? I think it is useful to be able to differentiate between the two, to avoid potential confusion/errors.

I suppose the idea of current design (mix of tensor operands and attributes for an operation) is to minimize the interfaces exposed.

However, I would agree it is more friendly to developers by differentiating them. With that, we can add those attributes by individual parameters or grouped one. For example,

Operand Conv(sequence<Operand> inputs,
             sequence<long> paddings,
             sequence<long> strides,
             sequence<long> dilations,
             optional FusedActivation activation = 'none');

dictionary ConvAttributes {
    sequence<long> paddings;
    sequence<long> strides;
    sequence<long> dilations;
    FusedActivation? activation = 'none';
};

Operand Conv(sequence<Operand> inputs, ConvAttributes attributes);

(d) Re. "optional outputs": If necessary, these can be supported via optional parameters, without burdening the common-case where the outputs are non-optional.

Sounds good to me.

gramalingam commented 5 years ago

Regarding:

For "unknown dimension", developers could create a OperandOptions without specifying the dimensions when adding an operand by addOperand.

That works when we know all the dimensions or none of the dimensions. But it doesn't allow us to specify some dimensions, without others. E.g., if we know that some input is of dimension N x 100 x 100 where N is an unknown dimension. This seems to be common from what I have seen. How about if we use negative integers in the dimension list for an unknown dimension?

With that, we can add those attributes by individual parameters or grouped one.

The two suggested options look good. I think the dictionary approach may be a bit more convenient.

huningxin commented 5 years ago

Regarding to 6 June Teleconference, I'd like to summarize the API improvement areas and proposals.

OperandOptions

Improvement areas:

constants for operand type.
no support of unknown dimension.

Proposals:

rename OperandOptions with OperandDescriptor that actually describes a operand.
replace operand type constants with enum OperandType.
allow to use negative integers in the dimension list for an unknown dimension

addOperand

Improvement areas:

addOperand requires user code to maintain the operand index by an integer by itself.
addOperand has multiple usages which would confuse developer. Its usages include:
1. op/graph's inputs
2. constants by being followed by a setOperandValue
3. op's outputs
4. op's attributes

Proposals:

introduce Operand interface and use Operand object as return value.
replace addOperand with addInput and addConstant.
introduce Operand addInput(OperandDescriptor desc) for graph's inputs.
introduce Operand addConstant(OperandDescriptor desc, ArrayBufferView data) for constants.
remove the requirement of specifying op's output operand (see addOperation proposal).
replace usage of operand for op's attributes by dictionaries (e.g. ConvAttributes).
rename identifyInputsAndOutputs with identifyOutputs as inputs could be identified by addInput.

addOperation

Improvement areas:

addOperation requires to specify the shape of outputs that can be inferred.
addOperation requires to specify operation type by constant.
addOperation requires to provide op's attributes by adding operand.

Proposals:

return Operand or sequence<Operand> when adding an operation
replace addOperation and operation type constants with add<OperationName> method for each operation
add Operand addAddition(Operand a, Operand b) and Operand addMultiply(Operand a, Operand b) as start and rewrite example with the new API

The example would look like:

// Use tensors in 4 dimensions.
const TENSOR_DIMS = [2, 2, 2, 2];
const TENSOR_SIZE = 16;

// Create a Model object.
const model = await nn.createModel();

// Add inputs and constants
const float32TensorDesc = {type: 'float32', dimensions: TENSOR_DIMS};
const tensor0 = model.addConstant(float32TensorDesc, new Float32Array(arrayBuffer, 0, TENSOR_SIZE));
const tensor1 = model.addInput(float32TensorDesc);
const tensor2 = model.addConstant(float32TensorDesc, new Float32Array(arrayBuffer,
                                                                      TENSOR_SIZE * Float32Array.BYTES_PER_ELEMENT,
                                                                      TENSOR_SIZE));
const tensor3 = model.addInput(float32TensorDesc);

// Add operations
const intermediateOutput0 = model.addAddition(tensor0, tensor1);
const intermediateOutput1 = model.addAddition(tensor2, tensor3);
const output = model.addMultiply(intermediateOutput0, intermediateOutput1);

// Identify graph output
model.identifyOutputs([output]);

// Finish building the model.
await model.finish();

@gramalingam and others, please take a look. if no big concern, I am going to make a PR.

gramalingam commented 5 years ago

Hi @huningxin : thanks for the proposal. It looks good to me. (I have only a minor naming suggestion: drop the prefix "add" … and just use names like model.Constant, model.Input, model.Add, model.Multiply … but that is mostly a matter of personal-taste. It may help make the eager-API and graph-building API similar … in case we ever add an eager-execution API also.)

huningxin commented 5 years ago

Thanks @gramalingam .

Regarding to your comments:

drop the prefix "add" … and just use names like model.Constant, model.Input, model.Add, model.Multiply

We may need to follow lowerCamelCase for method name. So if we remove "add" prefix, these methods would look like model.constant, model.input, model.add and model.multiply.

It may help make the eager-API and graph-building API similar … in case we ever add an eager-execution API also.

This is a good point. For this purpose, we may step forward to decouple the operations from Model interface and add them into NeuralNetworkContext interface for both usages. Then an operation method can either build graph node or compute depending on the input parameters. When parameters are Operand, it builds the graph node. When parameters are Tensor (to be defined), it computes. For example (similar to your above example): To compute the add operation eagerly:

// Create Tensor a.
const a = nn.tensor(descriptor, value);
// Create Tensor b.
const b = nn.tensor(descriptor, value);
// Invoke `Tensor add(Tensor a, Tensor b)` that computes the result and save to tensor c.
const c = nn.add(a, b);

To build a add node for graph execution:

// Create Operand a.
const a = nn.input(descriptor);
// Create Operand b.
const b = nn.constant(descriptor, buffer);
// Invoke `Operand add(Operand a, Operand b)` that builds the graph node and returns Operand c.
const c = nn.add(a, b);

After building and wiring graph nodes, user code can create a model by identifying the output operands. With this proposal, the example would look like:

// Use tensors in 4 dimensions.
const TENSOR_DIMS = [2, 2, 2, 2];
const TENSOR_SIZE = 16;

// Create input and constant operands.
const float32TensorDesc = {type: 'float32', dimensions: TENSOR_DIMS};
const tensor0 = nn.constant(float32TensorDesc, new Float32Array(arrayBuffer, 0, TENSOR_SIZE));
const tensor1 = nn.input(float32TensorDesc);
const tensor2 = nn.constant(float32TensorDesc, new Float32Array(arrayBuffer,
                                                                TENSOR_SIZE * Float32Array.BYTES_PER_ELEMENT,
                                                                TENSOR_SIZE));
const tensor3 = nn.input(float32TensorDesc);

// Create operations
const intermediateOutput0 = nn.add(tensor0, tensor1);
const intermediateOutput1 = nn.add(tensor2, tensor3);
const output = nn.mul(intermediateOutput0, intermediateOutput1);

// Create a Model object by identifying the output operands.
const model = nn.createModel([output]);

In this example, the code snippet can be reused for eager execution.

const intermediateOutput0 = nn.add(tensor0, tensor1);
const intermediateOutput1 = nn.add(tensor2, tensor3);
const output = nn.mul(intermediateOutput0, intermediateOutput1);

@nsthorat according to #12 , do you think this proposal is a future-proofing way for supporting eager execution later?

gramalingam commented 5 years ago

Hi @huningxin , yes, something like this would be good, I think.

One issue that complicates this (or at least is related to this discussion) is the notion of subgraphs or functions or scopes. In the long run, we may want to support control-flow constructs (if-then-else, loops). In this setting, we need to identify the scope into which we are adding an operation.

One suggestion is that the interface that exposes these methods ("add", "multiply") represents an object (which could be a scope into which we are adding the ops or it could be an eager-execution-mode scope). This object can make the choice on whether to build a graph or eagerly execute them.

I think we may not be able to unify the eager and graph-builder interfaces completely seamlessly, and there may be some differences. (E.g., eager expects only Tensors … but a builder can actually work with both Tensors and Operands). But there may still be value in making them similar, even if not identical.

Anyway: In the last meeting you made a good point that separating issues may be helpful (instead of trying to solve all issues together in one shot). So, we could worry about the above issues separately if you feel the same.

huningxin commented 5 years ago

As discussed in WebML CG Teleconference - 27 June 2019:

RESOLVED: Craft a spec PR based on https://‌github.com/‌webmachinelearning/‌webnn/‌issues/‌16#issuecomment-504305802 considering feedback on the call

I will follow up.