Closed — gregwhitworth closed this issue 5 years ago
@gramalingam wrote: (1) In the proposal, operands are represented by integers, implicitly associated with the order in which "addOperand" is called. It would be cleaner to have "addOperand" return an object/value that represents the operand.
+1. How about changing `addOperand` to:

```webidl
long addOperand(OperandOptions options);
```
(2) Instead of separately creating operands to represent the outputs of an op and separately adding the op (connecting the inputs and outputs up), it is better to have addOperation create the operands representing the outputs and returning them.
I agree it can simplify the user JS code. However, there are two aspects we may need to take care of:
I omitted the type float32TensorType since it can be inferred from the operands. But where necessary, we can add the type as an extra parameter.
Some native APIs require the output tensors' shapes when adding/creating an operation, for example ANeuralNetworksModel_addOperation of NNAPI, CreateOperator of DirectML and MPSCNNKernel.encodeToCommandBuffer of MPS. For those APIs, if the JS code provides the output tensor descriptions, the browser can use them directly. Otherwise, the browser has to implement output tensor shape inference code.
This approach can still support operations that return multiple outputs.
The outputs list would be useful for optional output tensors of an operation, for example BatchNormalization of ONNX.
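To illustrate the shape-inference burden mentioned above: if JS code omits output shapes, the browser has to compute them before calling native APIs such as ANeuralNetworksModel_addOperation. Below is a minimal, hedged sketch of what such inference might look like for an element-wise operation with numpy-style broadcasting; the function name is illustrative, not part of any proposed API.

```javascript
// Infer the output shape of an element-wise op from its two input shapes,
// using numpy-style broadcasting (align from the trailing dimension).
function inferElementwiseShape(aDims, bDims) {
  const out = [];
  const n = Math.max(aDims.length, bDims.length);
  for (let i = 0; i < n; i++) {
    // Missing leading dimensions are treated as 1.
    const a = aDims[aDims.length - 1 - i] ?? 1;
    const b = bDims[bDims.length - 1 - i] ?? 1;
    if (a !== b && a !== 1 && b !== 1) throw new Error('incompatible shapes');
    out.unshift(Math.max(a, b));
  }
  return out;
}

console.log(inferElementwiseShape([2, 2, 2, 2], [2, 2, 2, 2])); // [2, 2, 2, 2]
```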
(3) Furthermore, instead of representing operations by a constant like nn.ADD, it seems better to encapsulate them as a method ADD
Besides `addOperation`, an operation type constant/enum may be useful for an API that queries the supported ops of a platform/backend. For the custom-ops use case, a framework may query the ops supported by WebNN and offload only the sub-graphs that contain supported ops. If operation types are represented by constants or enums, WebNN can easily return a list of enums for the supported operations. Otherwise, WebNN may need to add an `isSupported` API for each operation.
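A small sketch of the framework-side partitioning this enables. The method name `getSupportedOperations` and the enum strings are illustrative assumptions, not part of any proposed WebNN interface:

```javascript
// A backend that reports its supported operation enums in one call,
// instead of exposing a per-op isSupported() probe.
const backend = {
  getSupportedOperations() { return ['ADD', 'MUL', 'CONV_2D']; }
};

// Framework-side helper: split graph nodes into an offloadable set
// (supported by the backend) and a fallback set (run elsewhere).
function partition(nodes, backend) {
  const supported = new Set(backend.getSupportedOperations());
  const offload = nodes.filter(n => supported.has(n.op));
  const fallback = nodes.filter(n => !supported.has(n.op));
  return { offload, fallback };
}

const graph = [{ op: 'ADD' }, { op: 'CUSTOM_OP' }, { op: 'MUL' }];
const { offload, fallback } = partition(graph, backend);
console.log(offload.length, fallback.length); // 2 1
```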
(4) Similarly, for constants, instead of separately creating an operand and then setting its value, we can use a single method to create a constant that does both.
+1. How about removing `setOperandValue` and adding `addConstant` as:

```webidl
long addConstant(OperandOptions options, ArrayBufferView data);
```
Let me expand on the point mentioned above about making the graph-building syntax similar to an eager API. From the other discussions, there seems to be an interest in exploring Eager API (and JIT) in the future. A similarity in the syntax (between graph-building and eager-mode) will make it relatively easier for a user to change code using one mode into the other. (Conceptually, the two modes can be seen as different implementations of (almost) the same interface.)
Thus, where an Eager API interface has a method like:

```webidl
Tensor ADD(sequence<Tensor>)
```

the builder API has a method like:

```webidl
Operand ADD(sequence<Operand>)
```

As a result, a line of code like:

```js
const z = M.ADD([x, y])
```
executes either in the graph-builder mode or eager mode depending on the type of M.
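A minimal sketch of that mode polymorphism, with illustrative class names (nothing here is a proposed interface): the same user code either records a graph node or computes immediately, depending on the object it is invoked on.

```javascript
// Graph-builder mode: record a node and return a symbolic operand.
class GraphBuilder {
  add(inputs) {
    return { kind: 'operand', op: 'add', inputs };
  }
}

// Eager mode: compute immediately, using plain arrays to stand in for tensors.
class EagerContext {
  add(inputs) {
    const [x, y] = inputs;
    return x.map((v, i) => v + y[i]);
  }
}

// Identical user code for both modes; the type of M decides what happens.
function buildOrRun(M, x, y) {
  return M.add([x, y]);
}

console.log(buildOrRun(new EagerContext(), [1, 2], [3, 4])); // [4, 6]
```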
Re. @huningxin 's reply about changing addOperand to:

```webidl
long addOperand(OperandOptions options);
```
Yes to adding a return-value. I think it would be helpful to replace the use of “long” to represent operands by an abstract type (interface), say “Operand”, and make “addOperand” return a value of this type. An operand has multiple attributes (type, shape, etc.) and will likely be implemented as an object of some type/class. Using a long in the interface would restrict implementation choices (e.g., forcing an array or something similar to map the index to an object) and could also lead to potential misuse if users assume some implementation details (such as operands being numbered from 0 consecutively etc.)
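A quick sketch of the difference being argued for here. The `Operand` and `Model` shapes below are illustrative stand-ins: the point is only that returning an opaque object (rather than a `long` index) keeps attribute storage an implementation detail and removes the temptation to assume consecutive numbering.

```javascript
// An opaque operand object carrying its own attributes.
class Operand {
  constructor(type, dimensions) {
    this.type = type;             // e.g. 'float32'
    this.dimensions = dimensions; // e.g. [2, 2]
  }
}

class Model {
  constructor() { this.operands = []; }
  addOperand(desc) {
    const op = new Operand(desc.type, desc.dimensions);
    this.operands.push(op);
    return op; // return the object itself, not an integer index
  }
}

const m = new Model();
const a = m.addOperand({ type: 'float32', dimensions: [2, 2] });
console.log(a.type); // 'float32'
```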
@huningxin :
(a) Should WebML force a developer to specify the output shapes, when they can be inferred (in principle)? This is a design-goal question. My opinion is that it is inconvenient for developers to specify it (except in occasional cases where it is required). This is the approach we see in higher-level APIs that developers use (in frameworks like PyTorch or Tensorflow) and this is the approach in the ONNX design as well.
(b) With regards to the "sequence operands" used in the API: looking at NNAPI, I assume that this sequence is, in fact, a mix of "tensor operands" created by "addOperand" as well as numeric constants (e.g., used to indicate stride/padding for convolution, etc.)? I think it is useful to be able to differentiate between the two, to avoid potential confusion/errors.
@gramalingam
(a) Should WebML force a developer to specify the output shapes, when they can be inferred (in principle)?

(c) With respect to "sequence dimensions" in "OperandOptions": most frameworks allow a representation of "unknown dimension"
(I group point (a) and (c) together as they seem to be closely related to me.)
IMO, the WebNN API should not force developers to specify the output shapes when they can be inferred. For an "unknown dimension", developers could create an `OperandOptions` without specifying the `dimensions` when adding an operand by `addOperand`.
(b) With regards to the "sequence operands" used in the API: looking at NNAPI, I assume that this sequence is, in fact, a mix of "tensor operands" created by "addOperand" as well as numeric constants (e.g., used to indicate stride/padding for convolution, etc.)? I think it is useful to be able to differentiate between the two, to avoid potential confusion/errors.
I suppose the idea of the current design (a mix of tensor operands and attributes for an operation) is to minimize the interfaces exposed.
However, I agree that differentiating them is friendlier to developers. With that, we can pass those attributes as individual parameters or as a grouped one. For example,
```webidl
Operand Conv(sequence<Operand> inputs,
             sequence<long> paddings,
             sequence<long> strides,
             sequence<long> dilations,
             optional FusedActivation activation = "none");
```
or
```webidl
dictionary ConvAttributes {
  sequence<long> paddings;
  sequence<long> strides;
  sequence<long> dilations;
  FusedActivation activation = "none";
};

Operand Conv(sequence<Operand> inputs, ConvAttributes attributes);
```
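For a sense of what the dictionary style looks like at the call site, here is a hedged usage sketch. The stub builder, the lowercase `conv` method name, and the attribute field layouts are all illustrative assumptions, not a final API:

```javascript
// Minimal stub of a graph builder so the call below runs; a real Conv
// would validate shapes and build a graph node.
const model = {
  conv(inputs, attrs) {
    return { op: 'conv2d', inputs, attrs };
  }
};

// `input` and `filter` stand in for Operand objects created earlier
// (e.g. via addInput / addConstant).
const input = { name: 'input' };
const filter = { name: 'filter' };

const node = model.conv([input, filter], {
  paddings: [1, 1, 1, 1],   // assumed layout: beginning/ending per spatial axis
  strides: [1, 1],
  dilations: [1, 1],
  activation: 'relu'        // a FusedActivation value
});
console.log(node.attrs.activation); // 'relu'
```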
(d) Re. "optional outputs": If necessary, these can be supported via optional parameters, without burdening the common-case where the outputs are non-optional.
Sounds good to me.
Regarding:
For "unknown dimension", developers could create a OperandOptions without specifying the dimensions when adding an operand by addOperand.
That works when we know all the dimensions or none of them. But it doesn't allow us to specify some dimensions without the others, e.g. if we know that some input has dimensions N x 100 x 100 where N is an unknown dimension. This seems to be common from what I have seen. How about using negative integers in the dimension list for an unknown dimension?
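A minimal sketch of the negative-integer convention for partially-known shapes. The helper name `matchesShape` is illustrative; only the idea (a negative entry means "any size along this axis") comes from the suggestion above:

```javascript
// N x 100 x 100 with N unknown, using -1 to mark the unknown dimension.
const desc = { type: 'float32', dimensions: [-1, 100, 100] };

// Check whether a concrete shape is compatible with a partially-known one.
function matchesShape(dimensions, concrete) {
  return dimensions.length === concrete.length &&
    dimensions.every((d, i) => d < 0 || d === concrete[i]);
}

console.log(matchesShape(desc.dimensions, [8, 100, 100])); // true
console.log(matchesShape(desc.dimensions, [8, 50, 100]));  // false
```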
With that, we can add those attributes by individual parameters or grouped one.
The two suggested options look good. I think the dictionary approach may be a bit more convenient.
Regarding the 6 June Teleconference, I'd like to summarize the API improvement areas and proposals.

Improvement areas:

* `OperandOptions` does not actually describe an operand.

Proposals:

* Replace `OperandOptions` with an `OperandDescriptor` that actually describes an operand, together with an `enum OperandType`.

Improvement areas:

* `addOperand` requires user code to maintain the operand index by an integer by itself.
* `addOperand` has multiple usages, which would confuse developers. Its usages include adding graph inputs and adding constants whose values are then set by `setOperandValue`.

Proposals:

* Introduce an `Operand` interface and use an `Operand` object as the return value.
* Replace `addOperand` with `addInput` and `addConstant`:
  * `Operand addInput(OperandDescriptor desc)` for the graph's inputs.
  * `Operand addConstant(OperandDescriptor desc, ArrayBufferView data)` for constants.
* Group an operation's attributes into a dictionary (e.g. `ConvAttributes`).
* Replace `identifyInputsAndOutputs` with `identifyOutputs`, as inputs could be identified by `addInput`.

Improvement areas:

* `addOperation` requires specifying the shapes of outputs that can be inferred.
* `addOperation` requires specifying the operation type by a constant.
* `addOperation` requires providing an op's attributes by adding operands.

Proposals:

* Return an `Operand` or a `sequence<Operand>` when adding an operation.
* Replace `addOperation` and the operation type constants with an `add<OperationName>` method for each operation.
* Add `Operand addAddition(Operand a, Operand b)` and `Operand addMultiply(Operand a, Operand b)` as a start and rewrite the example with the new API.

The example would look like:
```js
// Use tensors in 4 dimensions.
const TENSOR_DIMS = [2, 2, 2, 2];
const TENSOR_SIZE = 16;

// Create a Model object.
const model = await nn.createModel();

// Add inputs and constants.
const float32TensorDesc = {type: 'float32', dimensions: TENSOR_DIMS};
const tensor0 = model.addConstant(float32TensorDesc, new Float32Array(arrayBuffer, 0, TENSOR_SIZE));
const tensor1 = model.addInput(float32TensorDesc);
const tensor2 = model.addConstant(float32TensorDesc, new Float32Array(arrayBuffer,
                                                                      TENSOR_SIZE * Float32Array.BYTES_PER_ELEMENT,
                                                                      TENSOR_SIZE));
const tensor3 = model.addInput(float32TensorDesc);

// Add operations.
const intermediateOutput0 = model.addAddition(tensor0, tensor1);
const intermediateOutput1 = model.addAddition(tensor2, tensor3);
const output = model.addMultiply(intermediateOutput0, intermediateOutput1);

// Identify graph output.
model.identifyOutputs([output]);

// Finish building the model.
await model.finish();
```
@gramalingam and others, please take a look. If there is no big concern, I am going to make a PR.
Hi @huningxin : thanks for the proposal. It looks good to me. (I have only a minor naming suggestion: drop the prefix "add" … and just use names like model.Constant, model.Input, model.Add, model.Multiply … but that is mostly a matter of personal-taste. It may help make the eager-API and graph-building API similar … in case we ever add an eager-execution API also.)
Thanks @gramalingam .
Regarding your comments:

drop the prefix "add" … and just use names like model.Constant, model.Input, model.Add, model.Multiply

We may need to follow lowerCamelCase for method names. So if we remove the "add" prefix, these methods would look like `model.constant`, `model.input`, `model.add` and `model.multiply`.
It may help make the eager-API and graph-building API similar … in case we ever add an eager-execution API also.
This is a good point. For this purpose, we may go a step further and decouple the operations from the `Model` interface, adding them to the `NeuralNetworkContext` interface for both usages. Then an operation method can either build a graph node or compute, depending on the input parameters. When the parameters are `Operand`, it builds the graph node. When the parameters are `Tensor` (to be defined), it computes. For example (similar to your above example):

To compute the `add` operation eagerly:
```js
// Create Tensor a.
const a = nn.tensor(descriptor, value);
// Create Tensor b.
const b = nn.tensor(descriptor, value);
// Invoke `Tensor add(Tensor a, Tensor b)` that computes the result and saves it to tensor c.
const c = nn.add(a, b);
```
To build an `add` node for graph execution:
```js
// Create Operand a.
const a = nn.input(descriptor);
// Create Operand b.
const b = nn.constant(descriptor, buffer);
// Invoke `Operand add(Operand a, Operand b)` that builds the graph node and returns Operand c.
const c = nn.add(a, b);
```
After building and wiring graph nodes, user code can create a model by identifying the output operands. With this proposal, the example would look like:
```js
// Use tensors in 4 dimensions.
const TENSOR_DIMS = [2, 2, 2, 2];
const TENSOR_SIZE = 16;

// Create input and constant operands.
const float32TensorDesc = {type: 'float32', dimensions: TENSOR_DIMS};
const tensor0 = nn.constant(float32TensorDesc, new Float32Array(arrayBuffer, 0, TENSOR_SIZE));
const tensor1 = nn.input(float32TensorDesc);
const tensor2 = nn.constant(float32TensorDesc, new Float32Array(arrayBuffer,
                                                                TENSOR_SIZE * Float32Array.BYTES_PER_ELEMENT,
                                                                TENSOR_SIZE));
const tensor3 = nn.input(float32TensorDesc);

// Create operations.
const intermediateOutput0 = nn.add(tensor0, tensor1);
const intermediateOutput1 = nn.add(tensor2, tensor3);
const output = nn.mul(intermediateOutput0, intermediateOutput1);

// Create a Model object by identifying the output operands.
const model = nn.createModel([output]);
```
In this example, the code snippet can be reused for eager execution.
```js
const intermediateOutput0 = nn.add(tensor0, tensor1);
const intermediateOutput1 = nn.add(tensor2, tensor3);
const output = nn.mul(intermediateOutput0, intermediateOutput1);
```
@nsthorat according to #12 , do you think this proposal is a future-proofing way for supporting eager execution later?
Hi @huningxin , yes, something like this would be good, I think.
One issue that complicates this (or at least is related to this discussion) is the notion of subgraphs or functions or scopes. In the long run, we may want to support control-flow constructs (if-then-else, loops). In this setting, we need to identify the scope into which we are adding an operation.
One suggestion is that the interface that exposes these methods ("add", "multiply") represents an object (which could be a scope into which we are adding the ops or it could be an eager-execution-mode scope). This object can make the choice on whether to build a graph or eagerly execute them.
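A toy sketch of that suggestion, with illustrative names: each scope object owns the ops added into it, so nested subgraphs (say, the branches of a future if-then-else) collect their own nodes, while an eager scope could instead execute them.

```javascript
// A scope that records ops added into it (graph-building behavior).
class Scope {
  constructor(name) {
    this.name = name;
    this.ops = [];
  }
  add(a, b) {
    const node = { op: 'add', inputs: [a, b], scope: this.name };
    this.ops.push(node);
    return node;
  }
}

const main = new Scope('main');
const thenBranch = new Scope('then'); // subgraph for a hypothetical if-then-else
main.add('x', 'y');
thenBranch.add('u', 'v');
console.log(main.ops.length, thenBranch.ops.length); // 1 1
```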
I think we may not be able to unify the eager and graph-builder interfaces completely seamlessly, and there may be some differences. (E.g., eager expects only Tensors … but a builder can actually work with both Tensors and Operands). But there may still be value in making them similar, even if not identical.
Anyway: In the last meeting you made a good point that separating issues may be helpful (instead of trying to solve all issues together in one shot). So, we could worry about the above issues separately if you feel the same.
As discussed in WebML CG Teleconference - 27 June 2019:
RESOLVED: Craft a spec PR based on https://github.com/webmachinelearning/webnn/issues/16#issuecomment-504305802 considering feedback on the call
I will follow up.
I think it is useful to make the (graph-building) syntax simpler for the user. In fact, I think we can make it look almost exactly like an API for executing the ops directly (and the "eager" API), at least for the non-control-flow ops, as explained below.
(1) In the proposal, operands are represented by integers, implicitly associated with the order in which "addOperand" is called. It would be cleaner to have "addOperand" return an object/value that represents the operand.
(2) Instead of separately creating operands to represent the outputs of an op and separately adding the op (connecting the inputs and outputs up), it is better to have addOperation create the operands representing the outputs and return them.
I omitted the type float32TensorType since it can be inferred from the operands. But where necessary, we can add the type as an extra parameter. This approach can still support operations that return multiple outputs.
(3) Furthermore, instead of representing operations by a constant like nn.ADD, it seems better to encapsulate them as a method ADD.
(4) Similarly, for constants, instead of separately creating an operand and then setting its value, we can use a single method to create a constant that does both.
(As discussed earlier, the type can be omitted if it can be inferred.)
Originally posted by @gramalingam in https://github.com/webmachinelearning/webnn/pull/15#issuecomment-486310862
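The code snippets that originally accompanied points (1)–(4) did not survive, so here is a hedged reconstruction of the kind of before/after simplification being described. The "before" style is sketched in comments after the NNAPI-style explainer discussed in this thread; the stub builder and its method names are illustrative stand-ins, not a final API.

```javascript
// Before (sketched, NNAPI-style): implicit integer indices, a separate
// setOperandValue call for constants, and addOperation wiring indices:
//   model.addOperand(float32TensorType);        // operand 0
//   model.addOperand(float32TensorType);        // operand 1
//   model.addOperand(float32TensorType);        // operand 2, the output
//   model.setOperandValue(0, constantData);
//   model.addOperation(nn.ADD, [0, 1], [2]);

// After: a minimal stub builder in the proposed style, so the example runs.
const model = {
  addConstant: (desc, data) => ({ kind: 'constant', desc, data }),
  addInput: (desc) => ({ kind: 'input', desc }),
  addAddition: (a, b) => ({ kind: 'operand', op: 'add', inputs: [a, b] })
};

// Constants and inputs are created in one call each, and the operation
// method creates and returns its output operand.
const desc = { type: 'float32', dimensions: [2, 2] };
const a = model.addConstant(desc, new Float32Array([1, 2, 3, 4]));
const b = model.addInput(desc);
const out = model.addAddition(a, b);
console.log(out.op); // 'add'
```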