webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Model Execution API #87

Closed pyu10055 closed 4 years ago

pyu10055 commented 4 years ago

There are a couple of questions about the existing execution API:

  1. The current model execution API requires users to provide output buffers before execution. This is not very convenient: it is an extra step, and the user might not know the shape of the output beforehand. Also, for many models the output shape depends on the input shape, so it is an extra burden for users to work that out.

  2. The current execution is built on compilation of the full graph. While the execution API does not prevent users from executing a sub-graph of the model, it is not clear why the pre-compilation is needed, or whether it should be internal to the execution so that it can take care of sub-graph execution.
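To illustrate the first point, here is a plain-JS stand-in (mockCompute and the "c = a + 1" model are made up, not the real API) for the burden it describes: the caller must know the output shape up front and pre-allocate a buffer of exactly the right size before execution.

```javascript
// Mock of a compute call that writes into caller-supplied output buffers.
function mockCompute(inputs, outputs) {
  const a = inputs.a.buffer;
  for (let i = 0; i < a.length; i++) {
    outputs.c.buffer[i] = a[i] + 1;   // writes into the caller's buffer
  }
  return outputs;
}

const bufferA = new Float32Array([1, 2, 3, 4]);
// The caller has to derive this output shape/size ahead of time:
const outputC = { buffer: new Float32Array(4) };
mockCompute({ a: { buffer: bufferA } }, { c: outputC });
console.log(outputC.buffer);  // Float32Array [2, 3, 4, 5]
```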

pyu10055 commented 4 years ago

@huningxin @wchao1115 Thanks for the explanation; it makes sense to have a dedicated compile step. From an API point of view, I am not convinced that Compilation is a good encapsulation, while Model is a much more concrete concept. In the following abstract example, the user can call the compile method explicitly to cache the compilation, and call execute to take advantage of the cache.

class Model {
  compile() {
  }
  execute() {
  }
}

// they can do the compile explicitly
model.compile({'c': c}, {powerPreference: 'low-power'});
let results = await model.execute({'a': {buffer: bufferA, dimensions: shapeA}}, {'c': {}});
console.log(results.c.dimensions);
console.log(results.c.buffer);

// they can also call the execute, which will trigger compilation as needed.
results = await model.execute({'a': {buffer: bufferA, dimensions: shapeA}}, {'d': {}});
console.log(results.d.dimensions);
console.log(results.d.buffer);

The idea is we do not need to sacrifice ease-of-use over control. Thoughts?
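The caching behaviour proposed above could be sketched as follows. This is plain JS with every name made up, and the expensive backend compile/execute steps stubbed out: compile() caches its result, and execute() triggers compilation only when no cached compilation exists for the requested outputs.

```javascript
// Sketch of a Model that compiles lazily and caches compilations.
class SketchModel {
  constructor() {
    this._cache = new Map();   // compilations keyed by their output names
    this.compileCount = 0;     // exposed only to make the caching visible
  }
  compile(outputs, options = {}) {
    const key = Object.keys(outputs).sort().join(',');
    if (!this._cache.has(key)) {
      this.compileCount += 1;  // stand-in for the expensive backend step
      this._cache.set(key, { outputs, options });
    }
    return this._cache.get(key);
  }
  async execute(inputs, outputs) {
    const compilation = this.compile(outputs);  // no-op when already cached
    const results = {};
    for (const name of Object.keys(compilation.outputs)) {
      // Stand-in execution: one dummy result entry per requested output.
      results[name] = { dimensions: [1], buffer: new Float32Array(1) };
    }
    return results;
  }
}

(async () => {
  const model = new SketchModel();
  model.compile({ c: {} }, { powerPreference: 'low-power' });  // explicit
  await model.execute({ a: {} }, { c: {} });  // reuses the cached compilation
  await model.execute({ a: {} }, { d: {} });  // compiles 'd' on demand
  console.log(model.compileCount);            // 2: one explicit, one on demand
})();
```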

huningxin commented 4 years ago

@wchao1115

I'm not familiar with record<K,V>. Is it supported in the w3c webidl spec? Can you point me to the standard definition? I'm not sure what the c in this expression results.c.dimensions stands for.

The link to the record definition is https://heycam.github.io/webidl/#idl-record.

The c stands for the output named c. With record, the user can access it as a named property. The other syntax is indexing by name, e.g. results['c'].dimensions.
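For illustration, a record-shaped result behaves like a plain object with one named property per output (the output name 'c' and the values below are made up):

```javascript
// Mock of a record<DOMString, Output> result: outputs keyed by name.
const results = {
  c: { dimensions: [2, 2], buffer: new Float32Array([1, 2, 3, 4]) }
};

console.log(results.c.dimensions);     // named-property access: [2, 2]
console.log(results['c'].dimensions);  // equivalent indexing by name
```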

huningxin commented 4 years ago

@pyu10055

In the following abstract example, the user can call the compile method explicitly to cache the compilation, and call execute to take advantage of the cache.

I have a few open questions regarding your proposal:

With today's spec, users create a model with nn.createModel, such as

const model = await nn.createModel([{name: 'c', operand: c}]);

Then they can call model.compile.

Actually, I propose folding model creation and compilation into a single nn.compile step, such as

-const model = await nn.createModel([{name: 'c', operand: c}]);
-const compilation = await model.compile({powerPreference: 'low-power'});
+const compilation = await nn.compile({'c': c}, {powerPreference: 'low-power'});

Model might be a more familiar concept to developers. I am open to giving Compilation a better name, such as ExecutableModel or CompiledModel. And do you also suggest changing the compute method name to execute? Say

partial interface NeuralNetworkContext {
  Promise<ExecutableModel> compile(NamedOperands outputs, optional CompilationOptions options = {});
};

interface ExecutableModel {
  Promise<Outputs> execute(Inputs inputs, optional Outputs outputs);
};

With that, there would be only one change compared with your sample code:

- model.compile({'c': c}, {powerPreference: 'low-power'});
+ const model = await nn.compile({'c': c}, {powerPreference: 'low-power'});
  let results = await model.execute({'a': {buffer: bufferA, dimensions: shapeA}}, {'c': {}});
  console.log(results.c.dimensions);
  console.log(results.c.buffer);

WDYT?

pyu10055 commented 4 years ago

@huningxin first let me answer your questions:

  1. Should model.compile be an async method? It should be an async method, so that users can schedule the operations that follow it.
  2. How would the users create a model? Why do the users need to supply the output operands to model.compile? Basically, the model is a wrapper around the topology, with helper methods to facilitate compilation, execution, disposal, etc. Maybe it is the same concept as your Context class. The output operands for compilation are optional, but by specifying them, users can compile a subgraph. The outputs of the compilation calls are stored within the Model.

My initial reaction to the execution API is that the model topology is created using operands, but there is no clear ownership of those operands. The topology seems to live in a global context; I may create multiple topologies and even share nodes across them. It is also possible to change the topology, and it is not clear how the changes would affect the compilation.

The Model can act as a context within which the topology is created, and any compilations should also be associated with that context. A topology change should invalidate the existing compilations. That is also the reason I see the benefit of a separate compilation step, but I think many users need not be concerned with it.

huningxin commented 4 years ago

@pyu10055

but there is no clear ownership of those operands.

According to the current API, the operands are just used to describe the graph topology for model creation (by nn.createModel in today's spec) or compilation (by nn.compile in my new proposal). The operands are not associated with the model or the compiled model.

It is also possible to change the topology, and it is not clear how the changes would affect the compilation.

Given that, a topology change would not affect existing compilations. For example:

const a = nn.input('a', descA);
const b = nn.constant(descB, bufferB);
let c = nn.add(a, b);
const compilation1 = await nn.compile({'c': c});  // compile "c = a + b"

c = nn.mul(a, b);  // This would not affect compilation1.
const compilation2 = await nn.compile({'c': c});  // compile "c = a * b"

let results = await compilation1.compute({'a': {buffer: bufferA}});
console.log(results.c.buffer);  // results.c.buffer = bufferA + bufferB

results = await compilation2.compute({'a': {buffer: bufferA}});
console.log(results.c.buffer);  // results.c.buffer = bufferA * bufferB

Does it make sense?

wchao1115 commented 4 years ago

I think the point @pyu10055 is making is that the transient state of the topology being constructed seems to live inside the nn context, which makes it hard for developers to manage that state separately from the context itself. Assuming we expose the notion of a topology (or model) to manage it instead, we could do something like this:

const t1 = nn.createTopology();
const a = t1.input('a', descA);
const b = t1.constant('b', descB, bufferB);
const c = t1.add(a, b);
const t2 = nn.createTopology();
const x = t2.input('x', descX);
const y = t2.constant('y', descY, bufferY);
const z1 = t2.mul(x, b);  // z1 = x * b
const m1 = await t2.compile({'z1': z1});  // not allowed! cross topology operand detected.
const z2 = t2.mul(x, y);  // z2 = x * y
const m2 = await t2.compile({'z2': z2});  // ok!

By introducing the notion of a topology in the API, it acts as an agent of the context that owns the transient state of the topology being constructed. This means t1 and t2 may be garbage-collected separately whenever they are no longer used, without affecting the state of the context itself, which might be long-lived. Otherwise, we would need a clearTopology method exposed from the context to wipe this transient state every now and then, since you can never be sure what is left in there at any given time.

gramalingam commented 4 years ago

Yes to the last point @wchao1115 makes. (I had earlier assumed we would create a different nn context for each model.) Regardless of what we call it (model, topology, context), it is useful to have different instances for different models. However, we still need to specify what can be shared across instances. E.g., I don't think nodes should be shared; they should belong to a unique model/topology.

Another advantage of having this model/topology/context be an object is that, in principle, it may be possible to have both a graph-builder implementation of the interface as well as an "eager" implementation that evaluates the ops as the graph is constructed (in the longer run). This is doable, at least in the absence of control-flow ops. But maybe this is a digression/distraction.

wchao1115 commented 4 years ago

I think the topology idea has merit and should help with the transfer-learning scenario, where a topology may be altered after it is created but before compile. I agree that eager could be implemented as you describe: as another implementation of the nn interface, with the compile method being optional.

huningxin commented 4 years ago

@wchao1115

I think the point @pyu10055 is making is that the transient state of the topology being constructed seems to live inside the nn context, which makes it hard for developers to manage that state separately from the context itself.

The topology is represented by the wired operands, e.g. c = nn.matmul(a, b). The nn.compile({'c': c}) method compiles the topology into the executable model. After that, these operands would not be referenced by either the model or the context, so they can be garbage-collected and developers do not need to manage them explicitly.

@gramalingam

it is useful to have different instances for different models. However, we still need to specify what can be shared across instances. E.g., I don't think nodes should be shared; they should belong to a unique model/topology.

Although developers may reuse the same operands to describe different topologies, these operands are not referenced or shared by the compiled models. For example:

const a = nn.input('a', descA);
const b = nn.constant(descB, bufferB);
const c = nn.add(a, b);
const compilation1 = await nn.compile({'c': c});  // compile "c = a + b"

const d = nn.mul(a, b);
const compilation2 = await nn.compile({'d': d});  // compile "d = a * b"
// a, b, c and d can be garbage-collected.

Another advantage of having this model/topology/context be an object is that, in principle, it may be possible to have both a graph-builder implementation of the interface as well as an "eager" implementation that evaluates the ops as the graph is constructed (in the longer run).

It may be possible by allowing the developer to get an "eager" context and read the operand buffer within that context. This builds on @wchao1115's idea. For example:

const nn = navigator.ml.getNeuralNetworkContext('eager');
const a = nn.constant(descA, bufferA);
const b = nn.constant(descB, bufferB);
const c = nn.add(a, b);
await c.buffer();
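A toy, self-contained version of that eager idea might look like the following (eagerConstant and eagerAdd are made-up names, not the real API): each op computes its result the moment it is constructed, and buffer() simply hands the result back.

```javascript
// Mock eager ops: evaluation happens at construction time.
function eagerConstant(desc, data) {
  return { desc, _data: data, buffer: async () => data };
}

function eagerAdd(a, b) {
  const out = a._data.map((v, i) => v + b._data[i]);  // evaluated immediately
  return { _data: out, buffer: async () => out };
}

const x = eagerConstant({}, [1, 2]);
const y = eagerConstant({}, [3, 4]);
const sum = eagerAdd(x, y);   // already computed at this point
sum.buffer().then(data => console.log(data));  // [ 4, 6 ]
```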

wchao1115 commented 4 years ago

[Note: For the sake of this discussion, I purposefully borrow the word topology to differentiate it from the notion of model, which is an immutable output of the existing createModel method, and to avoid confusion. I agree that createModel along with its output doesn't seem to serve much purpose and could be let go.]

As for the topology, its main advantage is to convey to the developers and implementers of the API that if they have state that is specific to a particular graph-building session, they can store it here and leave all the global state that affects all graph-building sessions in the context itself. I believe the addition of this abstraction could make the API clearer and more intuitive to all parties. I'm a bit concerned when we have to make statements like

these operands would not be referenced by either the model or the context.

as it seems to impose non-obvious implementation choices and constraints, which may not hold up in certain situations.

The issue of how to support eager has been brought up before, and though it is relevant to the general design of the API, I'd rather have it tracked in a separate issue to keep this thread focused on the specific issue originally raised. In my view, we generally agree that eager can be thought of as an implementation variant of the same context interface, and that at the moment the introduction of the notion of a topology doesn't seem to prevent supporting eager execution in the future.

huningxin commented 4 years ago

@wchao1115 , thanks for the detailed explanation.

As for the topology, its main advantage is to convey to the developers and implementers of the API that if they have state that is specific to a particular graph-building session, they can store it here and leave all the global state that affects all graph-building sessions in the context itself.

I agree it makes sense to allow multiple graph-building sessions. Right now we only have a global one.

these operands would not be referenced by either the model or the context.

as it seems to impose non-obvious implementation choices and constraints, which may not hold up in certain situation.

I agree; this should be an implementation detail.

The issue of how to support eager has been brought up before, and though it is relevant to the general design of the API, I'd rather have it tracked in a separate issue to keep this thread focused on the specific issue originally raised.

+1

As @pyu10055 mentioned,

The Model can act as a context within which the topology is created, and any compilations should also be associated with that context.

Perhaps we could repurpose the Model interface for the graph-building session instead of keeping it as an immutable one. That way we keep the Model interface and add the graph-building methods to it. Its usage would be the same as the Topology that @wchao1115 proposed. For example:

interface NeuralNetworkContext {
  Model createModel();
};

interface Model {
  Operand input(DOMString name, OperandDescriptor desc);
  Operand constant(OperandDescriptor desc, ArrayBufferView value);

  Operand add(Operand a, Operand b);
  Operand mul(Operand a, Operand b);
  // and other operations

  Promise<Compilation> compile(NamedOperands outputs, optional CompilationOptions options = {});
};
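To illustrate the shape of this interface, here is a toy, self-contained mock (not the real WebNN implementation; everything here is illustrative): the graph is recorded as closures, and compile() returns an object whose compute() evaluates the requested outputs.

```javascript
// Mock of the proposed Model-as-builder interface over plain arrays.
class MockModel {
  input(name, desc) {
    return { desc, eval: inputs => inputs[name] };
  }
  constant(desc, value) {
    return { desc, eval: () => value };
  }
  add(a, b) {
    return { eval: inputs => a.eval(inputs).map((v, i) => v + b.eval(inputs)[i]) };
  }
  mul(a, b) {
    return { eval: inputs => a.eval(inputs).map((v, i) => v * b.eval(inputs)[i]) };
  }
  async compile(outputs, options = {}) {
    // Only the output operands are captured; the builder itself holds no
    // state that the compiled object depends on.
    return {
      async compute(inputs) {
        const results = {};
        for (const [name, operand] of Object.entries(outputs)) {
          results[name] = { buffer: operand.eval(inputs) };
        }
        return results;
      }
    };
  }
}

// Usage mirrors the interface sketch: build, compile, compute.
(async () => {
  const m = new MockModel();
  const a = m.input('a', {});
  const b = m.constant({}, [10, 20]);
  const c = m.add(a, b);
  const compiled = await m.compile({ c });
  const results = await compiled.compute({ a: [1, 2] });
  console.log(results.c.buffer);  // [ 11, 22 ]
})();
```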

wchao1115 commented 4 years ago

Thanks @huningxin. My only concern with the model naming is that it implies immutability to most people. What we're looking for here is a name for a mutable graph-building state. I'm not fixed on the word topology, but at least that wording does imply an arrangement or build-up of a graph, as in topology: the way in which constituent parts are interrelated or arranged. Suggestions are welcome.

pyu10055 commented 4 years ago

@wchao1115 @huningxin I agree that a topology should usually be immutable after creation, but the concept of a model is usually used for constructing a topology. If you look at any of the model-building APIs (Keras, for example), the topology is part of the model object, whether it is a SequentialModel or a FunctionalModel. In Ningxin's LeNet example, he is constructing a model. And if we expect to support training in the future, having the model concept is also useful.

wchao1115 commented 4 years ago

@pyu10055 I'm not sure I understand your comment. Are you making a case of calling it a model?

huningxin commented 4 years ago

@wchao1115 @pyu10055 we may consider aligning with the model-loader API, where a model should be immutable. @jbingham, please correct me if I am wrong.

So perhaps we could change the current NeuralNetworkContext to ModelBuilder, which would be the counterpart of ModelLoader. Developers would create a builder (through navigator.ml.createModelBuilder) for a specific graph-building session. And we may still keep the Model interface as the product of the builder and the loader.

The code sketch would be:

// building a model
const builder = navigator.ml.createModelBuilder();
const a = builder.input('a', descA);
const b = builder.constant(descB, bufferB);
const c = builder.matmul(a, b);
const m1 = builder.createModel({'c': c});

// loading a model
const loader = navigator.ml.createModelLoader();
const m2 = await loader.load(modelUrl);

jbingham commented 4 years ago

Makes sense!

gramalingam commented 4 years ago

@huningxin: the ModelBuilder idea makes sense. As for loading a model, why can't it just be:

// loading a model
const m2 = await navigator.ml.load(modelUrl);

If the indirection serves a purpose, that's fine with me.

zolkis commented 4 years ago

const m2 = await navigator.ml.load(modelUrl)

This makes sense IMHO, perhaps even with an optional argument for options.

wchao1115 commented 4 years ago

Per PR #94