webmachinelearning / webnn

🧠 Web Neural Network API

https://www.w3.org/TR/webnn/

Other

352 stars 45 forks source link

High level vs low level #3

Closed gregwhitworth closed 5 years ago

gregwhitworth commented 5 years ago

Hey everyone,

While reviewing this PR I had an issue but it's a horizontal issue that should be discussed outside of that PR, maybe a good agend item for our first telecon @anssiko. I have had very few folks desire a low level API for this based on the use cases I've been able to procure.

Additionally, I'm not sure that some of the items that are denoted as low-level in the linked commit I would define as low-level, rather that they are not using the pre-trained models exactly as is. In the discussions that I've had with many folks across Microsoft, the majority of client side use cases that people are looking at for production scenarios would not require a low level API. Effectively this comes back to the issue of who the customer is, a library author or web developer solving the problem. There are pros and cons to each approach we take and I don't know if we need to have a fundamentals line in the sand but I do think I'm seeing a larger desire for higher level APIs than a lower level one. Thoughts?

anssiko commented 5 years ago

@gregwhitworth, thanks for initiating this discussion! It would be very helpful if you could share some of the client side use cases you've procured.

I don't think we can yet confidently draw a line between low level and high level use cases for our purposes. We need more input. Our current charter explains the layering through the Extensible Web Manifesto principles. When the charter talks about low-level capabilities it attempts to refer to the EWM definition:

To enable libraries to do more, browser vendors should provide new low-level capabilities that expose the possibilities of the underlying platform as closely as possible.

As said, the use cases PR is an initial attempt and I hope the group will help refine it further. We probably learn more when we derive requirements out of the use cases too.

Maybe it should be noted, that for many high level use cases a library makes a lot of sense too. For example, face-api.js solves a high level use case, is implemented on top of the TensorFlow.js core API that itself could be a consumer of the Web Neural Network API. Out of scope for this group would be, however, to define a dedicated API for face detection since that's considered a high level use case (incubation is underway elsewhere as the Face Detection API). Borrowing the EWM terminology, the Face Detection API could be explained in terms of the low-level capabilities exposed by the Web Neural Network API.

This is exactly why I believe discussion around use cases (as opposed to jumping right to the "solution") will be time well spent as it forces us to put the needs of the consumer first, and identify who the (primary) consumer is.

You pointed out some low-level use cases are not using pre-trained models as is. That's good input and should be fed into the PR.

(Any clarifications or suggestions to the charter itself, please open an issue against the charter repo.)

gregwhitworth commented 5 years ago

We spoke about this on the telecon today, and I wanted to try and articulate my thoughts here a bit further. There are effectively three routes we can take with this API:

Operator
Load & Run Model
Solution Based

Non-normative definition of each:

Operator - Primary Customer [Framework Authors]

This would map more similiar to what is available in the Intel POC, and is somewhat similiar to what data scientists are used to utilzing in Tensorflow, PyTorch, WinML, etc. The current list of supported operators in the POC is currently here - but that does not mean that those are the only operators that need to be in the spec.

Pros:
- Scopes in the necessary operators that are required to support the usecases on the web.
- Allows it to be more extensible and handle more complex capabilities
Cons:
- Has a pretty steep learning curve
- For most web developers they'll have to include a library in order to utilize the technology

Load & Run Model - Primary Customer [Web Dev]

This would allow a web developer to load a model, and pass in inputs to be evaluated against the pre-trained model. A quick example of this would be something like this:

var evaluation = "some text"
var modelA = await loadModel(model.onnx);
var output = runModel(model.onnx, evaluation);
var modelB = await loadModel(modelB.onnx);
var answer = runModel(modelB, output);

Pros:
- Simple & intuitive
- Will still require scoping of operators so implementors can plumb through the necessary operators to support various model formats for hardware accelleration
- Will not require a JS library for model evaluation
Cons:
- Is not extensible without pre-training additional models as to affect the final output

Solution Based

This is taking the common use cases, as have been proposed in this PR and beginning to define JS APIs that solve those problems.

Pros:
- The most intuitive of the three
Cons:
- Is not extensible
- Would require standardization on the return results from the models which will limit the capability of the models

Path forward

There is nothing stopping us from doing all three, but I personally prefer for us to explore specifying and implementing the operator and load model approaches. This aligns with the charter of being "low level" and will enable JS libraries to build out the high level solutions while allowing the platform to cater to use cases for building a framework or for evaluating an input against a pre-trained model.

I welcome any feedback you all may have.

jonathanding commented 5 years ago

@gregwhitworth Thanks a lot for raising this topic and personally I also think that we need a set of APIs (temporarily called it as high level APIs) to cover probably typical requirements of web app developers, i.e. load and run the pre-trained model.

The major benefits of High Level APIs are:

Simple, intuitive to match the typical needs of web app developers (not framework developers)
API is Stable and less dependant to domain specific world (machine learning in this case). It doesn't need to update the specification from time to time to reflect the latest achievements in machine learning world, such as new operators, etc.
Faster to market. It has opportunity to land to browsers faster, by mapping the quite few apis to native stack. Even it isn't supported by browser yet, it is very easy to provide polyfills based on tensorflow.js and onnx.js etc.

High Level API for ML is about Model, similar as Video in HTML

A lot of high level usage (from web app developers) of models of machine learning, e.g. neural network models, are quite similar as media files such as video clips in Web i.e. <video>.

The high level APIs for ML would focus on Models, similar as API/Tag of processing videos. For ML model and a video clip in Web:

They both could be considered as a kind of Resource
Such Resource would have a set of high level JavaScript APIs, e.g.
- For video it is HTMLMediaElement which consists of methods like play() and pause().
- For ML model we could have something like keras.Model, with methods such as load(), compile(), predict(), fit() etc.
Such Resource could have different format. The actual format is domain specific thus is defined by various standard bodies other than W3C.e.g.
- For video it could be H.264 or AV1 etc.
- For ML model it could be tensorflow, ONNX, caffe etc.
JavaScript could be used to access the data of such Resource
- For video there are canvas related API (at this stage) to get the frame data.
- For models we even don't need standardized APIs. With the known schema of the models (e.g. Tensorflow / ONNX), we could have JavaScript libraries providing parsers, readers, writers to parse and even modify the model.

Necessity of High Level APIs for WebML

As you mentioned, typical machine learning usages in Web app need to be simple and intuitive

Typical Usages in Web app
- pre-trained model -> Load and inference in web app
Less Typical in Web app
- build, rather than load, the computation graph
- significantly modify the topology of a loaded model
- perform heavy training, rather than some lightweight fine tuning

A good design of High Level APIs could make 1st category (Typical Usages) much simpler and more elegant, while still support 2nd category (Less Typical).

Furthermore, high level apis could be easily hardware accelerated by almost directly mapping to underlying native software stack for machine learning, since they have model based APIs too.

Lastly, with high level APIs, we don't need to define many solution based APIs one by one (e.g. face detection, action recognition api, pose recognition api etc). They are simply models loaded and then inferenced.

Proposal

High level API could be defined by following wisdoms of <video> and keras.Model.

The model could be created from a URL in JavaScript, or defined by a HTML tag similar as <video>

<MLModel 
    id="ml-example" 
    source="a.com/a.model"
    preload="true" other-attrs="..." />

// Model is created in JavaScript from remote
let model_by_js = WebML.model.fromSource("a.com/a.model");

// Model is created in JavaScript from a memory buffer / array
let model_by_buffer = WebML.model.fromBuffer(...);

// Or defined in HTML 
let model_by_tag = document.getElementById("ml-example");

// preload already, so just use it
await model_by_tag.compile();
await model_by_tag.predict();
await model_by_tag.fit();

Other Considerations

Model Format - whether it need to be specified

This could be an open that whether the format should be specified. We have two options:

Specify a format for model in standard

For example, agree on using ONNX. Web app developers could either convert their non-ONNX models to ONNX offline. Community could also build some converters in JavaScript as libraries, so that these models could be dynamically converted inside web app, which are more friendly to web app developers.
Similar as HTML5 video, no restrictions in spec, but allow multiple ML engines (similar as codecs) registered into browser.

Developer may provide the same model but in different formats, browser use the first one that it supports.
```
<MLModel id="ml-example" pre-load="true">
    <source="a.com/model.tf" type="tensorflow"/>
    <source="a.com/model.onnx" type="onnx"/>
    <source="a.com/model.coreml" type="coreml"/>
</MLModel>
```

Query APIs

Similar as Media Query, there could be APIs to query ML capabilities of browser, e.g. what format (and version) it supports, how about the hardware acceleration etc.

Construct / Modify the Model with High Level APIs

As mentioned above, there are actually NO direct high level APIs to construct/modify a model. This is different vs. low level APIs which could directly compose the graph by adding operators etc.

However, with known schema (encoding format) of the model, one could still use pure JavaScript to directly construct/modify a model in raw format in memory buffer, and then load it from buffer to form a model for predicting / fitting. And it is expected that community would emerge JavaScript libraries to provide helpers (e.g. onnx writer etc.) to facilitate such internal manipulation on raw model.

gregwhitworth commented 5 years ago

@jonathanding thanks for response, I agree overall that we need a loadModel level and operator. I prefer to refrain from saying high and low level as I think both are providing web developers with what they want from the platform.

I do not think that we need to be focused on the formats that need to be supported at the present time, but the operators in which will be supported (we'll need to know this for both a loadModel and operator approach).

As we begin to identify formats that support these operators we can use them as examples and still leave this specification from defining which formats a UA needs to support as any format that supports the defined operators will be covered.

I am less inclined to support the notion of a HTML ML element, as unlike the media components we've been using for analogies the models do not inherently have a semantic meaning and will require JS to be of any use to the web developer and their users.

Thanks for the feedback, any other thoughts from others?

Kangz commented 5 years ago

In the spirit of the Extensible Web, I believe the API should be exclusively low-level and probably not even know about the concept of loading ML models.

There is no reason to expose a new HTML element because like @gregwhitworth pointed out, the model doesn't carry semantic value by itself and needs JS to be actually useful. Adding the concept of ML "codecs" to the Web platform won't make it easier for developers to do ML on the Web put will instead increase the complexity of their application because they will have to make choices. Video codecs are different because they represent actual hardware video decoding capabilities (as well as historical reasons).

The usability aspect of the loadModel approach is very compelling: in Web development today this type of functionality can be added via a single npm command. There are real drawbacks in that "operators" are needed for existing frameworks anyway and loadModel would add even more complexity to the Web platform on top of them. I don't know the market too well, but there doesn't seem to be a single agreed upon file format (coreml vs. TF vs. ONNX vs. ...) so choosing one of them will be a difficult political decision. There is also the issue of upgrading to newer version of the format and some versions only being supported on some browsers. Also debuggability.

DanielMazurkiewicz commented 5 years ago

Since on Thursday will be an online meeting want to point here another proposal for the "low level" API to discuss: https://github.com/DanielMazurkiewicz/WebAI

RafaelCintron commented 5 years ago

I think we should concentrate on inferencing scenarios for v1 of the API. Training can come later.

I agree with @Kangz and @gregwhitworth that an HTML element is not necessary. The API can be a pure Javascript API like WebGL and WebAudio are today.

Both a loadModel-style (high level) API and operators-style (low level) API require "political decisions" on the part of the group. Even an operators-style API requires that we choose which operator definitions to align with. Different frameworks have different definitions.

As we evolve the API over time, there will need to be a way to query the API for the supported version. In case of a model-style API, this will be the model version. For an operator-style API, this will be and operator set version.

For both API level types, aligning to an existing organization such as ONNX is an attractive proposition to me. ONNX has been designed to be a common interchange format between existing frameworks. Multiple companies collaborate on formats and operators: Amazon, Facebook, Microsoft, etc. All of the major IHVs are a member. There already exist open source converters to and from ONNX and Tensorflow as well as CoreML formats/operators.

tomoyukilabs commented 5 years ago

For operators-style API, I agree with @RafaelCintron that alignment with a common interchange format like ONNX seems a good approach. It is more practical if developers can create custom operators that have not been supported by ONNX yet, for example.

Also, I agree that a way to confirm compatible version will be necessary for both model-style and operator-style APIs. I guess a method to check available formats will be needed as well if we want to allow a model-style API to support multiple formats.

DanielMazurkiewicz commented 5 years ago

Training can come later.

True, but backpropagation training algorithm is very well established algorithm and I don't see a single point why shouldn't it be implemented in first version of standard. Even if we will not make it to first version of standard I would still opt for making clear statement about how training api will look like - this will let us to avoid making shitty standards because of we didn't think of something and then have to adjust to what we just have.

For both API level types, aligning to an existing organization such as ONNX is an attractive proposition

Aldo there is plenty of more or less open standards, there is no clear winner and there is no clear leader (as it is usually with emerging technologies). So I'm totally against favoring any of existing standards in terms of using as web/javascript standard. We have a chance to make something good here, something that fits well javascript world. And I want to really opt to make it simple and designed for web.

Someone here metioned WebGL as example. WebGL is sort of good example of adopting existing non JS standard (openGL) that no one really uses directly :-) - tons of 3rd party libraries are needed in order to be able to work with it conveniently. It is nice to have super low level api access, but in terms of web would be much nicer to have something like Three.js API or a-frame built into a browsers and optimized and running at native speeds. Don't you agree?

It is also pity that no one before mentioned mature ML library made entirely in Javascript as something to start with for the web standard. :-) Have you heard of or seen brain.js library? This is really made by javascript guys for javascript world! :-) Maybe it isn't perfect, but as standardization group we don't have to copy 1 to 1 existing solutions, we can take best from them and compile into a standard. Look how simple and flexible is proposal API I made based on brain.js: API proposal

Also want to aware you that there is ongoing standardization project called webgpu with one of aims of providing access to gpu vulkan computations and spir-v, so keeping that in mind I wouldn't go too low with providing any sort of low level computation API in ML standard, it will be simply wasting and doubling work in our efforts, I would really focus on providing simple, nice, extensible ML API for well proven algorithms.

Cheers!

RafaelCintron commented 5 years ago

True, but backpropagation training algorithm is very well established algorithm and I don't see a single point why shouldn't it be implemented in first version of standard. Even if we will not make it to first version of standard I would still opt for making clear statement about how training api will look like.

The majority of the people we've spoken with about client side ML APIs are interested in inferencing. For this reason, I would prefer that we leave training out of the initial version. I agree anything we design for v1 should keep training in mind so we don't paint ourselves into a corner when it comes time to tackle it.

Aldo there is plenty of more or less open standards, there is no clear winner and there is no clear leader (as it is usually with emerging technologies). So I'm totally against favoring any of existing standards in terms of using as web/javascript standard. We have a chance to make something good here, something that fits well javascript world.

To be clear, I am proposing that we adopt ONNX for its model formats and operator definitions.

I agree the WebML API this group defines should be a first class JavaScript API.

Also want to aware you that there is ongoing standardization project called webgpu with one of aims of providing access to gpu vulkan computations and spir-v, so keeping that in mind I wouldn't go too low with providing any sort of low level computation API in ML standard, it will be simply wasting and doubling work in our efforts, I would really focus on providing simple, nice, extensible ML API for well proven algorithms.

WebGPU is slated to support compute shaders. However, compute shaders is one avenue to implement operators. There are additional platform APIs such as DirectML (on Windows) and others on Apple and Google platforms that will not be surfaced by WebGPU. The WebML API would make those additional capabilities available to web developers.

On devices where specialized hardware is not available, the user agent can run the web developer's graph by falling back to either compute shaders or a CPU backend as appropriate.

DanielMazurkiewicz commented 5 years ago

The majority of the people we've spoken with about client side ML APIs are interested in inferencing. For this reason, I would prefer that we leave training out of the initial version.

I guess we were talking just to different people, :-) because majority of people I've been talking to would be pleased with training capabilities that for example would let them adjust NNs "on the fly" (in gaming industry). And frankly saying - have you ever coded back-propagation training algorithm? This is really not a "rocket science", also plenty of open source implementations laying around.

To be clear, I am proposing that we adopt ONNX for its model formats and operator definitions.

To be clear - I'm saying that we don't need full set of operator definitions now, only basic, most common types of neurons/layers and nice, flexible, JSON based, not over complicated, model format that can be easily maintained by dev's and safely (no JS code inside models) exchanged between clients, servers, machines...

And to be clear again here - I'm not against of providing ultra low level api for ML, I think it could be added somewhere in future versions of standards and only after some experience with base version of API (it might simply end up that base ML api is just fine for 99.9% of use cases).

I think covering all these ONNX operators in web ML standard is pointless and unnecessary over complicating web ML at this moment.

On devices where specialized hardware is not available, the user agent can run the web developer's graph by falling back to either compute shaders or a CPU backend as appropriate.

I think every computation device should be for web ml first class device for running ML. The only sort of "falling back" could be only from and to sort of "simulation modes". Check "WebAI.getCapabilities" method from my API proposal where you can get all (or selected by required capabilities) computation devices capable of running NN and then decide on which of them to run NN.

If you take into consideration that modern computer systems, mobile processors, SoC's can contain multiple GPU, CPU, DSP (including all sort of "vector units" or "ML units") falling back just from GPU (or other "dedicated" device) to CPU makes no sense, when you can use them all simultaneously. (btw. there are even USB sticks offering computation power, so why not allowing to use them as well?)

Going further - if we will stick to simple JSON model format (and not mess with javascript inside models and not over complicate models) there is much, much higher chance that such a variety of computation hardware (and software computation libraries) will receive JSON model dedicated compilers/interpreters due to low implementation complexity. And that also means much higher chance for quick standard adoption.

Resuming: I really vote for sort of MVP standard that will do it's primary ML job

Cheers! :-)

PS. For the reference of what I mean by saying "simple JSON model format" - I made some examples in my API proposal.

anssiko commented 5 years ago

A word on scoping and coordination with my chair hat on:

Scope of Work section of our charter defines the group's technical scope its participants have agreed to upon joining.
Out of Scope section currently says: "Training capabilities are out of scope due to limited availability of respective platform APIs." Training API is a possible candidate for inclusion into a future "v2" API and we keep training in mind while designing the "v1" API focused on inferencing.
Coordination section notes: "The WebML Community Group should coordinate with [WebGPU CG] to avoid overlap." We're lucky to have WebGPU participants in this group, including WebGPU CG co-chair @Kangz to help with that coordination.
Amendments to this Charter section defines an agreed upon process for changing the group's scope.

Thank you all for great proposals and productive discussion to date.

DanielMazurkiewicz commented 5 years ago

@anssiko Thank you for reminding out of scopes!

What are procedures to change scope and out of scopes? Because as far as I know community the very first lib/shim to web ML will be adding training feature :-) I think it really makes no sense to leave it behind (unless ONNX will be adapted as standard - then it will be really hard to implement it)

EDIT: I'm thinking it over - it is totally WRONG idea of providing ML without training capabilities, because it will make standard totally dependent on some existing solutions. Whoever proposed it - it was EVIL proposition :-)

nsthorat commented 5 years ago

Formalizing the discussion from the meetup from the Google side about high-level vs low-level.

TLDR: Our preference from TensorFlow is for this specification to focus on operations.

Exposing an accelerated operations API in JavaScript allows all of the machine learning libraries listed (TensorFlow.js, Brain.js, ONNX.js, etc) to have a shared target.

If WebML exposes functionality for executing models, it becomes opinionated with respect to the model format. Model formats are constantly changing, even just within TensorFlow there are many formats (TensorFlow 1.0 SavedModel format, 2.0 SavedModel format, keras format, older deprecated formats). The ONNX format is yet another format that is not TensorFlow compatible. Brain.js has its own model format. If we provide a model execution layer, one or multiple of these projects will suffer.

I will note that the benefit of exposing a model-level execution layer is for graph-rewriting and fusing of operations (typically happens by analyzing the graph). In TensorFlow.js, which has an eager-only execution engine, we're solving this problem by rewriting the graph offline (or using the graph in our high-level layers API) and calling into fused lower-level operations.

With WebML, we could come to a compromise by exposing these fused operations and allowing library-level rewriting to these fused ops.

For example, a fused matMul becomes:

fusedMatMul(a, b, transposeA, transposeB, bias?, activationFn?)

Overall, I think it makes more sense to target low-level primitives, and allow library authors to iterate on developer-friendly high-level APIs. 302 the extensible web manifesto :)

DanielMazurkiewicz commented 5 years ago

A little bit of clarification from here too...

I think we are all almost on the same page, the only difference is my proposal for phasing (and training ;-) ):

Simple base ML (aka brain.js) for quick standard shipment - lets call it a "super core" (with training ;-) )
Extended scientific ML with operators

In terms of custom operators, WebGPU should handle that it in my opinion.

For extending operators "Web ML Board" could be established and periodically add to the standard most common new operators.

If we provide a model execution layer, one or multiple of these projects will suffer.

Since NN are graphs, there is nothing against in being able to define operations at model level:

there is nothing against in providing different operations from different sources to the model format
there will be also benefits in not involving JS in executing particular operations of models
safety improvement
no need for 3rd party libs (which should be one of our goals I think!)

After all - sort of "operators domain" could be introduced to models and retrieved in ML capabilities object from ML API to check if given model can run on certain execution unit, but it seems for me a little going too far. Maybe it is something to discuss more in details? (I'm not against it)

So let me summarize levels of the web ML API I see here:

Super core level - basic ML operators, no fancy scientific operators, working only with JSON models
Base scientific - above + all fancy operators ;-)
Super scientific - Javascript API to access single operators programmatically

fusedMatMul(a, b, transposeA, transposeB, bias?, activationFn?)

It is a pretty good example of operation that could be easily converted to JSON object => aka ML model

Let me know if this would suit you and be sort of consensus for everyone.

Cheers!

DanielMazurkiewicz commented 5 years ago

FYI: Just added to my API example operations API and domains support

EDIT: @nsthorat @RafaelCintron check if this fulfills your needs, added operations and activation "domain" support and API for fully custom operations per domain (you're also able to define custom domains) - operations with same name could have different behaviors based on used domain. I think I covered now most of biggest issues if it comes to API, but let me know please if I didn't think of something important there.

RafaelCintron commented 5 years ago

@DanielMazurkiewicz wrote:

To be clear - I'm saying that we don't need full set of operator definitions now, only basic, most common types of neurons/layers and nice, flexible, JSON based, not over complicated, model format that can be easily maintained by dev's and safely (no JS code inside models) exchanged between clients, servers, machines...

Thank you for the clarification. I now understand your proposal better.

One draw back with this approach is that we (and others) will need to maintain tooling that is integrated with popular ML frameworks to support our custom model format. If the list of operators is too small or the operator definitions differ substantially from established ones, data scientists will have trouble exporting existing models and will need to author special case models for WebML. The nice thing about using an already existing model format and operator definition (especially one built with format interchange in mind like ONNX) is that this problem goes away.

The ONNX group spent considerable time coming up with an initial set of operator for ONNX. They spent even more time to mature and refine the initial set as they wrote converters for different frameworks and collaborated across multiple hardware vendors.

This community group should not need to create a web specific operator standard as the ML models used in the browser have strong parallels to those used elsewhere.

WebAI.getCapabilities

For fingerprinting reasons, I think we should avoid exposing too much information about the end-user's hardware capabilities to web developers. Web developers should be able to provide hints but, in my opinion, it should be up to the browser to decide how best to use the user's hardware to run the graph.

@nsthorat wrote:

I will note that the benefit of exposing a model-level execution layer is for graph-rewriting and fusing of operations (typically happens by analyzing the graph). In TensorFlow.js, which has an eager-only execution engine, we're solving this problem by rewriting the graph offline (or using the graph in our high-level layers API) and calling into fused lower-level operations.

An operator-style API should allow web developers to build their graph using Javascript function calls. Once the graph is in place, graph-rewriting and fusing of operations like you describe could be possible by the user agent. Am I missing something or do you have a different notion of what an operator-style API entails than I do?

DanielMazurkiewicz commented 5 years ago

@RafaelCintron wrote:

One draw back with this approach is that we (and others) will need to maintain tooling that is integrated with popular ML frameworks to support our custom model format. If the list of operators is too small or the operator definitions differ substantially from established ones, data scientists will have trouble exporting existing models and will need to author special case models for WebML. The nice thing about using an already existing model format and operator definition (especially one built with format interchange in mind like ONNX) is that this problem goes away.

This community group should not need to create a web specific operator standard as the ML models used in the browser have strong parallels to those used elsewhere.

In charters phasing issue I've addressed these problems, check it please and let me know if it is at least sort of satisfying solution: https://github.com/webmachinelearning/charter/issues/5

With having operators domains and possibility to extend domains and list of operators in API it should be really easy to make any ML format (including ONNX) easily convertible to JSON. And this also seems to me like a good compromise candidate for everyone.

WebAI.getCapabilities

For fingerprinting reasons, I think we should avoid exposing too much information about the end-user's hardware capabilities to web developers. Web developers should be able to provide hints but, in my opinion, it should be up to the browser to decide how best to use the user's hardware to run the graph.

That is a good point. When I designed this method in API proposal I kept in mind that user might want to be sure that certain models will be executed on certain hardware (for example: running some models might make no sense on CPU because CPU is simply too slow). I'll remove this method from API proposal and propose some different approach instead.

EDIT: Just had a second thought on getCapabilities, will keep it - basic capabilities like supported data types and domains could be useful for loading appropriate models - just will remove info about hardware.

anssiko commented 5 years ago

Summarizing the discussion in this issue to date (corrections welcome!):

@nsthorat (Google) prefers a primitive level API with fused operations, eager execution
@RafaelCintron (Microsoft) prefers a graph level API, lazy mode
@DanielMazurkiewicz (Brain.js) prefers to see training in scope; preliminary feedback from participants suggests the group is not yet ready to expand its scope

DanielMazurkiewicz commented 5 years ago

@DanielMazurkiewicz (Brain.js) prefers to see training in scope; preliminary feedback from participants suggests the group is not yet ready to expand its scope

If we are here to discuss things then I would like to know arguments why not from everyone who is against. Basic training functionality is about 67 LOC . It would be really pity (or bad will) not to include it without any significant reason.

@nsthorat (Google) prefers a primitive level API with fused operations, eager execution @RafaelCintron (Microsoft) prefers a graph level API, lazy mode

Spent on API proposal quite some of my private time to make it suitable for all. Still not a final proposal(latest updates from today), but at the same time I think it already meets above conditions and if not, then would be glad to know why and even more glad with improvements or adjustements proposals (like @RafaelCintron gave about fingerprinting - thanks!).

huningxin commented 5 years ago

Great discussions and summary! I'd like to share some investigations of native support based on our WebNN POC.

fused ops

@nsthorat wrote:

With WebML, we could come to a compromise by exposing these fused operations and allowing library-level rewriting to these fused ops.

Fused ops are supported by multiple OS APIs with hardware optimization. For example, our POC exposes the fused convolution op. On Android, it maps to ANEURALNETWORKS_CONV_2D of NNAPI that supports fused activation defined by FuseCode. On macOS, it maps to MPSCNNConvolution that supports fused activation by MPSCNNNeuron for GPU and BNNS convolution filter by setting BNNSActivation for CPU. According to DirectML slides, it will be available on Windows as well.

We also experimented the fused ops with real-world models, for example SqueezeNet from ONNX model zoo. The onnx model importer example can identify a convolution followed by a relu and generate a fused convolution op that is offloaded for native execution.

eager and graph execution

@anssiko wrote:

@nsthorat (Google) prefers a primitive level API with fused operations, eager execution

@RafaelCintron (Microsoft) prefers a graph level API, lazy mode

From use case perspective, I understand eager execution is more intuitive for model development and debugging, and graph execution has advantages on performance optimization and production deployment. We may target them for different usages at different phase. Please correct me if I am wrong @nsthorat @RafaelCintron .

So far, we've prototyped the graph execution in our POC. The graph maps to ANeuralNetworksModel of NNAPI and MPSNNGraph of MPS. According to the docs, the NNAPI and MPS enable graph level optimizations, like graph partition in NNAPI and graph rewriting in MPSNNGraph, that would be complementary to higher level frameworks' optimization. As BNNS only supports eager execution by BNNSFilterApply, we implemented the graph execution inside browser for BNNS.

native capabilities:

@RafaelCintron wrote:

WebAI.getCapabilities For fingerprinting reasons, I think we should avoid exposing too much information about the end-user's hardware capabilities to web developers. Web developers should be able to provide hints but, in my opinion, it should be up to the browser to decide how best to use the user's hardware to run the graph.

By referring to NNAPI, our POC exposes three graph execution preferences: fast-answer, sustained-speed and low-power. Web developers can set the preference (hint) according to the use scenario, e.g. user interaction, video streaming or background processing, then the browser and native runtime can help distribute the computation cross hardware.

For Android, them directly map to PreferenceCode and pass to NNAPI runtime. For macOS, according to experiments, we found BNNS execution has faster startup time than MPS execution due to GPU shader compilation overhead. So our POC maps fast-answer to BNNS and sustained-speed to MPSNNGraph in browser.

RafaelCintron commented 5 years ago

My position is not quite "graph level API, fast load"

To clarify:

WebML should include both a "loadModel-style" and "operator-style" APIs.
The operator-style API should allow web developers to connect graph nodes together to allow the browser to optimize the graph.
The model and operator definitions we adopt should come from ONNX as that was developed as an interchange format across all platforms. This group should avoid creating brand new operator definitions that need to be integrated with existing data scientist tools.
v1 of the API should only include inferencing with training to be considered in a subsequent version.
To minimize fingerprinting, the API should not expose the hardware configuration of end-user machines. @huningxin 's proposal of having 'execution preferences' seems like a good alternative.

DanielMazurkiewicz commented 5 years ago

OK, I'll point here my clarification bullets too.

Must have:
- Machine learning is not reserved only for data scientists thus its API should be easy to start ML adventure with, easy to learn and easy to use.
- As it is javascript world and its native data format is JSON - graph models should use that format too for designing, storing and restoring NNs (especially nothing stands against using it)
- ES6 is a good example of how javascript world likes all sort of shorthand thus models should allow to be easy to quickly build simple NNs in JSON using all sort of pretty logic shorthand and presumptions
- There should be no requirement of 3rd party tools to build and train simple NNs
Nice to have:
- API that will allow to cover data preparation processes in easy way
- API that will allow to store/restore internal state of recurrent NNs
- API that will allow to export selectively layers as separate NN
- Possibility to store and restore internal data (state, weights, biases...) in condensed (base64) and human readable forms
- API offering a possibility to define custom training algorithms
- Time has shown that many JS APIs was repurposed so having operators available additionally in standalone API ("fused operators") is a plus in general and at the same time it will suit people who prefer different approach to build NNs or will allow custom ML frameworks to be shipped on top of it
- Minimal set of operators will allow to ship standard quickly to browsers, so small subset of ONNX is suitable for v1 and full set will be ok for v2 (tensorflow, caffe and so on too). In my opinion we can define all operations in v1 (or pick from ONNX), but in terms of development and testing time we should split necessity of shipping of all operators at once, especially that most of basic cases requires just couple of them
- Having operators grouped in domains would be a plus as so many products tries to rule over this standard (all of them then would be able ship their own lists of operators), would be also nice to have domains as operators list will grow in the future, so each new set could be grouped with old in new domain
- Having an option to extend programmatically operators list would be a plus (as it will allow data scientist experiment with new operators and ship them before they become a standard)
- Having an option to extend programmatically operators list with javascript functions that would be compiled to SPIRV (or GLSL) would be a HUGE PLUS (this is what GPU.js does for brain.js) - javscript engine could make detection if function uses only math operations inside and based on that compile for GPU or fallback to JS and that also would allow to automatically cover optimizations on model level and using optimized atomic operations as standalone API and fallback with charm to javascript if spirv not supported (would be nice to have on board someone who could investigate this in terms of ie, chromium and firefox js engines, @huningxin ?)

A word from me:

Guys we have a chance to make something good here. I've made an API proposal as something to start discussions with, but I'm not attached to it. Would be nice if you would free your mind too and use your best experience to bring it to this standard as a cooperation. I promise, we'll all benefit from it.

My (and other people from Brain.js) use case which translates to given above "must have": I have frontend and backend software development background, but in terms of ML I might consider myself still as junior ML data scientist. When I started with ML I started from brain.js and what I liked in it it was that I could quickly jump into ML without necessity to know what is really going on in background (that is what developers prefer most of the time ;-) ). All it required from me was to write literally couple, of pretty understandable for total ML newbie, lines of code and provide training data in pretty easy and readable form. The very next day I had 30kb NN on production (brain.js allows to export NNs as javascript functions).

Of course brain.js is very limited ML in terms of models and even most active developers admits that having more flexibility with operators would be a long term goal for brain.js. But I'm sure that we can achieve a degree of flexibility of this future web ml standard that can suit developers who won't dig too much into ML and ML data scientists at the same time. It is just a matter of providing appropriate (and not over bloated) API, model and models shorthands (and basic training functionality :-) ).

PS. And again, if someone doesn't want training functionality at all or in v1, please provide good reasons. Simple "No" isn't an argument into discussion. PPS. I hope I've made my arguments clear, but if something needs clarification feel free to ask me.

tomoyukilabs commented 5 years ago

So far, we've prototyped the graph execution in our POC. The graph maps to ANeuralNetworksModel of NNAPI and MPSNNGraph of MPS. According to the docs, the NNAPI and MPS enable graph level optimizations, like graph partition in NNAPI and graph rewriting in MPSNNGraph, that would be complementary to higher level frameworks' optimization. As BNNS only supports eager execution by BNNSFilterApply, we implemented the graph execution inside browser for BNNS.

@huningxin does it imply that whether eager and/or graph execution can be supported or not depends on each platform-level API?

huningxin commented 5 years ago

@tomoyukilabs wrote:

@huningxin does it imply that whether eager and/or graph execution can be supported or not depends on each platform-level API?

According to our investigation, there are some implications of mapping execution modes on web to native APIs:

native\web	eager (web)	graph (web)
eager (native)	direct mapping	need to implement the graph traversal (and graph-level optimization) in web browser
graph (native)	implementation may be inefficient, e.g. create and execute a graph for each op	direct mapping

cynthia commented 5 years ago

Intervention; this thread is becoming pretty large - wonder if this could potentially be split into separate discussions? (Github issues not having a threading model doesn't really help here.)

As I see it from a implementor perspective, it makes perfect sense to suggest a high-level API focused on inference as a first step deliverable; direct mappings into platform APIs means it will be simpler to implement, and reaching the developers faster. Getting interoperability faster and lower cost of implementation is not something we can ignore; many implementors have a large backlog of standards they need to implement, and unfortunately the world doesn't have an infinite supply of browser developers.

On the other hand - from a developer's perspective, low-level is great; it enables a whole new world of possibilities to the web platform. There is unfortunately the implementation complexity; not every implementor has an army of engineers to implement the equivalent of TensorFlow or CNTK overnight. And unfortunately, this is a hard goal to achieve with just pre-baked platform APIs.

The other part is the amount of third party dependencies an implementation has to drag in if one doesn't plan to implement all the bits needed for a high performance backend from scratch, this is rather hairy and I don't even know if it will legally work for all implementors with regards to licenses.

That said, I'd love to see low level happen; but it feels like high-level is a much more reachable goal.

IMHO: Training seems like a rather unusual use case. "Please wait for undisclosed amount of time until our network trains" doesn't feel like it would fly with end-users aside from small toy networks. Feed-forward neural networks with limited parameters should be fine to implement as trainable networks with WebASM, but any modern CNN is more or less not something you want to attempt to even do transfer learning on 95% of the PCs out in the wild. (Additionally, there is the catch of "your browser is about to use 10GBs of VRAM" and I personally don't think that would fly.)

1) Do we have all the required primitives needed to feed networks large amounts of data for training? (I'm not entirely confident we do, and feeding data over fetch is going to bottleneck.) 2) Is leaving a browser tab open training for multiple days something that we expect to be a common use case?

One last nit:

making shitty standards

While we as a standardization body have committed some technical sins in the past, there are probably better adjectives. Please be respectful, and ideally give https://www.w3.org/Consortium/cepc/ a quick read.

cynthia commented 5 years ago

Just had a second thought on getCapabilities, will keep it - basic capabilities like supported data types and domains could be useful for loading appropriate models - just will remove info about hardware.

There are issues with this; just because FP16 is supported doesn't necessarily guarantee performance. At the same time exposing FP16 only on "FP16 friendly" GPUs is a pretty narrow subset of users. (Basically, GPUs with Tensor cores - which is probably less than 1% of the desktop market.)

DanielMazurkiewicz commented 5 years ago

@cynthia

Intervention; this thread is becoming pretty large - wonder if this could potentially be split into separate discussions? (Github issues not having a threading model doesn't really help here.)

Agree, I see here five major things:

predefined models and models API
ML API
operators list
graph format
ML operators API

and sub-derivatives:

training or not training ;-)
ML operators API vs. graph format
ML operators API and graph format

There are issues with this; just because FP16 is supported doesn't necessarily guarantee performance. At the same time exposing FP16 only on "FP16 friendly" GPUs is a pretty narrow subset of users. (Basically, GPUs with Tensor cores - which is probably less than 1% of the desktop market.)

With Nvidia it doesn't give any performance gains (even looses AFAIK) but with Vega from AMD for example it boost performance by 100%. Maybe today it is 1% of market (would like to see some real figures here before final judgements) but I assume it will change in near future. When it comes to hardware implementation it requires much less transistors - thus is more energy efficient and this is something that we can't undervalue here. Lastly - I'm not sure if I've said it openly, but I assume FP32 should be obligatory, others not, but standard should be prepared for others too. ML based on FP16 or even integers (or fixed point decimals) is something we can handle by allowing vendors (and users) to add programmatically and built in custom domains (time will show what domains will gather attention and become popular, then decisions could be taken if to ship them as standard to browsers). Also keep in mind that except GPU there are also FPGA, DSP and SIMD(including "vector") options for hardware acceleration of ML (also those existing in majority of mobile devices), so it could easily rise that number of "one percent" of devices that can benefit from other than FP32 based accelerations.

There is unfortunately the implementation complexity; not every implementor has an army of engineers to implement the equivalent of TensorFlow or CNTK overnight.

Yes, exactly, agree. Was digging recently into operators (started to implement PoC for my API proposal) and it will require at most little more than dozen operators (including activation functions as separate operators and operators specific for training purposes). That is something even I could do myself within reasonable time if I worked on full time :-)

IMHO: Training seems like a rather unusual use case. "Please wait for undisclosed amount of time until our network trains"

I'm not sure how much you're familiar with training capabilities, but you can for example adjust NN with every single new real data record (do just one training iteration "live"). Single training iteration itself requires more-less same computation power as computations necessary for neural network itself. So no waiting time here :-) Gaming industry will be thankful for this. And you would be surprised but other industries too... This machine is driven by brain.js and periodically "adjusting training" is performed on newly collected data.

There is entire world to discover for ML except those major mostly advertised areas. :-) Let this standard be really OPEN, also for new ways of use.

I wouldn't also underestimate value of training capabilities in terms of studying purposes - that is something that at some point every dev (or scientist) has to go through if wants to get into ML world. Nowadays kids in a schools learn how to program using browsers. (yes - that is also important browsers use case), why not let them learn ML entirely in a browsers?

It also opens tons of other opportunities for online ML design tools. Not mentioning this standard design freedom (we will not have to adjust then to any existing solution if we find it better for web ml standard).

2\. Is leaving a browser tab open training for multiple days something that we expect to be a common use case?

Is reasonable web product going to require that? Can't later this standard be incorporated into node.js or used in electron based apps?

Just want to point here something I've mentioned earlier - not all NNs will use gigabytes of training data - especially not all ML will be used to work with image or audio data. Not all will require full training, not all trainings will take days, hours or even minutes... Let developers decide how and where they will use their NN.

anssiko commented 5 years ago

With my chair hat on, I'll respond to some points that touch the group's scope, work mode, and policies:

There is entire world to discover for ML except those major mostly advertised areas. :-) Let this standard be really OPEN, also for new ways of use.

The group decided to adopt the proposed use cases as a starting point for the API definition. This means discussion in this group is to be focused on a Web API for neural network inference hardware acceleration that addresses those use cases.

PS. And again, if someone doesn't want training functionality at all or in v1, please provide good reasons. Simple "No" isn't an argument into discussion.

The participants of this group have committed to a particular Scope of Work upon joining. Out of scope topics (e.g. training functionality) or proposed future work are to be discussed in the group's charter repo. The group makes decisions based on consensus and participants do not need to provide reasons why they are not willing to expand the agreed upon scope. If I observe adequate support for a proposal in the charter repo, I will conduct a vote.

Spent on API proposal quite some of my private time to make it suitable for all. Still not a final proposal(latest updates from today), but at the same time I think it already meets above conditions and if not, then would be glad to know why and even more glad with improvements or adjustements proposals [...].

Generally speaking, proposals receive more feedback and support if they are evaluated against use cases. This allows participants to evaluate how well the proposed solution(s) solve the problem(s) the group is tasked to solve. For example, on 10 Jan 2019 call we received an update on how one proof-of-concept works for the semantic segmentation use case.

Lastly:

We encourage frank technical discussion in this group in a manner that is respectful. In order to maintain a positive work environment, I expect everyone to comply with W3C Code of Ethics and Professional Conduct that is implemented across W3C groups, including this group. It is a nicely written document that can be applied outside W3C business in life in general. There's also an informal, more playful edition.

DanielMazurkiewicz commented 5 years ago

@anssiko

There is entire world to discover for ML except those major mostly advertised areas. :-) Let this standard be really OPEN, also for new ways of use.

The group decided to adopt the proposed use cases as a starting point for the API definition. This means discussion in this group is to be focused on a Web API for neural network inference hardware acceleration that addresses those use cases.

Just to clarify my point of view here: as I think "high level" use cases detailed discussion should be postponed to some other phases or at least till we will get consensus around "low level" whenever I express my opinion I'm expressing it about "low level" use cases which as far as I understand (and hope) are to provide flexibility and versatility of ML possible usages.

The group makes decisions based on consensus and participants do not need to provide reasons why they are not willing to expand the agreed upon scope.

Not sure why you mention it, have I violated any rule with my kind (hopefully) request for substantive discussion?

anssiko commented 5 years ago

@DanielMazurkiewicz I mentioned that to clarify the process and mechanics for making charter changes (voting) differs from day-to-day decision-making (consensus-based). These aspects are explained in the charter document. You can reach out to me privately and I’m happy to answer any further procedural questions you may have to keep this issue focused on technical discussion.

Re low-level and high-level, I observe much of the confusion arises from the inconsistent use of these adjectives in different contexts. Low-level & high-level APIs and low-level & high-level use cases do not map and that causes us talking past each other. We need to add definitions of there terms to the spec, or come up with better names.

anssiko commented 5 years ago

People seemed to agree this issue has branched off to multiple directions and in order to continue productive discussion we should split this "mega issue" into smaller self-contained issues we can resolve independently.

As a start, and per resolution on the 14 Feb 2019 call, I created the following two issues:

Executing operations #11
Executing models #12

gregwhitworth commented 5 years ago

I'm going to close this in desire to discuss the different issues in the ones referenced by @anssiko - if anyone feels there isn't something covered by those - please open a new issue.