philloooo opened this issue 9 months ago
One example we can compare with is the PyTorch Edge opset.
When we originally considered the design principles of the WebNN operator set, even before the spec made it out of the community group, the question of whether the core operator set should also include common high-level operations (commonly known as fusions) or only the rudimentary building-block operations came up in many discussions.
Without getting into a philosophical debate about what exactly, besides basic arithmetic operations, should be considered rudimentary operations, we set out to look at it objectively in the context of what was already being developed in the industry at various software and hardware layers. We concluded that it is important, both for practicality and for performance reasons, to also include common fusions known to be implemented widely in the framework and underlying platform (e.g. the operating system) layers, with an important caveat: for each fusion defined in the spec, every decomposed operation of its subgraph equivalent must also be defined.
The main objective of this rule is to let an implementation that does not yet fully support a specific fusion carry on without failing. By pushing operator decomposition downward, we allow the implementation to catch up later while simplifying the development of the framework's WebNN backend. Also note that a keyword here is "common" fusions, not just any fusion with unverifiable availability in the platform layers.
For reference, we took the opportunity at that time to describe this rationale in this section of our explainer document.
At the time of that writing, we used GRU and LSTM as the de facto examples of such common fusions in the discussion. With the emergence of generative AI in recent years, the better examples would be group-norm, layer-norm, and multi-headed attention -- operations that are widely used in both diffusion and transformer models today.
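To make the decomposition requirement concrete, here is a minimal sketch (not from the spec) of how an implementation or framework backend could lower a layer-norm fusion to its subgraph equivalent when the fused op is unavailable. It assumes the MLGraphBuilder methods shown (reduceMean, sub, mul, add, sqrt, div, constant) behave as in the current draft; the exact constant/descriptor signature may differ between spec revisions.

```typescript
// Sketch: layer normalization expressed as a subgraph of WebNN primitives.
// Assumes MLGraphBuilder element-wise and reduction methods as in the current
// WebNN draft; descriptor fields (dataType/shape) may vary by spec revision.
function decomposedLayerNorm(
  builder: MLGraphBuilder,
  x: MLOperand,
  scale: MLOperand,
  bias: MLOperand,
  axes: number[],
  epsilon = 1e-5
): MLOperand {
  // Mean and variance over the normalization axes.
  const mean = builder.reduceMean(x, { axes, keepDimensions: true });
  const centered = builder.sub(x, mean);
  const variance = builder.reduceMean(
    builder.mul(centered, centered),
    { axes, keepDimensions: true }
  );
  // Normalize: (x - mean) / sqrt(variance + epsilon).
  const eps = builder.constant(
    { dataType: 'float32', shape: [1] },
    new Float32Array([epsilon])
  );
  const normalized = builder.div(
    centered,
    builder.sqrt(builder.add(variance, eps))
  );
  // Apply the per-element scale and bias of the fused op.
  return builder.add(builder.mul(normalized, scale), bias);
}
```

Because every op used above must itself be in the spec whenever the fused layer-norm is, a backend that has not yet wired up the fusion can emit this subgraph instead and swap in the fused path later.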
Thanks @philloooo for soliciting input from internal Google teams and @wchao1115 for your insights and various contributions in this space.
The group has produced the following documentation on this topic:
In addition, the group has received related position statements from Google in #453 and #573 (this issue), and input from an ONNX project participant on that project's approach. There may be more, but I was able to recall these.
As @wchao1115 noted, this topic has been a long-term consideration and I note the topic re-emerges from time to time when new participants join. This suggests to me the group cares about this topic, and also that we could probably do better in documenting the group's current consensus position.
To help transform this into action, I'd like to ask whether the group is happy with the current organization of related guidelines and documentation, or should we try to perhaps consolidate them somehow? Fold (more of) them into the specification? Is there some further investigation and research to be done to ensure we are well informed and data-driven? A review of widely deployed frameworks (expected users of this group's deliverables) with results shared in public?
Regardless of where this content lives in our repo, I expect this documentation to evolve and be maintained. Everyone's feedback is welcome.
Thanks @wchao1115 for elaborating on the current design philosophy! I overall agree with the current design. Thanks @anssiko for linking to the existing resources!
This issue is not trying to exclude high-level operations from the spec. It's trying to bring up a couple of things related to this topic:
We'll revive this issue with a discussion on additional primitive ops informed by MLIR Linalg, PyTorch Prims IR, TOSA, others. Pointers to relevant background research can be shared here to inform the discussion.
I don't know if we could also be interested in:
"core set of operators with precisely defined behavior ... reliably express higher level concepts if they are missing from a given execution runtime"
I have a work-in-progress preliminary analysis of operator correspondence here (Machine Learning Operator Mapping.xlsx) which can help identify current primitive gaps (no firm recommendations yet). The current WebNN standard has demonstrated viability for popular models, but it lacks breadth. However, implementing 800+ operators (PyTorch, TensorFlow, ...) would be untenable for a web specification that needs to be rigorously documented and implemented across many user agents with many potential platform backends. Plus there will always be niche operators found in some ML libraries that aren't justifiably common enough to be part of WebNN itself but could still be constructed from WebNN operators, given WebNN has a sufficient foundation. So the need is clear for compositional fundamentals (e.g. PyTorch prims, TOSA, StableHLO), which means potentially even adding operators to WebNN that on their own might not be directly useful in neural networks (like sumPooling and bitwise operators) but are useful for composition.
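As a small illustration of that compositional point, a niche activation that is unlikely to ever warrant its own WebNN op, such as mish (x · tanh(ln(1 + eˣ))), can be assembled from element-wise primitives that already exist. This is a hypothetical sketch, not a proposal to add mish, and it assumes builder.exp/log/add/tanh/mul/constant as in the current draft:

```typescript
// Sketch: composing a niche activation (mish) from WebNN element-wise ops.
// mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + exp(x)))
function mish(builder: MLGraphBuilder, x: MLOperand): MLOperand {
  const one = builder.constant(
    { dataType: 'float32', shape: [1] },
    new Float32Array([1])
  );
  const softplus = builder.log(builder.add(one, builder.exp(x)));
  return builder.mul(x, builder.tanh(softplus));
}
```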
In talking with internal Google ML frameworks teams, one theme has come up repeatedly when discussing ML execution: the need for a predictable, core set of operators with precisely defined behavior. Without this, frameworks can't provide predictable behavior, and can't reliably express higher level concepts if they are missing from a given execution runtime. We've seen work by internal and external ML frameworks towards defining these core operator sets, and believe this concept is important for WebNN to adopt, and ideally align with any emerging standards for core op sets.
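One way to read that requirement in code: a framework backend targeting WebNN could feature-detect a higher-level op and fall back to a decomposition built from the guaranteed core, which only works if the core ops it falls back to have precisely defined behavior. A hypothetical sketch using gelu (gelu(x) = 0.5 · x · (1 + erf(x/√2))) as the higher-level concept; whether builder.gelu exists in a given implementation is exactly the uncertainty being illustrated:

```typescript
// Sketch: a framework backend expressing gelu whether or not the runtime
// exposes it as a fused op. The fallback relies only on core element-wise ops
// (erf, mul, add, constant) having precisely defined behavior.
function emitGelu(builder: MLGraphBuilder, x: MLOperand): MLOperand {
  if (typeof (builder as any).gelu === 'function') {
    return (builder as any).gelu(x); // fused op available in this runtime
  }
  // Fallback: 0.5 * x * (1 + erf(x / sqrt(2)))
  const c = (v: number) =>
    builder.constant({ dataType: 'float32', shape: [1] }, new Float32Array([v]));
  const scaled = builder.mul(x, c(1 / Math.SQRT2));
  const cdf = builder.mul(c(0.5), builder.add(c(1), builder.erf(scaled)));
  return builder.mul(x, cdf);
}
```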
We'd like to build consensus on the following:
Follow-up work:
Actually define the core op set - both the list of ops and their behavior
Have at least 2 implementations to make sure the interface, including the constraints specified, can be supported by multiple platforms
Come up with a rubric for how rigorously the core op set is limited
Determine if a subset of a "standard" core op set is acceptable for v1 (i.e. do we need control flow https://github.com/webmachinelearning/webnn/issues/559 and bitwise operators https://github.com/webmachinelearning/webnn/issues/496 ?)
Define core op set standardization / evolution over time (e.g. in conjunction with frameworks)
Related questions, but maybe out of scope for this issue:
What do we call non-core ops? (Composite? High-level? …)
Should all non-core ops be defined in terms of these core ops?
How should we structure the spec to make core vs non-core ops clear?
How precisely should the behavior of non-core ops be constrained?
There are some high-level questions that need to be hashed out:
See also:
#453
#463