philloooo opened this issue 9 months ago
One example we can compare with is the PyTorch Edge opset.
When we originally considered the design principles of the WebNN operator set, even before the spec made it out of the community group, the question of whether the core operator set should also include common high-level operations (commonly known as fusions) or only the rudimentary building-block operations came up in many discussions.
Without getting into a philosophical debate about what exactly, besides basic arithmetic operations, should be considered rudimentary operations, we set out to look at it objectively in the context of what was already being developed in the industry at various software and hardware layers. We concluded that it is important, both for practicality and for performance reasons, to also include common fusions known to be implemented widely in the framework and underlying platform (e.g. the operating system) layers, with an important caveat: for each fusion defined in the spec, every decomposed operation of its subgraph equivalent must also be defined.
The main objective of this rule is to let an implementation that does not yet fully support a specific fusion carry on without failing. By pushing operator decomposition downward, we allow the implementation to catch up later while simplifying the development of the framework's WebNN backend. Also note that a keyword here is "common" fusions, not just any fusion with unverifiable availability in the platform layers.
For reference, we took the opportunity at that time to describe this rationale in this section of our explainer document.
At the time of that writing, we used GRU and LSTM as the de facto examples of such common fusions in the discussion. With the emergence of generative AI in recent years, the better examples would be group-norm, layer-norm, and multi-headed attention -- operations that are widely used in both diffusion and transformer models today.
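To make the decomposition requirement concrete, here is a minimal sketch (not from the spec) of how an implementation or framework backend could lower a layer-norm fusion to its subgraph equivalent when the fused op is unavailable. It assumes the MLGraphBuilder methods shown (reduceMean, sub, mul, add, sqrt, div, constant) behave as in the current draft; the exact constant/descriptor signature may differ between spec revisions.

```typescript
// Sketch: layer normalization expressed as a subgraph of WebNN primitives.
// Assumes MLGraphBuilder element-wise and reduction methods as in the current
// WebNN draft; descriptor fields (dataType/shape) may vary by spec revision.
function decomposedLayerNorm(
  builder: MLGraphBuilder,
  x: MLOperand,
  scale: MLOperand,
  bias: MLOperand,
  axes: number[],
  epsilon = 1e-5
): MLOperand {
  // Mean and variance over the normalization axes.
  const mean = builder.reduceMean(x, { axes, keepDimensions: true });
  const centered = builder.sub(x, mean);
  const variance = builder.reduceMean(
    builder.mul(centered, centered),
    { axes, keepDimensions: true }
  );
  // Normalize: (x - mean) / sqrt(variance + epsilon).
  const eps = builder.constant(
    { dataType: 'float32', shape: [1] },
    new Float32Array([epsilon])
  );
  const normalized = builder.div(
    centered,
    builder.sqrt(builder.add(variance, eps))
  );
  // Apply the per-element scale and bias of the fused op.
  return builder.add(builder.mul(normalized, scale), bias);
}
```

Because every op used above must itself be in the spec whenever the fused layer-norm is, a backend that has not yet wired up the fusion can emit this subgraph instead and swap in the fused path later.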
Thanks @philloooo for soliciting input from internal Google teams and @wchao1115 for your insights and various contributions in this space.
The group has produced the following documentation on this topic:
In addition, the group has received related position statements from Google in #453 and #573 (this issue), and input from an ONNX project participant on that project's approach. There may be more, but I was able to recall these.
As @wchao1115 noted, this topic has been a long-term consideration and I note the topic re-emerges from time to time when new participants join. This suggests to me the group cares about this topic, and also that we could probably do better in documenting the group's current consensus position.
To help transform this into action, I'd like to ask whether the group is happy with the current organization of related guidelines and documentation, or should we try to perhaps consolidate them somehow? Fold (more of) them into the specification? Is there some further investigation and research to be done to ensure we are well informed and data-driven? A review of widely deployed frameworks (expected users of this group's deliverables) with results shared in public?
Regardless of where this content lives in our repo, I expect this documentation to evolve and be maintained. Everyone's feedback is welcome.
Thanks @wchao1115 for elaborating on the current design philosophy! I overall agree with the current design. Thanks @anssiko for linking to the existing resources!
This issue is not trying to exclude high-level operations from the spec. It's trying to bring up a couple of things related to this topic:
We'll revive this issue with a discussion on additional primitive ops informed by MLIR Linalg, PyTorch Prims IR, TOSA, others. Pointers to relevant background research can be shared here to inform the discussion.
I don't know if we could also be interested in:
"core set of operators with precisely defined behavior ... reliably express higher level concepts if they are missing from a given execution runtime"
I have a work-in-progress preliminary analysis of operator correspondence here (Machine Learning Operator Mapping.xlsx) which can help identify current primitive gaps (no firm recommendations yet). The current WebNN standard has demonstrated viability for popular models, but it lacks breadth. However, implementing 800+ operators (PyTorch, TensorFlow, ...) would be untenable for a web specification that needs to be rigorously documented and implemented across many user agents with many potential platform backends. Plus there will always be niche operators found in some ML libraries that aren't justifiably common enough to be part of WebNN itself but could still be constructed from WebNN operators, given WebNN has a sufficient foundation. So the need is clear for compositional fundamentals (e.g. PyTorch prims, TOSA, StableHLO), which means potentially even adding operators to WebNN that on their own might not be directly useful in neural networks (like sumPooling and bitwise operators) but are useful for composition.
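As a small illustration of that compositional point, a niche activation that is unlikely to ever warrant its own WebNN op, such as mish (x · tanh(ln(1 + eˣ))), can be assembled from element-wise primitives that already exist. This is a hypothetical sketch, not a proposal to add mish, and it assumes builder.exp/log/add/tanh/mul/constant as in the current draft:

```typescript
// Sketch: composing a niche activation (mish) from WebNN element-wise ops.
// mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + exp(x)))
function mish(builder: MLGraphBuilder, x: MLOperand): MLOperand {
  const one = builder.constant(
    { dataType: 'float32', shape: [1] },
    new Float32Array([1])
  );
  const softplus = builder.log(builder.add(one, builder.exp(x)));
  return builder.mul(x, builder.tanh(softplus));
}
```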
In talking with internal Google ML frameworks teams, one theme has come up repeatedly when discussing ML execution: the need for a predictable, core set of operators with precisely defined behavior. Without this, frameworks can't provide predictable behavior, and can't reliably express higher level concepts if they are missing from a given execution runtime. We've seen work by internal and external ML frameworks towards defining these core operator sets, and believe this concept is important for WebNN to adopt, and ideally align with any emerging standards for core op sets.
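One way to read that requirement in code: a framework backend targeting WebNN could feature-detect a higher-level op and fall back to a decomposition built from the guaranteed core, which only works if the core ops it falls back to have precisely defined behavior. A hypothetical sketch using gelu (gelu(x) = 0.5 · x · (1 + erf(x/√2))) as the higher-level concept; whether builder.gelu exists in a given implementation is exactly the uncertainty being illustrated:

```typescript
// Sketch: a framework backend expressing gelu whether or not the runtime
// exposes it as a fused op. The fallback relies only on core element-wise ops
// (erf, mul, add, constant) having precisely defined behavior.
function emitGelu(builder: MLGraphBuilder, x: MLOperand): MLOperand {
  if (typeof (builder as any).gelu === 'function') {
    return (builder as any).gelu(x); // fused op available in this runtime
  }
  // Fallback: 0.5 * x * (1 + erf(x / sqrt(2)))
  const c = (v: number) =>
    builder.constant({ dataType: 'float32', shape: [1] }, new Float32Array([v]));
  const scaled = builder.mul(x, c(1 / Math.SQRT2));
  const cdf = builder.mul(c(0.5), builder.add(c(1), builder.erf(scaled)));
  return builder.mul(x, cdf);
}
```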
We'd like to build consensus on the following:
Follow-up work:
Actually define the core op set - both the list of ops and their behavior
Have at least 2 implementations to make sure the interface, including the constraints specified, can be supported by multiple platforms
Come up with a rubric for how rigorously the core op set is limited
Determine if a subset of a "standard" core op set is acceptable for v1 (i.e. do we need control flow https://github.com/webmachinelearning/webnn/issues/559 and bitwise operators https://github.com/webmachinelearning/webnn/issues/496 ?)
Define core op set standardization / evolution over time (e.g. in conjunction with frameworks)
Related questions, but maybe out of scope for this issue:
What do we call non-core ops? (Composite? High-level? …)
Should all non-core ops be defined in terms of these core ops?
How should we structure the spec to make core vs non-core ops clear?
How precisely should the behavior of non-core ops be constrained?
There are some high-level questions that need to be hashed out:
See also:
#453
#463