w3c / machine-learning-workshop

Site of W3C Workshop on Web & Machine Learning
https://www.w3.org/2020/06/machine-learning-workshop/

ML Model format #74

Open dontcallmedom opened 3 years ago

dontcallmedom commented 3 years ago

In his talk, @cynthia points out the lack of consensus on a standard format for ML models:

One of the complications in defining a mechanism to run pre-trained neural network models in the browser is to agree on the standard format for packaging and shipping these neural network models. Machine learning academia and the ecosystem of frameworks have still not agreed on a common format which makes this challenging for us as a platform, as we must choose from one of the multiple competing proposals.

The Model Loader API explainer (presented by @jbingham) offers some of the characteristics of what a good format would be in this context, with MLIR as a potential candidate.

This raises the following questions to me:

  • does this discussion need consensus beyond the Web browsers community? beyond the JS community (cf. #62)? Can the Web lead the way (or at least, one way), or would it be doomed to fail without broader uptake in other popular ML environments?

  • is there an existing venue or a logical one for building or verifying consensus on the direction to follow for such a format?

bhack commented 3 years ago

There is also this interesting thread at https://github.com/google/iree/issues/2863

jbingham commented 3 years ago

This raises the following questions to me:

  • does this discussion need consensus beyond the Web browsers community? beyond the JS community (cf. #62)? Can the Web lead the way (or at least, one way), or would it be doomed to fail without broader uptake in other popular ML environments?

IMO, it would be valuable to have agreement from the major ML frameworks and runtimes (WinML, CoreML, PyTorch, TensorFlow, etc.). The success of any web standard format would depend on enough ML models trained in the major ML frameworks being convertible into the common format(s), and then parseable and convertible back to any native runtime.

Any common format will likely support only a subset of what each ML framework offers, since they're all evolving at different rates and have only partial overlap in their supported operations. If the overlap is valuable enough, and there's agreement, it could work. I know that Google would be very reluctant to be limited to a standard that isn't expressive enough for the TensorFlow Lite models we're able to run in Android apps.
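To make the subset problem concrete, here's a toy sketch in TypeScript (the op names and the standardOps list are made up for illustration, not taken from any real spec): whether a model is expressible in a common format is, at its core, a set-containment check over the ops the model uses.

```ts
// Hypothetical common-format op set; a real one would be much larger,
// but still a strict subset of what each framework supports natively.
const standardOps = new Set(["conv2d", "matmul", "add", "relu", "softmax"]);

// Ops used by a model exported from some framework (illustrative).
const modelOps = ["conv2d", "matmul", "gelu", "softmax"];

// The model fits the common format only if every op it uses is standard.
const missing = modelOps.filter((op) => !standardOps.has(op));
if (missing.length > 0) {
  console.log(`Not expressible in the common format; missing: ${missing.join(", ")}`);
} else {
  console.log("Model fits within the common subset.");
}
```

In practice a converter also has to match parameter semantics, not just op names, which is the harder part.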

  • is there an existing venue or a logical one for building or verifying consensus on the direction to follow for such a format?

Microsoft formed ONNX as a community standard for model definitions, and there has definitely been some traction in the WinML world, and perhaps beyond. In the TensorFlow ecosystem, we haven't heard much interest yet, and Google has declined to be involved in the ONNX efforts so far.

The main technical reason is that the TensorFlow team has been skeptical of standardizing at the level of operations, due to operation fatigue: the operation count is growing about 20% per year and is well over 1,000 now. Google is looking to move to a different approach with the Android NN API and TensorFlow, based on some smaller set of composable operations. More than one such effort is in progress: MLIR and Tensor Compute Primitives (TCP) are just two of the options being explored. TCP is being done as a community project. MLIR is open source, and anyone can create a dialect. Google and Microsoft have discussed the idea of a web dialect of MLIR that is aligned with an ONNX operation set, though we haven't started working on it yet.
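To sketch what "a smaller set of composable operations" means in practice (plain TypeScript over arrays here, not actual MLIR or TCP syntax): a high-level op like softmax doesn't need to be a primitive at all, since it decomposes into a handful of generic ones.

```ts
// A few generic primitives a compact op set might standardize.
const vmax = (xs: number[]) => xs.reduce((a, b) => Math.max(a, b), -Infinity);
const vsum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
const vsub = (xs: number[], s: number) => xs.map((x) => x - s);
const vexp = (xs: number[]) => xs.map(Math.exp);
const vdiv = (xs: number[], d: number) => xs.map((x) => x / d);

// softmax expressed purely as a composition of the primitives above.
// (Subtracting the max first is the standard numerical-stability trick.)
function softmax(xs: number[]): number[] {
  const e = vexp(vsub(xs, vmax(xs)));
  return vdiv(e, vsum(e));
}

console.log(softmax([1, 2, 3])); // ≈ [0.090, 0.245, 0.665]
```

A format built on primitives like these only has to standardize the small core, at the cost of pushing the fusion and optimization work down to the compiler.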

If WASM + SIMD gives enough performance gains, that could buy us time until the ML world stabilizes a bit more. Given how long the standards process takes, exploring multiple options for model formats in parallel might be pragmatic.

wchao1115 commented 3 years ago

With regard to the format issue, the people working on WebNN, including myself, have spent a considerable amount of time researching and comparing ML operation semantics across many popular frameworks and de facto standards. An interesting observation from this exercise is that they have much more in common than most people think. This is probably due to the domain's historical roots and the openness of the research community as the domain knowledge evolved. It is evident in the fact that most models can be converted to a different format reasonably well at the semantic level. Most of the problems people face today are tactical operational gaps, such as redundancy and tool-chain inefficiency.

As a case study: when we started the development of the DirectML project a few years ago, we initially modeled its operation semantics on the early versions of ONNX, the format backing the WinML API. To our pleasant surprise, we found that more than 90% of the DirectML functionality already built was readily transferable to our work on TensorFlow. If one looks at key building-block functions such as gemm or matmul, or even at reusable layers such as convolution or recurrent networks, across the many formats we have, one finds that these operations are in fact already semantically compatible, down to the parameter semantics in many cases.
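gemm is a good example of how far that compatibility goes: ONNX, BLAS, DirectML, and the WebNN proposal all define it as Y = alpha * A' * B' + beta * C, with optional transposes on A and B. Here is a naive reference sketch of that shared semantic in TypeScript (illustrative only; broadcasting of C and error checking are omitted):

```ts
type Matrix = number[][];

const transpose = (m: Matrix): Matrix =>
  m[0].map((_, j) => m.map((row) => row[j]));

// Shared gemm semantic: Y = alpha * A' * B' + beta * C,
// where A' and B' are A and B with the optional transposes applied.
function gemm(
  a: Matrix, b: Matrix, c: Matrix,
  alpha = 1.0, beta = 1.0,
  transA = false, transB = false,
): Matrix {
  const A = transA ? transpose(a) : a;
  const B = transB ? transpose(b) : b;
  return A.map((row, i) =>
    B[0].map((_, j) =>
      alpha * row.reduce((acc, v, k) => acc + v * B[k][j], 0) + beta * c[i][j]
    )
  );
}
```

The alpha/beta/transpose parameters carry essentially the same meaning in every one of those specs, which is what makes cross-format conversion of such ops mostly mechanical.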

In my view, the differences among these formats are more tactical than semantic. They differ in breadth of variety and reusability, perhaps due to uncontrolled growth from rapid development, but they do share a very big overlap. In this sense, ML operations are not that different from regular APIs -- they are defined with the purpose of reuse. When Cho et al. introduced a novel recurrent network in their 2014 paper, there was no ML operation gru defined anywhere. It was only later, when people wanted to reuse the network in their own work, that a new operation was defined for it in their respective frameworks, in almost identical forms -- and not just for a part of it but for the entire network. So reuse drives the adoption of new operations, to the extent that, over time, some even make their way into dedicated silicon.
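To make that last point concrete, here is a single GRU step written only in terms of generic primitives (matrix-vector product, sigmoid, tanh, elementwise add/mul), following the gate convention of the Cho et al. 2014 paper; biases are omitted for brevity, and frameworks differ slightly in their exact conventions:

```ts
const sigmoid = (x: number) => 1 / (1 + Math.exp(-x));
const matVec = (w: number[][], x: number[]) =>
  w.map((row) => row.reduce((acc, v, i) => acc + v * x[i], 0));
const zip = (a: number[], b: number[], f: (p: number, q: number) => number) =>
  a.map((v, i) => f(v, b[i]));

// One GRU step composed from the primitives above (Cho et al., 2014):
//   z = sigmoid(Wz x + Uz h)            update gate
//   r = sigmoid(Wr x + Ur h)            reset gate
//   hTilde = tanh(Wh x + Uh (r ⊙ h))    candidate state
//   hNext = z ⊙ h + (1 - z) ⊙ hTilde
function gruCell(
  x: number[], h: number[],
  Wz: number[][], Uz: number[][],
  Wr: number[][], Ur: number[][],
  Wh: number[][], Uh: number[][],
): number[] {
  const z = zip(matVec(Wz, x), matVec(Uz, h), (p, q) => sigmoid(p + q));
  const r = zip(matVec(Wr, x), matVec(Ur, h), (p, q) => sigmoid(p + q));
  const hTilde = zip(
    matVec(Wh, x),
    matVec(Uh, zip(r, h, (p, q) => p * q)),
    (p, q) => Math.tanh(p + q),
  );
  return h.map((hi, i) => z[i] * hi + (1 - z[i]) * hTilde[i]);
}
```

gru exists as a named operation in today's frameworks only because this exact composition was reused often enough to be worth naming -- and, eventually, hard-wiring.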