w3c / machine-learning-charter

Discussions on a possible charter for a future W3C Working Group developing Machine Learning solutions
https://w3c.github.io/machine-learning-charter/charter.html

Set of ops supported must be more comprehensive #23

Closed by anssiko 1 year ago

anssiko commented 1 year ago

(Related to the WebML WG Charter in development at https://github.com/w3c/machine-learning-charter/pull/19)

We discussed "v2" use cases for WebNN and @wchao1115 shared the following feedback:

… for v2, one of the constant feedback from our external partners when discussing WebNN for their use case has been the ops … the set of ops supported must be more comprehensive … this needs to be more explicit goal, this is important … related to that, use cases around transformers

The current charter Scope enumerates a few common ones: "convolution, pooling, softmax, normalization, fully connected, activation, recurrent neural network (RNN) and long short-term memory (LSTM)". This is not meant to be an all-inclusive list, and it does give the WG the ability to adapt to changes in this landscape.

At a minimum, we should review the bullets in the Scope section and see whether to explicitly mention some of the more recent work, such as transformers. We want to give enough detail to provide good direction without constraining the WG too much. The list of ops mentioned in the charter would be open-ended.

anssiko commented 1 year ago

The WG's work is motivated by the compelling user experiences it enables for web users. To that end, I'm proposing to add the following informative text to the Motivation and Background section:

Computer Vision enables computers to gain understanding from images or videos, Natural Language Processing enables interaction between computers and human languages, and Speech Recognition enables computers to recognize and translate spoken language into text. Bringing these experiences to the web in a privacy-preserving manner requires efficient machine learning inference capabilities built into the browser.

In addition, I think we may want to clarify in the Scope section that the enumerated ops are examples of more established ops, and that the WG wants to give priority to ops that accelerate the above-mentioned user experiences in CV, NLP, and Speech Recognition. Thus I'm proposing to add this non-binding text to the Scope section, following the bullet list:

This Working Group puts priority on building blocks required by well-known model architectures in the fields of Computer Vision, Natural Language Processing and Speech Recognition.