bbernhar opened this issue 9 months ago
My recent investigation into supporting `MLBuffer` on CoreML has led me to the following two suggestions for `createBuffer()`:

1. The only zero-copy way to pass a buffer to both WebGPU (as an `IOSurface`) and CoreML (as an `MLMultiArray`) is to first allocate the buffer as an `IOSurface` containing "float16" data (`IOSurface` -> `CVPixelBuffer` -> `MLMultiArray`). If the `MLBuffer` is to be used with WebGPU it must be allocated in this fashion (to be zero-copy, at least), whereas an `MLBuffer` which is only used within WebNN may be allocated as an `MLMultiArray` directly (more on that below).

2. `MLBufferDescriptor` should include an `MLOperandDescriptor` rather than an `MLSize64`. CoreML's inputs and outputs are given as `MLMultiArray`s, which require the data type and dimensions to be known. If we're to allocate a hardware buffer for `createBuffer()`, this information must be known.

Given that the dimensions and data type of input and output operands to an `MLGraph` are well-defined anyway, it seems reasonable to enforce that an `MLBuffer` must have matching constraints to be passed as an input or output to an `MLGraph`, as #544 describes. Is there a reason why we should keep `MLSize64`?
Thanks @a-sully for delving into the CoreML side of things.
Regarding the need for a WebGPU usage flag: is it feasible for an `MLBuffer` to always be created as an `MLMultiArray` where, upon import to WebGPU, we could assign or request the usages? Assigning `GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC` appears to be sufficient.

As for the question about keeping `MLSize64`: without `MLSize64`, any ML framework that doesn't represent its tensor data type like `MLMultiArray` would require re-architecting to avoid creating (especially output) tensors from raw allocations (or malloc). Alternatively, the web developer would need to defer calling `createBuffer()` until `dispatch()`, impacting first-inference time. Could `MLOperandDescriptor` be made optional instead? The size could then be ignored where irrelevant.
> Is it feasible for an MLBuffer to always be created as an MLMultiArray where, upon import to WebGPU, we could assign or request the usages?

AFAICT an `MLMultiArray` cannot be handed off to WebGPU. It's a data type specific to CoreML. Importing to a type WebGPU can understand would always require a copy - even on UMA systems, which would be unfortunate!
> Without `MLSize64`, any ML framework that doesn't represent its tensor data type like `MLMultiArray` would require re-architecting to avoid creating (especially output) tensors from raw allocations (or malloc). Alternatively, the web developer would need to defer calling `createBuffer()` until `dispatch()`, impacting first-inference time.

Hmm, I'm not sure I understand your concern... Implementations are still welcome to allocate an `MLBuffer` as one contiguous block of memory. It's just that the WebNN "front end" would assert (when you call `dispatch()`) that the dtype and dimensions of the passed-in `MLBuffer` match what the graph expects, rather than just that the sizes are the same. Concretely, that maps to these checks in your prototype CL.

Is the use case you're referring to one where a single `MLBuffer` is assumed to be able to be contorted into different dtypes and dimensions? For example:
```js
const mlBuffer = new MLBuffer({size: 3*4*4});

// `graph1` expects a float32 output with shape [3, 4]
context.dispatch(graph1, inputs, {'out': mlBuffer});

// `graph2` expects a float16 input with shape [4, 3, 2]
context.dispatch(graph2, {'in': mlBuffer}, outputs);
```
This hides an internal reinterpretation of the data type and dimensions of what's assumed to be an opaque bag of bytes. I think there's a reasonable argument that this implementation detail should not make its way into the WebNN spec, which shouldn't prescribe a particular implementation.

WebNN has `reshape` and `cast` operators. In the example above, `graph2` may use these operators to convert an input into whatever dtype and dimensions it needs, if it still wants to be able to use `mlBuffer` (see the sketch below). An advantage of this approach is that the otherwise opaque reinterpretation of the buffer can be expressed in terms of other well-defined operators.
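A rough sketch of that idea, assuming the `MLGraphBuilder` surface and treating the shapes as purely illustrative (this snippet is not from the issue itself):

```js
// graph2 declares its input with the descriptor that graph1 actually produced,
// then converts it with well-defined WebNN operators instead of relying on an
// opaque byte reinterpretation. (Descriptor member naming may differ by spec
// revision; `dimensions` is used here.)
const builder = new MLGraphBuilder(context);

const input = builder.input('in', {dataType: 'float32', dimensions: [3, 4]});

// Convert values to float16 and rearrange the elements; the element count must
// be preserved across reshape().
const asFloat16 = builder.cast(input, 'float16');
const rearranged = builder.reshape(asFloat16, [4, 3]);

// `rearranged` then feeds the rest of graph2's subgraph.
```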
Could you elaborate on the use case(s) you have in mind?
> Could `MLOperandDescriptor` be made optional instead? The size could then be ignored where irrelevant.

What would be the expected behavior on platforms which require a data type and dimensions when the buffer is allocated? An `MLOperandDescriptor` implies a size - but not the other way around.
> AFAICT an MLMultiArray cannot be handed off to WebGPU.

I was expecting we'd start the allocation in CoreML via `MLBuffer`, then import it as an `MTLBuffer` into WebGPU, using the `GPUBuffer` usages I mentioned.
> Implementations are still welcome to allocate an MLBuffer as one contiguous block of memory

Consider a native C++ framework which implements a `Tensor` dtype as a bag of bytes. If you want to deploy this ML framework using the WebNN JS API as an execution provider (or EP), it expects buffers to be allocated using a size. If we force `createBuffer()` to accept only an `MLOperandDescriptor`, then this EP couldn't simply map `Tensor` allocation to `createBuffer()`. They would need to come up with a solution just for `MLBuffer`, preserving `MLOperandDescriptor`, or defer `createBuffer()`, which seems either burdensome or ineffective.
> > AFAICT an MLMultiArray cannot be handed off to WebGPU.
>
> I was expecting we'd start the allocation in CoreML via `MLBuffer`, then import it as an `MTLBuffer` into WebGPU, using the `GPUBuffer` usages I mentioned.

Ah, I think my wording of "We need a WebGPU usage flag" above was misleading. I'm not suggesting that we need WebGPU usage flags here, but rather a usage flag saying "I want this `MLBuffer` to be convertible to a `GPUBuffer`" (because the implementation may use that information to determine where/how the buffer should be allocated). Does that clear things up?

Could you also clarify what exactly you mean by "start the allocation in CoreML"? I assume you mean "as an `MLMultiArray`", but that would require the dtype and dimensions to be known, no?
> > Implementations are still welcome to allocate an MLBuffer as one contiguous block of memory
>
> Consider a native C++ framework which implements a `Tensor` dtype as a bag of bytes. If you want to deploy this ML framework using the WebNN JS API as an execution provider (or EP), it expects buffers to be allocated using a size. If we force `createBuffer()` to accept only an `MLOperandDescriptor`, then this EP couldn't simply map `Tensor` allocation to `createBuffer()`. They would need to come up with a solution just for `MLBuffer`, preserving `MLOperandDescriptor`, or defer `createBuffer()`, which seems either burdensome or ineffective.

Thanks for the explanation. Could you provide a concrete example of where this concern is relevant? A quick glance at some common ML frameworks suggests that just `size` is often not sufficient to allocate a tensor. OnnxRuntime's JavaScript Tensor requires a dtype and dimensions, for example (see the snippet below), as does TFLite's equivalent. Are there known examples where a size is available but not the dtype and dimensions? Presumably the `MLBuffer` is being allocated with use by some given `MLGraph` in mind, and the data types and dimensions of inputs and outputs must already be known? (`input()` and `build()` (for outputs) each require an `MLOperandDescriptor`.)

Another consideration is that `size` may not be enough regardless of whether we want to replace `size` with an `MLOperandDescriptor`. As mentioned above, I expect we'll need usage flags, too. Does your concern still hold if arguments other than `size` become required?
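For concreteness, a minimal onnxruntime-web snippet showing the point being made here (my illustration, not part of the original comment); the `Tensor` constructor takes a data type and dimensions up front:

```js
import * as ort from 'onnxruntime-web';

// Constructing an ORT JS tensor requires the dtype and dims, not just a byte size.
const data = new Float32Array(3 * 4);
const tensor = new ort.Tensor('float32', data, [3, 4]);
```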
> Could you also clarify what exactly you mean by "start the allocation in CoreML"?

Could we pass a union to `createBuffer()` which specifies either the size or the `MLOperandDescriptor`, so that `MLBuffer` could always be created as an `MLMultiArray`? (See the sketch below.) If not, another (possible) alt. solution is to have `createBuffer(size)` defer creation of `MLMultiArray` until `dispatch()`.
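To make the union idea concrete, here is a hypothetical sketch of the two calling forms being discussed (neither is in the spec; names and shapes are illustrative only):

```js
// Form 1: size only - enough for backends that treat the buffer as raw bytes.
const a = await mlContext.createBuffer({size: 3 * 4 * 4});

// Form 2: full MLOperandDescriptor - enough for backends (e.g. CoreML) that
// need the dtype and dimensions at allocation time.
const b = await mlContext.createBuffer({dataType: 'float32', dimensions: [3, 4]});
```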
> Are there known examples where a size is available but not the dtype and dimensions?

Yes, the ORT web tensor dtype can only be implemented behind a "malloc"-like C interface. When WebNN is used as an EP, it exists within the ML runtime itself.
> Yes, the ORT web tensor dtype can only be implemented behind a "malloc"-like C interface.

I don't understand this comment, because all of the Tensor constructors in that header take shape information. What am I missing?
> I don't understand this comment, because all of the Tensor constructors in that header take shape information. What am I missing?

Notice the `Tensor` constructor uses an `IAllocator` interface. That's the only way an `MLBuffer` could be created from it, because it must own the buffer for the specified shape. Funny enough, the shape information is right there, but the main point is that ORT expects it's possible to whip up a device buffer given only a size.
Taking a step back, the Web Platform Design Principles implore us to "design based on user needs, not the underlying API or hardware":

> This means newly proposed APIs should be designed with careful consideration on how they are intended to be used rather than how the underlying hardware, device, or native API available today.

The use cases for an `MLBuffer` - using some (hardware-optimized) buffer as an input or output to an ML graph - all require that the data type and dimensions of the buffer be known. We should not prescribe implementation details, such as that the buffer must be allocated contiguously, as this other design principle cautions:

> Be particularly careful about exposing the exact lifecycle and data structures of the underlying native APIs. When possible, consider flexibility for new hardware.

The point about considering flexibility for new hardware is especially pertinent to WebNN :)

While I understand the desire to design a web platform API which (especially WASM) user-space frameworks can easily plug into, the web platform API should not bend over backwards to accommodate the implementation choices of any given framework. And the web platform API certainly should not bake in assumptions based on the current limitations of said frameworks! In this case, ORT does not support CoreML in cases where an `MLMultiArray` used as an output is not contiguously allocated. It seems likely that addressing that limitation would require changes to ORT which are ~the same as what would be needed to support `MLBuffer` if creating an `MLBuffer` required a dtype and dimensions.
[@a-sully wrote]

> The only zero-copy way to pass a buffer to both WebGPU (as an IOSurface) and CoreML (as an MLMultiArray) is to first allocate the buffer as an IOSurface containing "float16" data (IOSurface -> CVPixelBuffer -> MLMultiArray)

For Apple platforms, my understanding is you can go from `MLMultiArray` -> `MTLBuffer` by calling getBytesWithHandler + newBufferWithBytesNoCopy. With an `MTLBuffer` you should be able to create a WebGPU buffer.

Why are IOSurfaces required?
> another (possible) alt. solution is to have `createBuffer(size)` defer creation of `MLMultiArray` until `dispatch()`.

Seems doable. `writeBuffer()` may hold the BigBuffer with user data, then at `dispatch()`, create an `MLMultiArray` by `initWithDataPointer`?
> For Apple platforms, my understanding is you can go from `MLMultiArray` -> `MTLBuffer` by calling getBytesWithHandler + newBufferWithBytesNoCopy. With an `MTLBuffer` you should be able to create a WebGPU buffer.

Good question! I originally thought so too, but my current understanding is that this is not generically true (i.e. for all data types). If anyone can definitively confirm or dispute this understanding (@mwyrzykowski?) please speak up! Alright, here goes...

The docs of `newBufferWithBytesNoCopy` say that it:

> Creates a buffer that wraps an existing contiguous memory allocation

whereas the docs for `getBytesWithHandler` say of the buffer:

> It may not store these scalar values contiguously

so I would assume that this would not be allowed (or at least not be zero-copy) unless the `MLMultiArray` was specifically allocated contiguously.
How can we ensure an `MLMultiArray` is allocated contiguously? Of all the `MLMultiArray` constructors, the candidates for ensuring a contiguous memory allocation seem to be:

- `initWithDataPointer:shape:dataType:strides:deallocator:error:`
- `initWithPixelBuffer:shape:`

The first one looks promising! Unfortunately it seems - based on past offline discussions - that CoreML internally makes a copy of the bytes when using this constructor. That `strides` is a parameter seems to corroborate this.
So this would not be zero-copy:

> > another (possible) alt. solution is to have `createBuffer(size)` defer creation of `MLMultiArray` until `dispatch()`.
>
> Seems doable. `writeBuffer()` may hold the BigBuffer with user data, then at `dispatch()`, create an `MLMultiArray` by `initWithDataPointer`?

The latter constructor takes a `CVPixelBuffer`, but this only works if the `CVPixelBuffer` is a "float16" `IOSurface` in disguise:

> Use this initializer to create an `IOSurface`-backed `MLMultiArray` that reduces the inference latency by avoiding the buffer copy to and from some compute units. The pixel buffer's pixel format type must be `kCVPixelFormatType_OneComponent16Half`. The `MLMultiArray` data type is `MLMultiArrayDataType.float16`.
So with regards to this question...

> Why are IOSurfaces required?

It seems that the only way to avoid copies of a backing memory which is to be shared as both an `MLMultiArray` and an `MTLBuffer` is to start with a float16 `IOSurface`. Unfortunately this suggests that zero-copy buffer sharing is only possible under certain dtype + "do we need to share with WebGPU" configurations. Of course, if we know the memory will stay within CoreML (i.e. it doesn't need to be shared with WebGPU) then we can allocate an `MLMultiArray` directly, though this would require dtype and shape to be known before `writeBuffer()`:
| Data Type | WebNN Use Only | WebGPU Interop |
|---|---|---|
| float16 | ✅ Zero copy (as `MLMultiArray` or `IOSurface`) | ✅ Zero copy (as `IOSurface`) |
| float32 | ✅ Zero copy (as `MLMultiArray`) | ❌ Data copies (with `initWithDataPointer`) |
| float64 | ✅ Zero copy (as `MLMultiArray`) | ❌ Data copies (with `initWithDataPointer`) |
| int32 | ✅ Zero copy (as `MLMultiArray`) | ❌ Data copies (with `initWithDataPointer`) |
| other | ❓ May be emulated as int32? | ❓ Not sure |
> > For Apple platforms, my understanding is you can go from `MLMultiArray` -> `MTLBuffer` by calling getBytesWithHandler + newBufferWithBytesNoCopy. With an `MTLBuffer` you should be able to create a WebGPU buffer.
>
> Good question! I originally thought so too, but my current understanding is that this is not generically true (i.e. for all data types). If anyone can definitively confirm or dispute this understanding (@mwyrzykowski?) please speak up! Alright, here goes...

It is zero-copy in CoreML, but anything other than fp16 + `CVPixelBuffer` will result in a copy below CoreML.
> web platform API certainly should not bake in assumptions based on the current limitations of said frameworks!

Not all HW APIs require an `MLOperandDescriptor` for buffer creation; this is not specific to ORT (ex. DML). If the ML framework wants to pre-allocate buckets of memory but WebNN cannot (aka `GPUBuffer`), that's equally an assumption on WebNN's behalf, IMO.

Unless `MLMultiArray` can NOT be implemented through an `MLBuffer`, it seems unnecessary to require only an `MLOperandDescriptor`.
@a-sully Thinking of a way forward to unblock CoreML. Here's the options I've gathered:

1. `MLBuffer(MLOperandDescriptor)`, and work around the problem in ORT by calling `createBuffer()` in `dispatch()`.
2. `MLMultiArray`: the WebNN RT provides an `IAllocator` impl.
3. `MLBuffer`, and have the CoreML impl. cache `MLMultiArray`(s) upon `dispatch()`.

I am not a fan of (1) because it bakes assumptions into the WebNN spec (ex. ORT never pre-allocates or uses untyped buffers). Untyped buffers (aka byte buffers with a linear layout), for example, could be partially dispatched via an `MLBufferView`, re-used between multiple calls to dispatch(), or pre-allocated from a larger `MLBuffer` using `createBuffer(size)`.

The other option, (2), means WebNN backends (ex. DML resources) must be re-implemented to work like `MLMultiArray` (which requires `strides` to read and write), which is a considerable effort/burden. If (3) is possible, it seems like the simplest path forward; did you have a chance to investigate this?
Thanks for the input @bbernhar. I've been exploring this space more, and I still believe the path forward, if we want "a device-based storage object that may be used by WebNN operations", is the following:

4. Use `MLBuffer(MLOperandDescriptor, MLBufferUsageFlags)`, and frameworks which use WebNN should not assume implementation details, such as that tensors will always be contiguously allocated.
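As a concrete (hypothetical) illustration of option (4), assuming dictionary-style arguments and usage flags along the lines of what is proposed later in this thread:

```js
// Neither the dictionary members nor the flag names are spec'd; this only
// shows the shape of "descriptor + usage flags at creation time".
const buffer = await mlContext.createBuffer({
  dataType: 'float32',
  dimensions: [2, 4],                                       // MLOperandDescriptor part
  usage: MLBufferUsage.READ_FROM | MLBufferUsage.WRITE_TO,  // usage flags part
});
```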
Responses inline:
> > web platform API certainly should not bake in assumptions based on the current limitations of said frameworks!
>
> Not all HW APIs require an `MLOperandDescriptor` for buffer creation; this is not specific to ORT (ex. DML). If the ML framework wants to pre-allocate buckets of memory but WebNN cannot (aka `GPUBuffer`), that's equally an assumption on WebNN's behalf, IMO.

Hmm, I'm not following here. The question is not whether HW APIs need an `MLOperandDescriptor`, but whether HW APIs can support the contract specified by `MLBuffer`.

If an ML framework wants to allocate a `GPUBuffer`, how is that relevant to WebNN? Could you please elaborate on this point?
> I am not a fan of (1) because it bakes assumptions into the WebNN spec (ex. ORT never pre-allocates or uses untyped buffers). Untyped buffers (aka byte buffers with a linear layout), for example, could be partially dispatched via an `MLBufferView`, re-used between multiple calls to dispatch(), or pre-allocated from a larger `MLBuffer` using `createBuffer(size)`.

Please refer back to https://github.com/webmachinelearning/webnn/issues/542#issuecomment-2067400057. The WebNN spec should not prescribe implementation details, such as that the buffer must be allocated contiguously. This violates the design principles here: https://w3ctag.github.io/design-principles/#usecase-oriented-apis
> The other option, (2), means WebNN backends (ex. DML resources) must be re-implemented to work like `MLMultiArray` (which requires `strides` to read and write), which is a considerable effort/burden.

I don't understand this suggestion. `MLOperandDescriptor` does not include `strides` - just dtype and shape. And this shape does not imply there must be strides; how/where an `MLBuffer` is allocated is entirely an implementation detail. If an `MLBuffer` were to be created with an `MLOperandDescriptor`, presumably the user agent's DML backend could calculate the total byte size and allocate a contiguous array as it currently does. The only thing that would change in the user agent implementation is a check that an `MLBuffer`'s `MLOperandDescriptor` matches the `MLOperandDescriptor` expected by the input and output operands (in the Chromium implementation, this would be a platform-agnostic check that happens in the renderer anyway). A sketch of that check follows.
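A minimal sketch of the kind of renderer-side check being described (my illustration; the helper and field names are hypothetical, not Chromium's):

```js
// `expectedDescriptors` would come from the graph's inputs/outputs at build();
// `namedBuffers` are the MLBuffers passed to dispatch().
function validateDispatchBuffers(expectedDescriptors, namedBuffers) {
  for (const [name, buffer] of Object.entries(namedBuffers)) {
    const expected = expectedDescriptors.get(name);
    if (!expected) throw new TypeError(`Unknown operand: ${name}`);
    const actual = buffer.descriptor;
    const sameShape = expected.dimensions.length === actual.dimensions.length &&
        expected.dimensions.every((d, i) => d === actual.dimensions[i]);
    if (expected.dataType !== actual.dataType || !sameShape) {
      throw new TypeError(`Buffer for '${name}' does not match the graph's descriptor`);
    }
  }
}
```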
> If (3) is possible, it seems like the simplest path forward; did you have a chance to investigate this?

This is not possible without several data copies (e.g. where does the data go when `writeBuffer()` is called?). This also falls apart if `MLBuffer` is not type-safe and can be assumed to be recast/reshaped to any dtype and shape: https://github.com/webmachinelearning/webnn/issues/542#issuecomment-2065073674
> If an ML framework wants to allocate a GPUBuffer, how is that relevant to WebNN? Could you please elaborate on this point?

The developer has to know the layout in order to calculate offsets which split up and re-use a larger buffer piecemeal. Note: a linear layout does not dictate how `MLBuffer` gets implemented; it could actually be non-contiguous. In WebGPU, `GPUBuffer` layout is known (and linear), so web developers can implement `IAllocator` on top of `GPUBuffer` (see the sketch below). If we don't allow `createBuffer(size)`, then that problem gets punted into the WebNN runtime. If the DML backend called CreateCommittedResource() on every call to createBuffer(), our first-inference performance would be awful, which is why compute() already implements its own `IAllocator`. But since `MLBuffer`s are pre-allocated before build(), we can't just FIFO it and be done with it.
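For illustration, a minimal user-space sub-allocator of the kind being described, built on plain WebGPU (my sketch; the class and its policy are not part of any proposal):

```js
// Carve fixed-offset regions out of one large GPUBuffer so many tensors can
// share a single allocation. Storage-buffer binding offsets must be aligned
// (256 bytes by default in WebGPU).
class BumpAllocator {
  constructor(device, byteSize) {
    this.buffer = device.createBuffer({
      size: byteSize,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
    });
    this.offset = 0;
  }
  alloc(byteSize) {
    const offset = this.offset;
    this.offset += Math.ceil(byteSize / 256) * 256; // keep offsets 256-byte aligned
    if (this.offset > this.buffer.size) throw new RangeError('out of space');
    return { buffer: this.buffer, offset, size: byteSize };
  }
}
```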
> This is not possible without several data copies

Bummer. The more I think about it, the more likely it is that `MLBuffer` needs to behave like `MLTensor`. DML can emulate `MLMultiArray` ops, but not vice versa.
> The more I think about it, the more likely it is that `MLBuffer` needs to behave like `MLTensor`.

Ah yes, this is what I've been advocating for, but without using that specific vocabulary :)
@a-sully

If the layout of `MLBuffer` will be unknown, we also need to specify a way for the web developer to initialize tensor data, as readBuffer() and writeBuffer() assumed the layout was linear. For zero-copy, it seems `MLBuffer` must index into an `MLMultiArray`, since createBuffer(MLOperandDescriptor) wouldn't accept an `ArrayBufferView`.

Could you help me understand the plan there?
Hmmm, I thought it was a given (based on my earlier comments here) that `readBuffer()` and `writeBuffer()` would not be zero-copy. A closer look at the CoreML API has convinced me that guaranteed zero-copy buffer-mapping from JS is not possible (since, again, `initWithDataPointer` would still result in copies) - and as I stated in that earlier comment, I don't think this is too big of a deal, at least for inputs and outputs (constants may be a different story).

My claim - if we assume that `readBuffer()` and `writeBuffer()` will have copies - is that the web platform layer should always be able to provide the caller the illusion of linear memory, even if it's not linear under the hood. The `MLMultiArray`'s `subscript(_:)` method provides this abstraction, for example (a sketch of the idea follows). Do you see any issues with this approach?
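To illustrate the "illusion of linear memory" idea (my sketch only, not the actual implementation): the user agent can translate a caller-visible linear element index into coordinates of a possibly non-contiguous backing store.

```js
// Row-major linearization: with shape [2, 3], linear index 4 maps to [1, 1].
function linearIndexToCoords(index, shape) {
  const coords = new Array(shape.length);
  for (let d = shape.length - 1; d >= 0; d--) {
    coords[d] = index % shape[d];
    index = Math.floor(index / shape[d]);
  }
  return coords; // the backend then applies its own strides to these coords
}
```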
> Do you see any issues with this approach?

Nope, the proposed change SGTM then. I wasn't sure where offset translation was occurring (now I understand it's an impl. detail). Thanks for answering.
A couple of issues were re-raised today by @huningxin during @a-sully's prototyping of buffer usages. Summarized as follows:

1. Should `createBuffer()` be given a default usage at creation (ex. `INPUT|OUTPUT`)?
2. `OUTPUT` cannot disambiguate between "on-device only" or efficiently used by `readBuffer()`.
The use-case for (2) is when an `MLBuffer` output gets imported into WebGPU where `readBuffer()` is never called (either WebGPU is the final destination or WebNN re-uses the output). An "on-device only" usage is unique because it offers better bandwidth, namely for dGPU.

For (1), I see value in assuming `INPUT|OUTPUT` upon creation because it allows the web developer to forget about usages or tracking buffers-by-usage, esp. if performance isn't an issue.

For (2), shall we consider prefixing CPU access visibility?

- `CPU_INPUT`: CPU write optimal, slow GPU read/write
- `CPU_OUTPUT`: CPU read optimal, slow GPU read/write
- `OUTPUT`: CPU has no access, fast GPU read/write

Appreciate any thoughts/feedback.

@RafaelCintron @huningxin
> For (2), shall we consider prefixing CPU access visibility?
>
> - `CPU_INPUT`: CPU write optimal, slow GPU read/write
> - `CPU_OUTPUT`: CPU read optimal, slow GPU read/write
> - `OUTPUT`: CPU has no access, fast GPU read/write

+1. Regarding the enum value naming, should we consider using something like the D3D12_HEAP_TYPE enumeration?

- `UPLOAD`: CPU write optimal, slow GPU read/write
- `READBACK`: CPU read optimal, slow GPU read/write
- `DEFAULT`: CPU has no access, fast GPU read/write
> INPUT|OUTPUT

Do we need to distinguish whether a GPU buffer is used for graph input or output? I mean, how would an implementation handle `INPUT` and `OUTPUT` differently?
@huningxin Thanks for the comments.

> +1. Regarding the enum value naming, should we consider using something like the D3D12_HEAP_TYPE enumeration?

The underlying memory/heap type used by the WebNN implementation could be determined based on the usage alone. See WebGPU: https://www.w3.org/TR/webgpu/#programming-model-resource-usages

> Do we need to distinguish whether a GPU buffer is used for graph input or output? I mean, how would an implementation handle INPUT and OUTPUT differently?

The WebNN runtime would use `INPUT` or `OUTPUT` to create buffers in write-combined or write-back memory (aka `UPLOAD` and `READBACK` per this table) and could validate that the usage matches, e.g. `INPUT` => `dispatch(input, ...)`:
- `writeBuffer()` is fast, `readBuffer()` is slow.
- `readBuffer()` is fast, `writeBuffer()` is slow.
- `writeBuffer()` or `readBuffer()`.

@a-sully @reillyeon @huningxin @RafaelCintron
Thoughts/concerns with introducing the (proposed) buffer creation usages below?

For context, these new usages allow DML to correctly configure (and directly map) memory properties upon createBuffer() [1] and would determine how an `MLBuffer` may be used after creation. WebNN backend APIs that do not require this merely validate that the usage is allowed.
MLBufferUsage(s):

- `JS_READ`: buffer can be used with readBuffer(). Can be combined with `JS_WRITE`.
- `JS_WRITE`: buffer can be used with writeBuffer(). Can be combined with `JS_READ`.
- `JS_NONE`: buffer can only be used for dispatch(). Cannot be combined with `JS_WRITE` or `JS_READ`.

JS example:

```js
const output = await mlContext.createBuffer({
  usage: GPUBufferUsage.JS_READ
});

await mlContext.readBuffer(output); // OK
mlContext.writeBuffer(output, ..); // throws error
```
> JS example:
>
> ```js
> const output = await mlContext.createBuffer({ usage: GPUBufferUsage.JS_WRITE });
> await mlContext.readBuffer(output); // OK
> mlContext.writeBuffer(output, ..); // throws error
> ```

nit: Did you mean to use `MLBufferUsage.JS_READ` in this example?
Eventually we'll need a flag to indicate that this buffer may be shared with WebGPU. As I've discussed elsewhere, this dictates how an `MLBuffer` should be allocated on Mac. That's a separate issue (#688) that I'm not trying to solve here, though it would be nice to have an idea of how the proposed `MLBufferUsage` flags will interact with that flag (e.g. #688 suggests that importing an `MLBuffer` into WebGPU will yield a `GPUBuffer` with `GPUBufferUsageFlags.STORAGE` and `GPUBufferUsageFlags.COPY_SRC` flags. Is this true/allowed in all cases?)
Overall this seems reasonable, though I do have a few thoughts:

- `JS_NONE`: it could be `DISPATCH`... if `MLBuffer`s have the ability to be used with `dispatch()`, then this is implied and we don't need this flag at all. Not passing any other usage flags would map to `D3D12_HEAP_TYPE_DEFAULT`.

Thoughts on:

- `READ_FROM`: buffer can be used with `readBuffer()`. Can be combined with `WRITE_TO`
- `WRITE_TO`: buffer can be used with `writeBuffer()`. Can be combined with `READ_FROM`
- (eventually) `WEB_GPU_INTEROP`: buffer can be used with `GPUDevice.importExternalBuffer()`. Can be combined with ???

Thanks @a-sully for the feedback.
> nit: Did you mean to use MLBufferUsage.JS_READ in this example?

Good catch, fixed.

> and we don't need this flag at all

SGTM.

> Thoughts on:
>
> - READ_FROM: buffer can be used with readBuffer(). Can be combined with WRITE_TO
> - WRITE_TO: buffer can be used with writeBuffer(). Can be combined with READ_FROM

SGTM.

> (eventually) WEB_GPU_INTEROP: buffer can be used with GPUDevice.importExternalBuffer(). Can be combined with ???

With any other WebNN usages. Calling importExternalBuffer() could simply ignore them, as `MLBuffer` is (currently) treated as the WebGPU-equivalent usage of `STORAGE` and is neutered.
Purpose/Motivation
Defines a device-based storage object that may be used by WebNN operations. This is a sub-issue of #482.
Proposed API
Example JS
Edits
MLOperandDescriptor
Alternative API proposals
N/A
Opens