webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Specify WebNN timelines #529

Open a-sully opened 5 months ago

a-sully commented 5 months ago

The spec mentions several timelines (a "parallel timeline", "a GPU timeline", "a different timeline", "the offloaded timeline", etc.), but none of these timelines are described anywhere. Meanwhile, https://github.com/webmachinelearning/webnn/issues/482 mentions a "device timeline" and a "content timeline" (and, at the time of writing, there have been early discussions about whether an MLQueue is needed, which may or may not require a "queue timeline" as well).

These timelines should be clearly defined, including:

bbernhar commented 2 weeks ago

From what I gather, we have at least 3 timelines:

  1. Content timeline. For JavaScript execution.
  2. Context timeline. For any device or queue operation issued by the UA.
  3. Timeline-agnostic. Catch-all for when the other timelines are not relevant.

WebNN is similar to WebGL's programming model in this respect: WebGLRenderingContext is akin to MLContext in design, and neither makes the underlying queue or device visible to the web developer. WebGL does not interop with WebGPU, but if it did, I suspect its timelines couldn't map 1:1 onto WebGPU's.
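To make the split concrete, here is a minimal sketch of where those timelines would come into play, assuming the async createContext() / MLGraphBuilder / compute() shape of the current spec draft; the timeline annotations are illustrative, not normative:

```js
// Content timeline: everything in this script runs here.
const context = await navigator.ml.createContext({ deviceType: 'gpu' });

// Graph construction is content-timeline bookkeeping; no device work
// needs to happen yet.
const builder = new MLGraphBuilder(context);
const a = builder.input('a', { dataType: 'float32', dimensions: [2, 2] });
const b = builder.input('b', { dataType: 'float32', dimensions: [2, 2] });
const c = builder.add(a, b);

// build() and compute() return promises on the content timeline, while
// the UA issues the actual compilation and execution on the context
// timeline (whatever device or queue it uses is never visible here).
const graph = await builder.build({ c });
const results = await context.compute(
  graph,
  { a: new Float32Array([1, 2, 3, 4]), b: new Float32Array([5, 6, 7, 8]) },
  { c: new Float32Array(4) }
);
```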

zolkis commented 2 weeks ago

Is it correct to say that the current standard prose on parallelism is enough to capture timelines?

To run steps in parallel means those steps are to be run, one after another, at the same time as other logic in the standard (e.g., at the same time as the event loop). This standard does not define the precise mechanism by which this is achieved, be it time-sharing cooperative multitasking, fibers, threads, processes, using different hyperthreads, cores, CPUs, machines, etc. By contrast, an operation that is to run immediately must interrupt the currently running task, run itself, and then resume the previously running task.

Do we need to define a specialized term for timelines?

If we do, we should also define the relationships to context, graph, etc.:

From the app script's point of view, what is the minimal set of terms we need to differentiate?

EDIT: I see that this comes from WebGPU timelines. Is it enough to refer to those definitions, or do we want to simplify / capture more nuances in WebNN?

bbernhar commented 2 weeks ago

Is it correct to say that the current standard prose on parallelism is enough to capture timelines?

Not fully. We still need to define what state gets exposed per API operation. For example, MLGraph has access to MLBuffer through the MLContext, so they could all operate on the "context timeline".
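As a rough sketch of what that could look like, using the MLBuffer proposal's shape (createBuffer() / writeBuffer() / dispatch() / readBuffer() are the proposed names and may change; buildGraph() and the 'x'/'y' operand names are placeholders for ordinary MLGraphBuilder code):

```js
const context = await navigator.ml.createContext({ deviceType: 'gpu' });
const graph = await buildGraph(context); // hypothetical helper returning an MLGraph

// createBuffer() returns on the content timeline, but the allocation it
// names is context-timeline state owned by the UA.
const inputData = new Float32Array([1, 2, 3, 4]);
const input = context.createBuffer({ size: inputData.byteLength });
const output = context.createBuffer({ size: inputData.byteLength });

// writeBuffer()/dispatch() enqueue work; the reads and writes of these
// buffers happen on the context timeline, ordered by the UA.
context.writeBuffer(input, inputData);
context.dispatch(graph, { x: input }, { y: output });

// readBuffer() resolves back on the content timeline only after the
// context-timeline work that produced `output` has completed.
const resultBytes = await context.readBuffer(output);
```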

Is it enough to refer to those definitions, or do we want to simplify / capture more nuances in WebNN?

WebNN could map to WebGPU timelines when the deviceType is GPU but not necessarily for the other device types.
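For illustration, compare the spec's createContext(GPUDevice) overload with a default context; whether and how the former's context timeline maps onto WebGPU's device and queue timelines is exactly what would need defining:

```js
// GPU case: the MLContext wraps a GPUDevice, so its "context timeline"
// could plausibly be defined in terms of WebGPU's device/queue timelines.
const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();
const gpuMlContext = await navigator.ml.createContext(gpuDevice);

// Non-GPU case: there is no WebGPU device at all, so the context timeline
// has to be defined independently of WebGPU.
const cpuMlContext = await navigator.ml.createContext({ deviceType: 'cpu' });
```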