Review OffscreenCanvas, including ImageBitmapRenderingContext

junov commented 8 years ago

Requesting that the TAG review the ImageBitmapRenderingContext interface spec'ed here: https://html.spec.whatwg.org/multipage/scripting.html#the-imagebitmap-rendering-context

slightlyoff commented 8 years ago

Hey Justin,

Do you happen to have an explainer doc somewhere for this feature? Or sample code that shows the use-cases and how this API addresses them?

Regards

junov commented 8 years ago

Hi, the original proposal doc was part of the OffscreenCanvas feature proposal. That doc got cleaned-up after ImageBitmapRenderingContext landed in the whatwg standard (to avoid confusion with respect to landed vs unlanded parts of the spec). Here is the pre-cleanup revision of the OffscreenCanvas proposal, which includes ImageBitmapRenderingContext: https://wiki.whatwg.org/index.php?title=OffscreenCanvas&oldid=10087

junov commented 8 years ago

Here is a basic test that shows how the API is used: https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/fast/canvas/imagebitmap/transferFromImageBitmap.html

domenic commented 8 years ago

We'd be happy to have more examples and an introduction section (cf. https://html.spec.whatwg.org/#introduction-13) in the spec. Last I talked I think you were waiting to do that until OffscreenCanvas also got integrated into the spec, but maybe there's value in doing it separately as well.

junov commented 8 years ago

Right. I'll take care of that

dbaron commented 8 years ago

The TAG has been looking into this at our face-to-face meeting just now. It seems like it might be helpful to see a little more of an explainer, with examples. (The best explainer we know of right now is from https://github.com/w3ctag/spec-reviews/issues/141#issuecomment-257298066 .) We're also, at least so far, a little confused by the naming of some of the objects and methods here.

It's not clear to me how the ImageBitmapRenderingContext part of the proposal relates to the transferToImageBitmap()/commit() part of the proposal.

And there seems to be a good bit of concern about lack of things like requestAnimationFrame, and lack of ability to synchronize with audio/video.

So, again, it would be good to see some more end-to-end examples of how this is to be used.

junov commented 7 years ago

Added an intro with an example: https://github.com/whatwg/html/pull/2045

The example is a bit weak for now, there will be stronger examples once OffscreenCanvas is added.

It's not clear to me how the ImageBitmapRenderingContext part of the proposal relates to the transferToImageBitmap()/commit() part of the proposal.

OffscreenCanvas has two modes of operation: the ImageBitmap way, and the commit way. The idea is that you'd typically use one or the other (ImageBitmaps or commit) depending on the requirements of your use case.

The commit way requires no script intervention on the main thread (the browsing context's event loop). You just call commit and the results are pushed to the display, this path is the most performant, and allows implementations to take all sort of shortcuts to reduce graphics update overhead. However, the commit flow does not allow commited frames to be synchronized with graphics updates of the surrounding page, which can be a problem. The association between the OffscrenCanvas and its placeholder canvas element is established when the OffscreenCanvas is created by calling transferControlToOffscreen() on the placeholder canvas.

The other way to use OffscreenCanvas is to produce explicit frames by calling transferToImageBitmap. The resulting ImageBitmap object can then be postMessage'd back to the main event loop, where it can be taken in with ImageBitmapRenderingContext.transferFromImageBitmap(). Then the graphics update that reveals the new canvas content can be synchronized with other changes to the DOM.

And there seems to be a good bit of concern about lack of things like requestAnimationFrame, and lack of ability to synchronize with audio/video.

requestAnimationFrame is on the way.

Synchronization with audio/video cannot be done on a worker (without any major additions to the API) since those tags are not accessible in workers. However, by using the ImageBitmap flow (as opposed to commit), it would be possible to render in workers, and take care of synchronizing the graphics update with audio/video in the main event loop. But this realy is not something OffscreenCanvas is well suited for IMHO.

dbaron commented 7 years ago

Given #144, we're going to make this issue cover all of OffscreenCanvas.

dbaron commented 7 years ago

OK, we've just discussed this in a breakout at our face-to-face meeting.

I think we're more comfortable with the two different modes of operation now, and why both modes are valuable (avoiding (maybe) the UI thread for speed vs. letting the UI thread control when the update happens to synchronize with other changes), although it again took us a while to step through both of them and understand how they work. There was a little bit of concern about the number of objects involved (and types in general), although we don't have any suggestions for how to reduce that.

I think @travisleithead was a little concerned about why both ImageData and ImageBitmap exist, although that's water under the bridge at this point because it's already shipping.

I think we're pretty close to being able to close this issue, but curious first if you have any feedback on the above comments.

travisleithead commented 7 years ago

Some of our raw notes...

dbaron commented 7 years ago

@junov, curious if you have any feedback on https://github.com/w3ctag/design-reviews/issues/141#issuecomment-297916271 above -- particularly on whether there was any consideration of ways to reduce the number of objects involved here. It seems a little awkward, but we also didn't see any obvious alternatives.

travisleithead commented 7 years ago

Re-pinging @junov, and would also appreciate thoughts on the above from others in-the-know @domenic? @slightlyoff you mentioned finding someone to provide feedback as well?

junov commented 7 years ago

Hi sorry about the slow response time.

Regarding the need for both ImageData and ImageBitmap: From a pure API perspective, I agree that ImageData alone could be sufficient. The motivation for ImageBitmap is not to add functionality to the platform, it is purely a performance primitive. Because the object is opaque and read-only, implementations can do a lot of very significant optimizations. For example:

The pixel data is not required to reside in RAM. It may reside on the GPU, thus sparing costly GPU<->CPU transfers. With ImageData, this is not the case becase the pixel data must be mapped to a script-accessible typed ArrayBuffer.
Taking a snapshot of a canvas as an ImageBitmap does not require making an upfront copy of the pixel data. Instead, the ImageBitmap object can refer to the canvas's backing store until the time that the canvas's contents are modified (we've been calling this trick "copy on write"). This optimization saves memory as well as CPU time
No need for converting the pixel data. Because ImageData strictly requires RGBA layout, 8-bits per component and color values not premultiplied by alpha, there are usually conversions that are necessary in copying data to and from an ImageData. ImageBitmap can skip conversions by keeping the data in whatever format is convenient to the implementation.

Another advantage of ImageBitmap is asynchronous creation. Making image decoding, resizing and reformatting asynchronous (and therefore parallelizable) is useful for making apps that run smoother.

Regarding the ImageBitmapRenderingContext: it exists for the same reason as ImageBitmap. It adds no functionality to the platform. We could get by with using CanvasRenderingContext2D.drawImage to display an ImageBitmap. However, the 2D context requires making an additional copy of the image (to the canvas backing store). ImageBitmapRenderingContext provides transfer semantics, which avoid the data duplication (saves CPU time as well as RAM).

The use of transfer semantics is an answer to web developer requests for zero-copy mechanisms for handling pixel data. We've been told, for example, that it is extremely challenging to make mobile web apps that manipulates DSLR-resolution images because of memory bloat with the current APIs

Here is a talk I gave a couple weeks ago that may help put things in perspective. It is mostly about ImageBitmap and OffscreenCanvas and it provides a bit of use case context: https://www.youtube.com/watch?v=wkDd-x0EkFU

slightlyoff commented 7 years ago

Hey @junov:

We've looked at ImageBitmap and ImageBitmapCanvasData again and I'll admit that it continues to be difficult to evaluate this design without an explainer that outlines the cases you're trying to solve. I watched the talk and find most of the arguments here about efficiency compelling, but I'm unsure how the timing will work out in OffscreenCanavs cases as workers don't have access to rAF.

Overall, it seems like there should be a document that describes how these APIs fit together and are motivated by end-to-end needs. Without out that, it's hard to give these choices a clean bill of health.

Regards

torgo commented 7 years ago

Taken up again at Nice f2f. Still no explainer.

junov commented 7 years ago

The original proposal/explainer doc is here: https://wiki.whatwg.org/wiki/OffscreenCanvas An additional design proposal for handling animations is documented here: https://github.com/junov/OffscreenCanvasAnimation/blob/master/OffscreenCanvasAnimation.md The discussion thread for the animation proposal is here: https://discourse.wicg.io/t/offscreencanvas-animations-in-workers/1989/10

junov commented 7 years ago

Please let me know if there is anything unclear or missing information or explanations in these documents. I'll be glad to improve them.

torgo commented 7 years ago

Agreed to punt to next week (17-oct) and have a more detailed discussion with @junov if possible. Alex to follow up.

slightlyoff commented 7 years ago

Have sent an invite to the meeting to Justin. Will also try to discuss with him ahead of time as I'm unclear on some of these cases even after reading the linked threads.

slightlyoff commented 7 years ago

Per today's conversation, I want to bring in some other folks. cc: @toji, @NellWaliczek

The situation with requestAnimationFrame vs. VRSession::requestFrame vs. Offscreen Canvas' ctx.commit() method is a microcosm of the sort of API fragmentation we have been worried about for some time with WebVR and friends. We expect it to show up in other areas (input handling, audio processing & streaming, image decoding, etc.)

In particular, it seems like the fact that the parent document of the worker that receives the Offscreen Canvas implicitly sets the rAF rate to it's own rAF makes Offscreen Canvas unsuitable for WebVR (which seems pretty bad). Similarly, WebVR continues to not define a "high performance" mode for documents and for <iframe> isolation. This is predictably going to lead to many heuristics in engines about when to spin up high-power vs. low-power GPUs, main-thread throttling for various documents, etc.

This is compounded by the fact that WebVR is also introducing it's own (unaligned with the rest of the pipeline, e.g. input) rAF equivalent and separate model for doubling-up canvas write rates.

We have a chance to fix all of this before we're stuck in a bad (and getting worse) situation.

Questions!

To what extent is second-display (main-screen) a hard requirement for WebVR 2.0?
To what extent does WebVR want/need Offscreen Canvas support?
What is the proposed method today for WebVR keeping parent document main threads from interfering in iframe'd VR content?
How do engines decide to enter a "high-performance" mode for VR?

Thanks

junov commented 7 years ago

the parent document of the worker that receives the OffscreenCanvas implicitly sets the rAF rate to it's own rAF makes Offscreen Canvas unsuitable for WebVR (which seems pretty bad)

Not really. WebVR forces you to use it own requestFrame API, which controls the frame rate. This would also be the case with OffscreenCanvas.

To what extent does WebVR want/need OffscreenCanvas support?

It would be a killer combo, and at one point this was a part of the plan, but the current version of the WebVR spec has no dependencies on OffscreenCanvas and does not expose anything in Workers. No idea what happened there. I know there are vendors implementing WebVR who have not yet committed to implementing OffscreenCanvas, but I don't see why that would prevent the WebVR+OffscreenCanvas option from being planned.

toji commented 7 years ago

I can definitely see why looking at the WebVR API and seeing Yet Another rAF™️ would be worrying, but it's not a route that the group has pursued lightly. Allow me to try and break down the logic behind it:

First and foremost is the semi-obvious requirement that the headsets need to run at a different rate than the main monitor and a purpose-built rAF is one practical way to pursue that. Certainly you could also try and adjust the throttling of the page as a whole, but my gut impression is that suddenly speeding up every rAF-based operation on the page because an unrelated API was called is a bad thing. In fact, it seems to grind directly against the concerns about isolation that have come up here.

Second, VR is an area where latency matters quite a bit, and we've seen with the existing spec that we're trying to replace that there's a lot of ways for developers to accidentally make things worse for themselves when you're polling device poses independently of pumping the frame loop. As a result we've made the design decision to have our variant of rAF also be the mechanism that supplies the device tracking data. That way we can make stronger guarantees about the relationship between the pose we deliver and the frame that gets rendered in response.

This is also viewed as something of a security mechanism: We want to avoid a world where pages casually spin up highly accurate positional tracking in the background. Having a tight correlation between the render loop (for magic window in this case) and pose delivery ensures that we can do some basic checking around things like "You really should be rendering something in response to these poses or we're going to stop providing them." We can also easily correlate the frame loop with a specific output surface so that we can suspend it when the related element is scrolled off the screen which is not something that is practical with rAF. (Similarly in VR browsers we have several scenarios where the VR rAF needs to be suspended or throttled, say when using the VR controller to input a password, but we may still want to show the page itself at that time.)

So yes, it's something we've given a lot of thought to. Of course it would be ideal if there was the "one frame loop to rule them all" but I don't actually see that being practical when you have very specific needs like we do, especially given the relatively loose behavior of rAF as it's defined today.

To address a few other questions:

To what extent is second-display (main-screen) a hard requirement for WebVR 2.0?

Not at all for the most performance sensitive systems (mobile), and mildly-important for desktop. Mainly because it would be a little weird if the browser just froze or blanked out whenever you started looking at VR content. If we had to suppress main-screen rendering while in VR initially we could do that, but it doesn't look like there's much technical reason to do so aside from performance concerns (which I'll talk about in a bit.)

To what extent does WebVR want/need Offscreen Canvas support?

We don't need it, but we definitely want Offscreen Canvas to be a first class citizen with WebVR! The assumption has always been that we would be able to use it, and I can see multiple cases where it would be useful. I should note that similar to when using a normal canvas the intent would be for WebVR to still use it's own rAF when using Offscreen Canvas for all the reasons given above.

What is the proposed method today for WebVR keeping parent document main threads from interfering in iframe'd VR content?

We are still discussing this as a group, especially after the TAG review. I haven't been viewing is as a critical "must solve prior to launch" problem, though. The API itself is being designed to be self contained with minimal dependencies on the DOM, mostly so it can function in workers, but also so that if we decide an isolated environment is beneficial it can work there easily. My primary concern with that type of environment, though, is that isolating WebVR from the DOM is really the easiest part of the problem. How to handle mouse/keyboard input or things like video playback in that kind of environment strike me as far harder issues.

I know that it's been proposed that we could spec out a specialized "meta-document" environment that gives you a performance-isolated place to play in, which sounds cool, but I would expect that WebVR would largely "just work" is such an environment and that it's specification is something that a much larger group than just the WebVR community group will want a hand in.

How do engines decide to enter a "high-performance" mode for VR?

There's a couple of things you could be referring to here, and I'm not sure which. With mobile VR there is a "sustained performance mode" that almost all apps use, which is explicitly not high performance. Instead it's focused on running the device at a lower performance level that provides stronger guarantees about not being thermally throttled. This is something that kicks in automatically when pages begin presenting VR content today and the plan is to continue doing so.

There's reasons why apps may want to opt out of that mode, sometimes temporarily (such as to speed up loads), but I don't see that as critical to expose to the web at this point.

You may also be referring to how to trigger the appropriate GPUs in multi-GPU systems. This is actually addressed in the explainer. (Look for setCompatibleVRDevice)

Finally you may be asking how to enter the theoretical "performance isolation" mode talked about in the previous questions answer, in which case I'd repeat that while we're discussing it we don't have solid plans at this point.

I'm super happy to discuss all of this to see if there's better solutions to be had, but I'm also wary of getting into a situation where forward progress on the WebVR spec and implementation is blocked on something like chasing an idealized uber-rAF.

juj commented 7 years ago

Let me rehash some of the use cases/needs from compiled GL code perspective (Emscripten, WebAssembly, Unity, Unreal Engine 4, ... crowds). I believe these are very much the same needs as WebVR applications have, since the same crowds implement VR support, and both development cases seek after the highest performance in rendering.

Control loops:

Needing to refactor C/C++ code to run event-based rather than being able to maintain own control loops is the single biggest blocker to improving portability at the moment. The term control loop is preferred here instead of the expression main loop, since the latter occassionally creates an illusion that applications would be structured something like a "main" top level int main() { for(;;) tick(); } loop form. Those types of applications are trivial to asynchronify, and are not an issue.

The issue is that native codebases can have multiple different control loops, several nested control loops, or even if it's just one, it can be deeply nested in a call stack, and refactoring the whole application to run asynchronously event based is something that is often too difficult to do. Experience shows that in the cases that developers have been successful in asynchronifying the codebases, this can touch so many locations in code that upstream project no longer wants to take the modifications in, and the effort ends up becoming a bitrotting experimental proof of concept work. Off the top of my head, this has happened for example to Qt, wxWidgets, ScummVM, DosBox, Mame & Mess projects, to name a few. That is why Emscripten is looking to enable a model where one can run code in a Worker and allow code to retain their own control loops unmodified, never yielding back to the browser in that Worker. This will greatly improve the portability of how much code can be compiled to the web.

One thing this prevents is the receival of postMessage()s and other events in that Worker that is spinning its own control loops. For those scenarios, we have a SharedArrayBuffer-based event queue for each Worker, which the application can then synchronously post and receive its web events in.

It should be stressed here is that the intent is not to fix all Emscripten WebAssembly-compiled applications to always run in such a model, but Emscripten enables both types of computation models (async event-based in main thread or Worker, and sync control loops in Worker), so that code can use whichever they see more suited for the codebase in question.

On the surface, it might seem that the async-await keywords would enable one to run synchronous control loops, if there was a Promise variant of rAF, but that method does not quite work, the computation model that async-await delivers is subtly different than is needed here. This has been discussed in https://github.com/junov/OffscreenCanvasAnimation/issues/1.

yield = swap:

The "yielding back from event handler is an implicit WebGL swap" model is not suitable for applications that do their own control loops in a Web Worker. That is why the explicit .commit() call would be useful for Workers that utilize OffscreenCanvas; that would enable those applications to present a frame from the Worker using a mechanism that does not require them to yield. Other applications use rendering models that are not based on interactive animations, and they might not be rendering as a response to an external event, but they might be doing some computation, after which they'll present the produced results, then they'll compute some more, and then swap again to present. Scientific applications and loading screens can be like that - they don't have a 1:1 correspondence of an 1 event=1 swap, or 1 turn of a control loop=1 swap, but they are structured to present after some piece of computation (that is run sequentially/imperatively) finishes.

Currently in Emscripten we do all rendering to a separate offscreen FBO for the above types of applications, and then offer an explicit swap function for these apps, which blits the offscreen FBO to the screen to be visible. This is inefficient, but works, with the caveat that presentation is still limited to whatever the composition interval of the browser is, e.g. swapping more often than what the rAF() composition rate is will lead to discarded frames that are never shown to the user, which is not ideal.

Having an imperative swap function would also be useful for portability, because that is exactly the model that most other platforms have - there are WGL_swap_buffers, GLX_swap_buffers, EGL_swap_buffers, D3D present etc. functionality that allows one to explicitly say when to .commit(). Being able to provide the same functionality is great for retaining an unified codebase structure across platforms. Otherwise applications will need to start auditing their GL rendering patterns, and identify how draw calls relate to swapping, and make sure to refactor so that they are able to render everything in exactly one web event callback (or use the FBO fallback, impacting performance). This might not sound too hard if you are the first party developer of the codebase in question, but often it happens that the developers retargeting projects to the web are different from the people who originally wrote the software, which means that developers can be working on porting codebases they know relatively little about. This fact is often underappreciated, and developers working in such a situation may get mislabeled as amateur, since the "perfect knowledge and control" of code is regarded a hallmark of an expert developer. Decoupling control flow from the decision of when to present would bring flexibility via orthogonality, as these two things are fundamentally two unrelated programming concepts. As result, developers would not need to pay attention to technicalities that implicit swap behavior has, and more code would be possible to support out of the box without consuming productivity cycles on.

vsync rate:

There is a combination from a number of items in play:

a) in some browsers, rAF() is hardcoded to run 1:1 with display's vsync rate, b) in other browsers, rAF() is hardcoded to run at 60Hz, c) the rAF() rate may not be a constant with respect to for example page lifetime, but can vary at runtime, e.g. in multimonitor setup when one moves the browser window over to another display that has a different hz rate, d) there is no API to ask what the current rAF() presentation rate is.

In order to reduce rendering microstutter, a behavior that is hated with passion and sometimes creates strong ill emotions in gaming audiences, applications commonly want to lock their animation update timings to vsync. That is, instead of updating animation via variable timesteps measured via timed performance.now() deltas, which returns jittering measures, applications take performance.now() measures and fix to round them to nearest elapsed multiple of 1000/refresh_rate_Hz msecs, when they know that the frames will be presented with such quantas when presentation is locked to vsync.

For example, if an application knows that its presentation is locked against a 60Hz rate, then it generally desires to do fixed 16.667ms slices of animation updates, rather than applying dynamic length steps that are computed from elapsed times via performance.now() since last update. In this model, one generally uses performance.now() to estimate when full vsync intervals have been missed, and e.g. a performance.now() delta of, say, 28ms since previous update, would mean that the app will want to take 2x 16.6667ms update slices to align to the arrival time of the next vsync window. However this kind of computation requires knowing what the exact rAF() vsync rate of the current display is.

Other times, applications may want to update at a lower, or at a specific controlled refresh rate. For example, a heavy game application (or if low battery is detected) might want to cap rendering to 30Hz, independent of if running on a 60Hz or a 120Hz display. Or a video application may want to update at a rate that is closest to 24Hz, by detecting what the closest such possible presentation rate might be, and then computing what the needed pulldown/pullup algorithm will be, e.g. to align a source 24Hz video stream to the actual presentation rate.

Since there is no API to query what the rAF()/vsync rate is, one will currently need to benchmark it. But in order to benchmark it, one cannot do much rendering during benchmarking, because too heavy rendering would cause one to miss vsync intervals, resulting in noisy/incorrect benchmark estimates. Because of c) above (an effect that is definitely desired), one cannot just measure the rAF() rate once at page load time, but will need to occassionally keep remeasuring in case the rAF() rate might have changed.

So because rAF() rate can change, and measuring rAF() rate is an activity that prevents actual rendering, this activity becomes a type of exploration-vs-exploitation problem. One will need to explore what the rAF rate is at suitable times, but at the same time, one wants to maximize the time to actually present at that rate, leading to a heuristic juggling of when to re-benchmark the rAF rate while pausing rendering.

To get rid of all of the above, it would be great to have an explicit API to ask what the current refresh rate is of the rAF()/other presentation mechanism is, and have that be an API that one can keep referring to, to be on the look out for if/when it changes. something like canvas.verticalSyncRate property (get the vsync rate of the display that the current canvas is on), or something effectively similar that could be multimonitor aware.

Rendering decoupled from vsync:

Sometimes to minimize latency, one wants to disable vsync, and push frames as fast as possible. Other times, one would like to utilize adaptive sync, FreeSync or GSync, which offer more advanced vsync control. For these cases the explicit .commit() function would fit well, because it would naturally scale to pushing frames as fast as possible, and with minimal latency.

Rendering without vsync enabled is desirable mostly in fullscreen presentation modes. In windowed mode, browser has to be aware to composit with other page contents, where I understand it might not be possible to composit other page content with vsync, but just present the canvas window without waiting for vsync. Nevertheless, it would be good if the API for presenting without vsync was decoupled from fullscreen presentation, since perhaps some platforms might be able to do that, and having to exit fullscreen and re-enter if one wanted to change vsync on/off would be poor UX. In native world, the vsync synchronization choice comes as something that can be done for every single present separately - there are no inherent "mode changes" for the display or GPU involved or otherwise, so preserving something similar would be nice.

Rendering by setting a custom vsync rate:

Expanding on what was touched on in above, applications commonly would like to specify what the used vsync presentation interval is. This allows application to scale resources appropriately, and avoid rendering too often, or opt in to more frequent rendering. There are two ways that applications want to control vsync: I) by setting the vsync rate from the list of supported rates by the display, and/or II) by applying a decimation factor (1/2, 1/3, 1/4, ...) to a specified vsync rate.

Some of the example cases for these needs were referred to above: a source animation (video) that was authored at 24Hz might want to configure the vsync rate to be 120Hz, with a decimation factor of 1/5, if the display supported 120Hz, and if not, then set to 60Hz and do 3:2 pulldown.

The method I) can be fundamentally incompatible with other web page compositing, so I) would be best restricted to Fullscreen API, and VR display presentation API, where a given canvas is the only fullscreen element on a particular display device. Method II) can be implemented in native applications by specifying a swap interval to native swap/present calls, and a similar item could be imaginable to exist in .commit({swapInterval: 4}) or rAF({swapInterval: 4}) calls. To parallel the native world, perhaps a call .commit({swapInterval: 0}) might present without waiting for vsync (no sleeping), and .commit({swapInterval: 1}) could present with 1:1 vsync (sleep/block until new buffer is free), and .commit({swapInterval: 2}) could present with 1:2 vsync (decimated in half).

My understanding is that needing to operate on a custom presentation rate is what led to WebVR API proposing their own rAF()-like machinery. It would indeed be great to have a symmetric API for all of this, e.g. by allowing requestFullscreen() to customize which display to take the target element fullscreen on (current browser display vs VR display), while setting the vsync rate when performing the fullscreen request. The vsync rate decimation could then be paired with a .commit({swapInterval: 4}) or rAF({swapInterval: 4}) API. One API trouble there is that currently requestFullscreen API is hardcoded to allow only exactly one fullscreen activity at a time, whereas with multiple displays and VR displays, one might want to go fullscreen on two displays simultaneously (different canvas on each).

The above aspect is important especially for VR, since desktop VR applications have been going to the direction that what the headset renders is not a mirrored copy of what the desktop display shows, but one might desire to render a non-ocular-warped regular 3D view of the scene for other observers to enjoy, and some 2D control UI that is not visible in the headset display itself.

In summary, there are a few different scenarios, and some of above don't specifically relate to .commit(), and some definitely won't get resolved in the scope of OffscreenCanvas, but just thought to do a bit more thorough writeup to illustrate the types of hurdles that people targeting the web from native codebases currently have around this topic. We can (and do) emulate and work around a lot, but that has various drawbacks in performance and corner cases. The offscreen FBO hack can be used even if OffscreenCanvas did not have a .commit(), though by taking a fillrate hit.

toji commented 7 years ago

Apologies for the uber-post I'm dropping in here!

TL;DR: I'm proposing that requestAnimationFrame and cancelAnimationFrame be abstracted into an interface which window implements, and which can subsequently be implemented by any other object that needs to surface a display cadence. This is to formalize how rAF-like functionality is exposed to the web and prevent multiple similar but incompatible interfaces from emerging.

After joining the TAG call on Tuesday and talking with Alex Russell separately later that day, I think that at the very least he's got a better understanding of how the WebVR community group arrived at the interface it did in our explainer. Key to that clarification seemed to be highlighting the fact that we're using our rAF variant to not only control timing but deliver pose data in sync with those animation frames. Additionally in WebVR's case we also intend to deliver VR controller updates in sync with those animation frames to enable smooth tracking.

Given that understanding it seems like the primary concern on Alex's part became preventing the web from growing multiple similar but incompatible rAF-like interfaces. There are still concerns around having multiple loops running at different speeds, but that seems semi-unavoidable and not as big of a concern in the long run?

So with that in mind I talked through the issue with some other colleages, we came up with an approach that could potentially pave the way for new rAF-style interfaces. I'll sketch out some rough IDL first and then go into more detail:

// Standard Windows rAF

callback FrameRequestCallback = void (DOMHighResTimeStamp time, FrameRequestData frameData);

interface FrameRequestData {
  // Not clear what would be useful here.
}

interface AnimationFrameProvider {
  unsigned long requestAnimationFrame(FrameRequestCallback callback);
  void cancelAnimationFrame(unsigned long handle);
}

Window implements AnimationFrameProvider;

// WebVR rAF variant

VRSession implements AnimationFrameProvider;

// This would replace the current VRPresentationFrame in the WebVR Explainer
interface VRFrameRequestData : FrameRequestData {
  readonly attribute VRSession session;
  readonly attribute FrozenArray<VRView> views;

  VRDevicePose? getDevicePose(VRCoordinateSystem coordinateSystem);
}

// For Video

HTMLVideoElement implememnts AnimationFrameProvider;

interface HTMLVideoElementFrameRequestData : FrameRequestData {
  // Useful to report some playback state here? (Already on element)
  readonly attribute double currentTime;
  readonly attribute unsigned long videoWidth;
  readonly attribute unsigned long videoHeight;
}

// For rAF in Workers

partial interface Window {
  // Terrible name alert! Ideally something more palatable.
  TranferrableAnimationFrameProvider getTransferrableAnimationFrameProvider();
}

[Transferrable]
interface TranferrableAnimationFrameProvider : AnimationFrameProvider {
  // Anything else useful/necessary here?
}

The first thing to note is that this approach maintains compatibility with existing window.requestAnimationFrame semantics, extending it in a way that should be invisible to existing pages.

For an interface that wants to then expose a rAF loop that runs at a different rate than the document rAF, such as a WebVR session, it could implement the same interface but provide a custom data structure to the callback. This would enable things like WebVR's desire to expose device pose data in sync with the frame loop. Usage in WebVR would look like so:

function onDrawFrame(time, vrFrameData) {
  let pose = vrFrameData.getDevicePose(vrFrameOfRef);
  gl.bindFramebuffer(vrSession.baseLayer.framebuffer);

  for (let view in vrFrameData.views) {
    let viewport = view.getViewport(vrSession.baseLayer);
    gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
    drawScene(view, pose);
  }

  // Request the next VR callback
  vrSession.requestAnimationFrame(onDrawFrame);
}

vrSession.requestAnimationFrame(onDrawFrame);

This is actually almost exactly what the explainer already shows with a couple of tweaks: The rAF function name is now requestAnimationFrame instead of requestFrame as the explainer proposes and the callback now provides a timestamp along with the VR frame data.

Another potential use for this pattern that's not well served today and other teams are trying to reason around: Enabling videos to be do processing as new frames are decoded rather than simply repeatedly uploading them each rAF as, for example, most WebGL video apps do now. Pretty much any WebGL-based video playback today does something like this:

function drawFrame(time) {
  window.requestAnimationFrame(drawFrame);

  // Update video texture
  gl.bindTexture(gl.TEXTURE_2D, videoTex);
  gl.texImage2D(gl.TEXTURE_2D, ..., videoElement);

  // Other GL setup ...

  // Draw video mesh
  gl.drawArrays(gl.TRIANGLES, 0, 6);
}

window.requestAnimationFrame(drawFrame);

Which is problematic because the video may only update at 24Hz-30Hz, which means we're wasting work asking for the texture copy each frame here. But under the above rAF proposal it could become:

function drawFrame(time) {
  window.requestAnimationFrame(drawFrame);

  // Other GL setup...

  // Draw video mesh
  gl.drawArrays(gl.TRIANGLES, 0, 6);
}

window.requestAnimationFrame(drawFrame);

function copyVideoFrame(time) {
  videoElement.requestAnimationFrame(copyVideoFrame);

  // Update video texture
  gl.bindTexture(gl.TEXTURE_2D, videoTex);
  gl.texImage2D(gl.TEXTURE_2D, ..., videoElement);
}

videoElement.requestAnimationFrame(copyVideoFrame);

This reduces the texture copy to the actual video framerate and creates (in my opinion) a cleaner separation of concerns. Of course, video is complicated and so it's not 100% clear to me that we could get the latching behavior we want out of this but quick polls of coworkers make it sound feasible.

It's worth noting that there's a WebGL extension, WEBGL_video_texture, that's also attempting to tackle this (as well as lowering the total texture copy cost). But talking with one of our WebGL devs it sounds like this rAF proposal might actually serve that need better?

Finally, for the case of Offscreen Canavas in a worker we could create a transferrable implementation of the interface that would likely be produced by the window. This creates a clear connection between the two and communicates well exactly what the worker rAF will be aligned to. This does NOT provide the nice while(await) pattern that has been discussed by the Offscreen Canvas team, but it seems like it would be trivial to write a promise-emitting wrapper around the rAF callbacks if that's needed?

Anyway, I'm sure there's quirks to work out here but I wanted to get this up to start a conversation about if this moves us in a positive direction. I'll say that from the WebVR perspective I think we could easily accomodate this type of model, with the primary concern being that we don't want to get stuck in spec limbo if coming to an agreement on a change like this is going to take another 6+ months.

Thoughts?

junov commented 6 years ago

@toji the API you propose does not address the use case brought forward by @juj. That is to say the case of porting apps to the Web via WebAssambly which were not developed using an async programming model. We need to decide whether or not this is a use case that the Web Platform should support. Not supporting this means emscripten will have to use a gpu command protocol (implemented on top of SharedArrayBuffer, I presume) in order to relay WebGL commands to a separate Worker. This is not great for performance, but it is a possible workaround.

@juj do you have performance data regarding the perf impact of relaying WebGL Commands?

junov commented 6 years ago

I would like to propose a solution for the problem exposed by juj@ that would work in a world where all we have for driving animations is a rAF API. It uses two workers. Let's call them, mainWorker and presentationWorker. mainWorker is where the application's never-ending ui loop runs. On mainWorker we have an OffscreenCanvas object that is used for preparing frames, let's call it backBuffer. backBuffer is created directly using the OffscreenCanvas constructor (i.e. it is not associated with a placeholder canvas element). On the other hand, presentationWorker has an OffscreenCanvas object that is associated with a placeholder canvas. Let's call that one frontBuffer. When mainWorker wants to commit a frame, it would do something like this:

let frame = backBuffer.transferToImageBitmap();
presentationWorkerMessagePort.postMessage({frame: frame}, [frame]);

On the presentation worker side, the message handler would receive the frame and simply push it to frontBuffer. For this to be as streamlined as possible, we should expose ImageBitmapRenderingContext in workers, which is a trivial change. The only reason it is not currently exposed in workers is because there was no use case for it... until now. Alternately, the role of presentationWorker could be implemented on the main thread, but it is nice to have it in a worker which allows frames to be continuously pushed to the display without delay even when the main thread is busy.

To implement vsync throttling behavior, presentationWorker could run a requestAnimationFrame loop that signals frame barriers to 'mainWorker' via a semaphore implemented using SharedArrayBuffer. On the mainWorker side, the throttling would be implemented using a call to Atomics.wait() on the frame barrier semaphore.

I think this is a more reasonable solution than forwarding WebGL calls. ImageBitmap objects are transferable, so there is very little overhead in serializing them for postMessage. Implementations can wrap GPU textures inside ImageBitmap objects. Also, postMessage is required by the spec to be immediate, so it will work fine to call it from a never-ending task.

@juj WDYT?

juj commented 6 years ago

Key to that clarification seemed to be highlighting the fact that we're using our rAF variant to not only control timing but deliver pose data in sync with those animation frames.

Do all headsets have this kind of lockstep relation between pose data updates and display refresh rate? Is that desirable?

I'd imagine a typical structure of a variable timestep rAF() body for VR could look like

var t0;
void rAFTick() {
  var t1 = performance.now();
  var dt = t1 - t0;
  updateSceneSimulation(dt); // Physics, game logic, etc. "camera independent", could take several msecs
  t0 = t1;
  var cameraPose = getHMDPose();
  renderScene(cameraPose);
  requestAnimationFrame(rAFTick);
}

If delivering the camera pose is tied to the firing of rAFTick, I understand the code looks something like this?

var t0;
void rAFTick(cameraPose) {
  var t1 = performance.now();
  var dt = t1 - t0;
  updateSceneSimulation(dt);
  t0 = t1;
  renderScene(cameraPose);
  requestAnimationFrame(rAFTick);
}

Is that structurally accurate? If so, the second example looks like it could have worse latency compared to the first one, in a scenario where getHMDPose(); might have an opportunity to grab more fresh live data? In order to ensure the same with the second style of API, one should reverse the order of updating and rendering(?), i.e.

var t0;
void rAFTick(cameraPose) {
  renderScene(cameraPose); // Render first so that cameraPose has least time to go stale
  var t1 = performance.now();
  var dt = t1 - t0;
  updateSceneSimulation(dt);
  t0 = t1;
  requestAnimationFrame(rAFTick);
}

If this is intended, it would be good to document this "reversal". In general I think I would favor not glueing pose delivery and rAF() together, they feel like two separate concepts. Is there a specific reason that getting pose data could not be a good old regular function call, like a VRDevice.captureLatestDevicePose()? That way it would have the advantage of being able to be called outside any rAF() loops so that one is not restricted to be running in a rAF loop. (the Worker has it own sync main loop scenario)

juj commented 6 years ago

Overall, I do like @toji's idea of having multiple rAF() providers as a mechanism to tie to multiple different refresh rates. That would map to a scenario where there are multiple canvases that are on different displays, with different refresh rates.

In addition to that, I feel we do really need an API that allows one to query what the currently occurring refresh rate on the particular display one is rAFfing at is, in a manner that allows polling if it can change (moving browser from one display to another). The above code examples are the most common ways to render that probably 99% of WebGL pages use, but that kind of code generates horrible microstuttering that WebGL applications are currently experiencing. On small WebGL canvases one might not see this, but on larger displays or on a HMD glued to your face, the stuttering becomes much more apparent. To remedy this, one should lock dts to refresh-rate fixed increments. Then a rAF tick would look like this:

var t0;
void rAFTick() {
  var t1 = performance.now();
  var refreshInterval = 1000 / 60; // Or rather, 1000 / display.getRefreshRate();
  var dt = t1 - t0;
  var threshold = 0.5;
  var fixedDt = Math.ceil(dt - threshold) / refreshInterval) * refreshInterval;
  updateSceneSimulation(fixedDt);
  t0 = t1;
  var cameraPose = getHMDPose();
  renderScene(cameraPose);
  requestAnimationFrame(rAFTick);
}

But currently since there is no API display.getRefreshRate() to ask what the hardware vsync rate is, the above kind of code is brittle since one has to keep benchmarking to discover the refresh rate (which is a bit futile like mentioned above)

This method relies on the knowledge that content is presented exactly on multiples of the vsync refresh interval. On a GSync/FreeSync display, or when rendering in VSync unlocked mode, this kind of code pattern would not be used.

An API to ask the refresh rate would also be ideal to be detached from control flow, i.e. an "imperative" function such as display.getRefreshRate() would be preferred over something that is piggybacked on requestAnimationFrame, since not all applications would like to use requestAnimationFrame.

juj commented 6 years ago

This does NOT provide the nice while(await) pattern that has been discussed by the Offscreen Canvas team, but it seems like it would be trivial to write a promise-emitting wrapper around the rAF callbacks if that's needed?

I'm still not sure if the while(await) thing is what people want even. It does not seem to solve real problems, except in example cases to make a couple of lines (that weren't a problem to begin with(?)) look pretty - in more complex cases it runs into the same problems of having to sync->async transform a codebase in a rippling difficult-to-refactor fashion (https://github.com/junov/OffscreenCanvasAnimation/issues/1). This is not to say anything negative about it, making code snippets look simpler and cleaner is nice and has value, but just that I'm looking at while(await) through the lens of problems I'm exposed to solving.

@juj do you have performance data regarding the perf impact of relaying WebGL Commands?

I am able to compile and run simple test GL apps at the moment, but still running into some codegen bugs for full blown WebGL 2 content. This is currently in progress, and I'm hoping to do comparative benchmarks as soon as I get Wasm-based multithreading landed. We do implement both relayed (proxied in Emscripten's parlance) WebGL mode and a non-relayed mode using OffscreenCanvas, so we will have both abilities to be able to compare apples to apples.

I would like to propose a solution for the problem exposed by juj@ that would work in a world where all we have for driving animations is a rAF API. It uses two workers.

From my test code that uses main browser thread to schedule rAF pings over to Web Worker, and a Web Worker that synchronously renders a small test cube, I find that frame rates are fluctuating quite bad in this kind of model, and my current estimate is that this is due to timing variances from processing these events. Gut estimate here says that making per-frame rendering or .commit() timings tie in to the event queues of Workers will be already a lost cause from latency and variance perspective, because of slowness coming from threads going to sleep and waking up to process those events. A Worker doing synchronous .commit() in OffscreenCanvas gives much more predictable frame rates in this kind of model However these tests were run with asm.js based implementation of multithreading, and I'm looking to do this kind of testing proper after I am finished with migrating to Wasm multithreading.

Relaying/proxying commands via SAB seems to have good performance, as long as one can do that asynchronously and by "filling a full pipe" of work to be done in a sequence so that processing doesn't starve or stall.

Preliminarily, I think that syncing to vsync in a Worker would be most ideally done using a mechanism that does not require intervention from other Workers or from the main thread.

toji commented 6 years ago

Do all headsets have this kind of lockstep relation between pose data updates and display refresh rate? Is that desirable?

Not all VR APIs have that requirement, but it's also not and uncommon pattern. In native land it typically looks like waitForNextPose(&posePtr) which serves both as the frame throttle and the pose update. And yes, in our case the least latent variant would be your third code sample, where rendering happens before non-pose-dependent simulation logic for the frame.

I feel we do really need an API that allows one to query what the currently occurring refresh rate on the particular display one is rAFfing at is, in a manner that allows polling if it can change

Totally agree with this, though recent hardware trends make it trickier. Variable rate displays like GSync monitors or the iPad's 120Hz screen feel like they could become a lot more common in the not so distant future if for no other reason than battery savings. So we'd likely want a way both to determine what the current refresh is with the expectation that it may change very frequently (I don't know if an event is appropriate?) but also have a way for developers to specify that they want their content to run at a locked rate that may be less than the screen's fastest.

I'm still not sure if the while(await) thing is what people want even.

I wondered about that myself, as it's syntactic sugar more than anything else.

junov commented 6 years ago

Preliminarily, I think that syncing to vsync in a Worker would be most ideally done using a mechanism that does not require intervention from other Workers or from the main thread.

I am a bit skeptical about your argument, at least I don't think it is that obvious that relaying WebGL commands would be better. Today, at least in Chrome's implementation, the vsync signal already has to hop over several cross-thread and even cross-process channels before it reaches commit()/rAF(). Adding a simple semaphore to that chain is probably inconsequential IMHO. Of course, if you're relaying rAF from the main thread, things can get ugly because rAF on main can be delayed for all sorts of reasons, but rAF on a side-car Worker that has nothing else to do should be relatively clean unless the system has high CPU core contention (maybe that is a key issue, maybe not). The propagation delay from postMessage for transferring the ImageBitmap to the presentationWorker is a couple hundred microseconds, but that delay is not likely to be on the critical path. It happens in parallel with the async rasterization of the WebGL frame's content. I guess the only way we'll know for sure whether relayed WebGL commands is better than relayed vsync+ImageBitmap will be try them out.

On my side, I'll try to run some experiments to compare direct commit() vs. relayed vsync

torgo commented 6 years ago

Taken up at London f2f - we discussed and looked like minimal value we can add at this point. We'll continue to monitor and come back to it on 2-20.

cynthia commented 6 years ago

Seems like the proposal has changed: https://github.com/junov/OffscreenCanvasAnimation/blob/master/OffscreenCanvasAnimation.md

torgo commented 6 years ago

Discussed at Tokyo f2f. We agreed we need to review the new proposal and return to this.

greggman commented 6 years ago

I'm not sure if this belongs here but I don't know where else to post it sooo...

I'm hoping OffscreenCanvas allows one WebGL context to be used to efficiently update multiple canvases and it's not clear to me how that is solved in the current proposal or if it's even supposed to.

MDN lists code like this as the way to draw to multiple canvases

var one = document.getElementById("one").getContext("bitmaprenderer"); 
var two = document.getElementById("two").getContext("bitmaprenderer");

var offscreen = new OffscreenCanvas(256, 256);
var gl = offscreen.getContext('webgl');

// ... some drawing for the first canvas using the gl context ...

// Commit rendering to the first canvas
var bitmapOne = offscreen.transferToImageBitmap();
one.transferImageBitmap(bitmapOne);

// ... some more drawing for the second canvas using the gl context ...

// Commit rendering to the second canvas 
var bitmapTwo = offscreen.transferToImageBitmap();
two.transferImageBitmap(bitmapTwo);

But that seems likely to be super inefficient unless I'm missing something.

In order to be able to draw to multiple canvases following the MDN style API you end up needing to set the size your rendering to on each switch. In other world you'd have to do this

offscreen.width = widthOfOne;       // EXPENSIVE
offscreen.height = heightOfOne;    // EXPENSIVE
renderSceneForOne();
var bitmapOne = offscreen.transferToImageBitmap();
one.transferImageBitmap(bitmapOne);

offscreen.width = widthOfTwo;       // EXPENSIVE
offscreen.height = heightOfTwo;    // EXPENSIVE
renderSceneForTwo();
var bitmapTwo = offscreen.transferToImageBitmap();
two.transferImageBitmap(bitmapTwo);

Those are expensive because you're reallocating the backbuffer once per bitmap per frame.

It seems like what you really want is each bitmaprenderer to keep 2 drawing buffers (like WebGL canvas does) and then attach the offscreen context to the drawingBuffer of any bitmaprenderer. That way there is no allocation. Each bitmap renderer has it's 2 buffers, the buffer being composited and the buffer being drawn to and you just need a way to attach context to that bitmaprenderer's drawingbuffer. (which internally is effectively just a call to gl.bindFramebuffer

Am I missing something? It seems like the current API isn't really designed to be used efficiently with multiple canvases.

kenrussell commented 6 years ago

I think that the browser will be able to optimize the recycling of OffscreenCanvases' backing stores enough to watch for one OffscreenCanvas being used to repeatedly produce ImageBitmaps of a few different sizes. In the scenario described, it seems important to continue to use transferred ImageBitmaps as the communication mechanism between the OffscreenCanvas and the ImageBitmapRenderingContext, especially when the frames are being produced on a worker thread and consumed on the browser's main thread. Any more implicit linkup between the OffscreenCanvas on the worker, and multiple ImageBitmapRenderingContexts on the main thread, seems problematic to me.

I think we should get the current proposal implemented and gain some experience from it, and then use that experience to drive the direction of the API further.

greggman commented 6 years ago

In that case you shouldn't you get rid of transferControlToOffscreen and commit? Why have 2 ways to render from a worker? One using transferToBitmapImage and another using commit? I assume commit is there because transferToBitmapImage is not efficient. Which kind of seems like it points out the issue? Why are there 2 ways to do this?

Might I suggest there should be only one way or at least if there are 2 ways they should both work for both use cases?

For example if a WebGL context could be bound to an OffscreenCanvas then you could do

<canvas id="c1" width="400" height="200"></canvas>
<canvas id="c2" width="300" height="400"></canvas>

const offscreen1 = document.querySelector("#c1").transforControllToOffscreen();
const offscreen2 = document.querySelector("#c2").transforControllToOffscreen();
const gl = offscreen1.getContext("webgl");

gl.setTargetCanvas(offscreen1);
...draw scene 1...
gl.commit();
gl.setTargetCanvas(offscreen2);
...draw scene 2...
gl.commit();

This seems like it would have a bunch of beneifts

It would pull the 2 ways of doing things into 1 or at least make doing this both ways possible instead of the inconsistent way it is now.
It would also mean no need to set the size when switching which canvas you're planning to target
It would also mean drawing to multiple canvases doesn't require transferring to imagebitmaps and then transfering those imagebitmaps from a worker back to the main thread so it's much simpler to use.
It seems like it doesn't require nearly as much smarts or special algorithms for the browser to guess/infer how to keep around old drawingbuffers as it becomes clear what's supposed to happen.
It also seems conceptually simpler to implement. Behind the scenes all that happens is the framebuffer that the WebGL implementation was considering the null framebuffer binding to gets set to the one owned by the new target canvas.
Beacuse of 4 above devs will be less likely to have to pray the browser handles the allocation issues of mutiple canvases efficently that was mentioned above.

Is there a reason why this is a bad idea?

kenrussell commented 6 years ago

These two specific use cases informed the design of the current APIs:

1) Rendering from a worker thread and having those results be composited with other DOM updates at a known time, by the user's code. OffscreenCanvas combined with transferable ImageBitmaps and ImageBitmapRenderingContext supports this use. 2) Allowing a worker to produce frames for display, not synchronized with other DOM updates, and with the lowest latency.

Experiments by @juj in the past showed that (1) carries too much overhead for Emscripten-ported games, but (1) is still needed for users where the worker's rendering has to be synchronized with the main thread's rendering.

commit() solves situation (2). However, synchronizing it with rendering updates by the main thread is difficult, and will add more latency.

greggman commented 6 years ago

So then that just confirmed the issue I bought up. If I want the lowest latency I need (2) but I can't use (2) as currently designed for multiple canvases and a single context.

kenrussell commented 6 years ago

The current thinking is that the performance and latency of transferToImageBitmap / transferFromImageBitmap will be OK for multithreaded content which absolutely has to perform updates in sync with the DOM.

We want to gain experience with both this and the commit()-based rendering style which is fully decoupled from DOM updates. I don't see a good way to change commit() to make it optionally, implicitly, sync up with DOM updates on the main thread.

greggman commented 6 years ago

I feel like the problem with the current APIs are they ignore what's really happening under the hood. That canvas are just pairs of textures attached to framebuffers. WebGL just followed the path of least resistance and though of canvases like OS windows. Parts of if it were not actually designed, they were just done without thinking about it based on what 2D canvas was doing (if thought was given I'm sorry I'm just trying to provoke some thought now).

In particular the way contexts work right now it was assumed every canvas should have it's own context. Did anyone question that? It would have been just as valid to create contexts separately from canvases and then bind that context to whatever canvas you currently want to render to. If important it could have been which ever context is first bound to that canvas is the only context that can ever be bound to that canvas but that the same context can be used with as many canvases as you want.

That design would have solved sharing resources and broken absolutely nothing.

Imagine the WebGL API was this

const gl = new WebGLRenderingContext();
gl.bindCanvas(someCanvas);  // we're now rendering to someCanvas
gl.draw(...)
gl.bindCanvas(someOtherCanvas);  // an implicit GL flush, and copy/flip to someCanvas happens here
gl.draw(...);  // renders to someOtherCanvas

Similarly people have wanted to be able to not commit the changes (no implicit swapbuffers/resolve). So commit could have been part of the original API

const gl = new WebGLRenderingContext();
gl.bindCanvas(someCanvas);  // we're now rendering to someCanvas
gl.draw(...)
g.commit();  // does swapbuffers/resolve for someCanvas
gl.bindCanvas(someOtherCanvas);
gl.draw(...);  // renders to someOtherCanvas
g.commit();  // does swapbuffers/resolve for someOtherCanvas

That would have solved all of the resource sharing issues. It would also have solved rendering over multiple frames without having to manually render to a framebuffer. Note that native apps expect this behavior, nothing shows up until they call swapbuffers. They don't have that option in the current WebGL API and instead have to write other solutions.

If that was how WebGL worked before workers would the suggested APIs for workers change? My point is that canvases are just really 2 textures. A displayBuffer and a drawingBuffer. They don't really need a context per canvas and they don't need to implicitly swap. We went that way mostly because we followed canvas 2d. If we were to imagine a past where we had chosen this other path would we be coming up with better/simpler solutions? I worry about an API where I'm told "browsers will figure out magic behind the scenes to make things performant" when it's possible a different API could remove the magic and just be performant by design.

dbaron commented 6 years ago

As someone coming at this with a bit less graphics background: what are the resources that you're talking about sharing?

greggman commented 6 years ago

Resources are basically textures and geometry data. As it is now if you load a 2048x2048 texture (16meg) into WebGL it can only be used directly on a single canvas. If you want to use the same texture in another canvas you have to load that texture again into the context for the 2nd canvas because textures (and geometry) can not be shared across contexts (one context per canvas, all resources belong to a single context.

The creative workaround is to make a canvas offscreen, render to that, then draw that canvas into the other canvases. That's slow. It can be optimized but no amount of optimization will erase the large copy that's happening to copy from one canvas to another.

Why one context per canvas? Because honestly I think it was just assumed to be the right thing to do (if other ways were discussed I'm unaware of them). Why can't you share resources across contexts? Because OpenGL has some very strict rules on how 2 contexts share resources and when changes to a resource made in one context are visible in another context. Those rules are extremely hard to enforce so as to make sure that sharing will work correctly everywhere across all platforms, something important for WebGL.

So, sharing was never enabled but that's all based on the idea that there should be one context per canvas. If instead contexts and canvases were disconnected and realizing that canvases are really just themselves textures the entire problem of sharing disappears. If you want to share make one context and use it with as many canvases as you want. If you don't want to share make a new context.

dbaron commented 6 years ago

So in OpenGL API these resources are scoped to a GL context? It seems like one could also address that by retaining the concept of a canvas context, but adding a GL context object (distinct from the canvas context) that could be shared between multiple canvas contexts by passing it to them when they're initialized?

kenrussell commented 6 years ago

@greggman if bindCanvas were the primitive of rendering to multiple canvases from a single WebGL context, there would still be synchronization issues if rendering from workers. Fundamentally, workers are not synced with the main thread. It'd be possible to invent new web primitives like the concept of swap groups (see GLX_SGIX_swap_group and GLX_SGIX_swap_barrier), but after much design it was decided to phrase these primitives in terms of existing primitives on the web platform (ImageBitmap, Transferables, postMessage). For rendering to a single canvas from a worker, when those updates are not synchronized with the main thread's DOM updates, OffscreenCanvas and commit() will have excellent performance.

The recycling of older textures would be roughly equally complicated if binding a single WebGL context to multiple canvases or OffscreenCanvases, as it would be if resizing a single OffscreenCanvas to multiple dimensions repeatedly and calling transferToImageBitmap / ImageBitmapRenderingContext.transferFromImageBitmap to display the frames. The bindCanvas model would have new gotchas, like what would happen if attempting to bindCanvas one canvas to multiple WebGL contexts, that would have to be thought through.

I think we should finish implementing the current APIs and measure the performance characteristics. In Chrome the implementation is finally almost done. If it turns out the ergonomics of the API aren't good or it doesn't perform well for real-world uses then we can look into recasting it.

RByers commented 6 years ago

Note that there's now a blink intent-to-ship thread for this feature

jhm-ciberman commented 6 years ago

Hello. Is there currently a way to have a single WebGLRenderingContext drawing for multiple HTMLCanvasElement (or OffscreenCanvas)?

greggman commented 6 years ago

I'm not sure where to bring this up but under the current blocking OffscreenCanvas.commit() proposal how are pages that are not the front tab handled?

With rAF the browser just doesn't call the callback. With blocking commit though what's the plan?

if commit blocks forever then the worker is stuck unable to process other events.
if commit is a no-op or doesn't block then the worker is wasting time even though the user is not viewing the page

I can imagine the following patterns with commit

commit in spin loop

 // in worker

 while(true) {
     render();
     offscreenCanvas.commit();
 }

commit in raf loop

 // in worker
 const socket = new WebSocket("ws://www.example.com/socketserver");
 socket.onmessage = handleMessagesFromServer;

 function loop() {
     render();
     requestAnimationFrame(loop);
     offscreenCanvas.commit();
 }
 requestAnimationFrame(loop);

In case 1 above blocking commit if the page is not the front page seems reasonable. In case 2 it does not because the worker was expecting to be able to process events. How will browsers be able to handle the 2 cases?

greggman commented 6 years ago

So I tried it and at least in Chrome commit seems to be broken

This example does this

const appInfo = {
  clientWidth: 300,
  clientHeight: 150,
};
function render() {
    resize if canvas size does not match client size
    render scene
    requestAnimationFrame(render);
    gl.commit();
}
onmessage = update appInfo clientWidth and clientHeight

The worker as no way of knowing that size the drawingbuffer needs to be so the main thread sends that info whenever it changes. But, once the worker starts no messages ever arrive from the main thread even though the main thread is sending them. Given that gl.commit is synchronous and many other things are going on it seemed best to call rAF before gl.commit so that the next animation frame comes as soon as possible.

In this sample I swapped the order to rAF after gl.commit but it also never receives messages from the main thread

Also note using gl.commit no events are delivered period. Here's an example that tries to load a texture using fetch. The fetch callback is never received

This seems far from completely specced. Having commit basically make the entire rest of the platform fail to work seems wrong but the spec does not make it clear what is supposed to happen. My guess is chrome promotes rAF events to the top of the event queue so regardless of what other events are pending the rAF event gets run first and then gl.commit blocks processing.

That could be a bug in Chrome but AFAICT it's not wrong based on the spec. I think the spec should be clear how these messages get processed when there's a raf+commit loop as well as just commit loop.

Here's a sample with a commit loop (no rAF) as in

  while(true) {
      ...render...
      gl.commit();
  }

It also tries to download an image with fetch and update the canvas size by having messages passed from the main thread. No messages ever arrive and the fetch callback is never called.

greggman commented 6 years ago

As the original rAF creator any input @rocallahan on this seemingly platform breaking API?

w3ctag / design-reviews

Review OffscreenCanvas, including ImageBitmapRenderingContext #141