"Ensure an immersive device" cannot be run in parallel

Manishearth commented 4 years ago

Run the following steps in parallel:

Ensure an immersive XR device is selected.

This step is running in a parallel thread but then you invoke something that grabs objects from the main thread. I think you need to make that more clear and separate what happens on the main thread and what happens in parallel.

Also, the selection that happens here presumably updates the state on the main thread. That should happen in the same task that eventually resolves the promise. This also goes for the step below here where you resolve with false.

Does that makes sense?

(I suspect something similar applies to the other in parallel algorithm, but I did not look in detail.)

Originally posted by @annevk in https://github.com/immersive-web/webxr/pull/1032

Manishearth commented 4 years ago

We might need to write ensure-an-immersive-device-is-selected as an algorithm that takes in "next steps"

Manishearth commented 4 years ago

Actually I think it might just be cleaner to stop calling that algorithm from "in parallel". While implementors may have asynchronous bits in it, that can be papered over.

In practice, the values managed by this algorithm will likely be on a different thread anyway.

Manishearth commented 4 years ago

Hmmm, this is trickier than I think.

The core issue is that the ensure algorithm needs to be asynchronous, since querying device state can be slow. This is why isSessionSupported() returns a promise. The state it manages

On the other hand, we don't use it that way in getContext(), That is a synchronous algorithm.

This has the weird result of us being in a situation where either "isSessionSupported() doesn't need to be a promise", or getContext() involve a synchronous block on what is essentially isSessionSupported(). Our spec is basically disagreeing with itself on how hard ensuring device selection is.

In this light I don't think the original concern in this issue is actually the major one: The "immersive XR device" etc objects likely live on another thread anyway. The problem is we're accessing them synchronously in one case.

From skimming the code it seems like Chromium doesn't handle this flag at all? I might not be reading things correctly. Servo doesn't either. cc @toji @asajeffrey

Manishearth commented 4 years ago

Talked a bit with @toji about this. It seems like getContext() is done asynchronously "under the hood" in Chromium anyway, with the WebGLRenderingContext being created with a bare IPC handle, and the context creation stuff being done asynchronously across IPC, with further commands also getting buffered up across IPC. A path forward if this is everyone's behavior is to use some weaselly language to crate and return a context and queue a task to actually "populate" it and have some explanatory notes, along with followup HTML spec issues filed.

@jdashg said that this is not how Firefox works, and that context creation cannot be "asynchronous under the hood" because it is fallible and it needs to be able to throw. (It's perhaps possible to paper over that by emitting context loss events when this happens)

It might be worth letting this be synchronous but just getting UAs to print a warning asking people not to use getContext() with xrCompatible and instead suggesting makeXRCompatible().

Manishearth commented 4 years ago

Note: Currently both Chrome and Firefox ignore this flag.

Asking around, there seems to be a strong dislike for synchronous long-running APIs. I tend to agree, these are bad. Patching an existing API to add synchronous behavior seems worse.

Removing it is an option, but this will likely break existing content. Well, it won't quite be a breaking change since no browser that I know of handles this flag, but it could break content written with this API in mind in the future as browsers begin to handle the flag. Content would break because the XR compatible flag is not set. We could stop erroring because of that flag, but this means content is once again free to attach incompatible contexts to sessions.

A potential path forward is to make .getContext({xrCompatible: true}) behave like .getContext() + makeXRCompatible() in the worst case: when there is no device, and post selection the device needs a context change. This basically means that there will be context loss events in some cases. The Chromium trick of deferring actual context creation would continue to work. With this route content may also break because the XR compatible flag would not be set, though.

Manishearth commented 4 years ago

I see four rough options:

Remove the xrCompatible flag from getContext(), notify users that they should use makeXRCompatible() instead, which is asynchronous. This will break existing content unless we stop erroring when the XR compatible flag is not set.
Same as (1), but we also stop erroring when the XR compatible flag is not set. This will not break existing content because browsers do not do anything meaningful with this flag yet, but existing content may continue to obliviously use this flag and eventually break when browsers start handling this. Furthermore, the "context isn't xr compatible" error is _useful_On the other hand, it gets rid of the partial dictionary and the weird getContext()-patching we're doing which isn't very kosher anyway.
Allow xrCompatible to introduce slow sync behavior in getContext(), perhaps with a deprecation warning. This is not great; slow sync APIs are bad
Have .getContext({xrCompatible: true}) behave like .getContext() + makeXRCompatible(), where we create a context for the device if we know what it is, otherwise create a normal context, and then trigger context loss/restore if it turns out that we were supposed to create a different context. This is slightly weird and applications may be forced to handle the context loss event in select cases on select devices. In a perfect world they would be doing this anyway, however I suspect a lot of content isn't. This puts us in a similar bucket as 2., where this is not a breaking change, but when browsers implement this we will start having problems, and that too only with specific devices.

/agenda Our current spec makes the getContext() API block, which is really bad and we should discuss ways to avoid that. None of the choices are particularly good

RafaelCintron commented 4 years ago

Since WMR and OpenXR mandate you use the discrete adapter on hybrid devices, XR experiences are completely broken on Chromium today, flag or no flag. Non-Chromium Edge implemented adapter switching for WebVR and we found a substantial number of hybrid devices were enabled via the changes.

@Manishearth though Chrome does not currently implement the xrCompatible flag, there is a Chromium CL in the works to address this. Once it, and the followon CLs land, GPU rendering will move to the XR-preferred adapter when you use the xrCompatible flag in getContext. Any WebGL contexts on the previous adapter will receive contextLost. The behavior has aspects of all of your proposed options. Using xrCompatible will have "slow sync behavior". However, if the sync behavior is not able to determine whether an XR-compatible adapter is mandated, it will create WebGL content on the default adapter.

Of the options you listed, I prefer (4). While not all content implements contextLost, I expect more code uses it than use makeXRCompatible. Synchronously handling xrCompatible is the best way to make existing WebXR content work without incurring contextLost.

Manishearth commented 4 years ago

Oh, good to know, thanks!

Going through the CL i'm not super happy with the slow-sync behavior, but it does have some aspects of (4), and you mentioned you're in favor of (4), which is good to hear.

asajeffrey commented 4 years ago

If we're making changes to WebGL context initialization, I will suggest a more drastic change, which is to allow session creation to also create an XR-compatible WebGL context, so by default the WebGL context is not shared with an HTML canvas element.

Manishearth commented 4 years ago

would that just obviate the need for makeXRCompatible? What if you need additional contexts, e.g. for layers?

asajeffrey commented 4 years ago

It means sessions could get an xrcompatible GL context without having to reuse one from a canvas. The down side is that content wouldn't be able to pre-populate the context with webgl resources. (Which is wasted effort if webgl contexts have to be torn down and recreated when they are made XR-compatible, but that's a minority of cases.)

toji commented 4 years ago

@kenrussell: Is this something that we could discuss with the WebGL working group on an upcoming call? This is definitely an area of API overlap that I think the WebGL WG should have to opportunity to weigh in on.

kenrussell commented 4 years ago

@toji yes, certainly. Have just sent out a call for agenda items; please add this to the document. Thanks.

toji commented 4 years ago

In a turn of events that I didn't expect, when discussing this with the WebGL group today there was broad consensus that they wanted to look into creating an asynchronous version of getContext (a getContextAsync()) that would more comfortably accommodate long-running requirements like xrCompatible. This is partially driven by pre-existing context creation args like powerPreference that already have to do a similar GPU adapter switch.

If that approach were to land, it would be ideal if xrCompatible were restricted to only work with the asynchronous version of the function, though that would obviously be something we'd make a gradual transition to.

(Update: Filed a corresponding issue on the WebGL repo)

Manishearth commented 4 years ago

Oh that's awesome!

In that case, I see two reasonable paths forward:

remove the flag and ask content to slowly transition to makeXRCompatible() (by making it a warning and later an error to have the xr compatible flag unset). Note that no devices which require this flag currently work, so we're not really breaking content yet, but we have a small window in which to do this.
spec the flag as working, but triggering context loss if an adapter switch happens

thoughts?

immersive-web / webxr

"Ensure an immersive device" cannot be run in parallel #1036