immersive-web / webxr

Repository for the WebXR Device API Specification.
https://immersive-web.github.io/webxr/

getFrameData() should return a new VRFrameData object instead of data-copying. #128

Closed bfgeek closed 6 years ago

bfgeek commented 7 years ago

We don’t have this pattern on the web today and I don’t think that we should introduce it.

I can't see a good reason to have this API over one that returns a new object each time. Both versions of the API require the same amount of memory copying, from what I can see.

E.g.

partial interface VRDisplay {
  VRFrameData getFrameData();
};

const data = vrDisplay.getFrameData();

A null return value could represent the false value in the spec today.
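A minimal sketch of what null handling could look like under the proposed shape (hypothetical; render() is a placeholder):

```js
const data = vrDisplay.getFrameData();
if (data !== null) {
  // null here would mirror the boolean false return in the current spec
  render(data.pose, data.leftViewMatrix, data.rightViewMatrix);
}
```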

bfgeek commented 7 years ago

cc/ @esprehn @ojan

RafaelCintron commented 7 years ago

The rationale for having getFrameData fill in a VRFrameData object is to promote object reuse and avoid creating garbage objects every frame that need to be GCed.
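For contrast, a minimal sketch of the reuse pattern the current fill-in shape is aimed at (WebVR 1.1 style; drawEye() is a placeholder):

```js
const frameData = new VRFrameData();   // allocated once, outside the render loop

function onVRFrame() {
  vrDisplay.requestAnimationFrame(onVRFrame);
  if (vrDisplay.getFrameData(frameData)) {   // fills frameData in place, returns a boolean
    drawEye(frameData.leftProjectionMatrix, frameData.leftViewMatrix);
    drawEye(frameData.rightProjectionMatrix, frameData.rightViewMatrix);
    vrDisplay.submitFrame();
  }
}
vrDisplay.requestAnimationFrame(onVRFrame);
```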

esprehn commented 7 years ago

Yeah I totally understand that concern. While some APIs might need special treatment to avoid creating excessive garbage, for example something very low level you call thousands of times per frame, I don't think VRFrameData requires this.

You only create a few/one per frame. You're not expected to call it thousands of times, but maybe once at the start of every render loop. If you write your application well this produces at most one allocation per frame.

Browsers must cope with small amounts of garbage. To maintain high-quality touch-driven interactions, scroll effects, and smooth animations at 60fps (or higher for higher-Hz displays), browsers must already be able to deal with small amounts of garbage every frame. This is because all typical web content produces garbage.

It's unlikely your app will produce zero garbage. Even expertly written content generates small amounts of garbage because many platform APIs, even lots of JS ones, produce garbage. Note also that nearly all new platform APIs require producing some new objects, for example CSS Typed OM, various DOM APIs, Promises on every .then() call, etc.

Further, JS engines garbage collect the code they generate for your app. This means that you often see small garbage collections collecting the optimized or de-optimized code, scope records, closures, or various other internal data structures in your program.

Browsers use garbage collectors internally too. Chrome uses a C++ garbage collector internally, so it's unlikely your JS code talking to platform APIs is producing no garbage at all inside the engine. I think Edge uses a GC too, though I don't know how much of the engine uses it. :) For Chrome we're going to need to do small GCs anyway because you're touching various DOM/Canvas/Web Platform APIs every frame. Adding one more object to that should not materially impact your performance.

For example calling requestAnimationFrame in Chrome produces garbage for the callback, even if you pass the same function every frame.
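A sketch of that pattern:

```js
function onFrame(timestamp) {
  // ... per-frame work ...
  requestAnimationFrame(onFrame);   // same function object every frame, yet the
                                    // rAF machinery may still allocate per call
}
requestAnimationFrame(onFrame);
```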

ssylvan commented 7 years ago

One thing to bear in mind is that some VR devices refresh at 90 or 120 frames per second, thus the margins are potentially much smaller than for a typical web app that updates at 60Hz. The consequences of missing a frame are also potentially more severe (reprojection can help, but not without introducing artefacts that people react differently to, in terms of comfort).

Also, for mobile hardware (e.g. GearVR, Cardboard, Daydream, Hololens, and future stuff) the issue is obviously significantly harder to deal with due to higher memory latencies, slower CPUs, etc. I can tell you that we hunted down even single allocations (Unity/.Net native) in the steady-state game loop on all the Hololens apps I was involved with! :)

bfgeek commented 7 years ago

Can you provide a code example of something that doesn't allocate any garbage? Are the examples where you need to remove every allocation built only on asm.js?

We are trying to reconcile the fact that a lot of the current WebVR experiments use three.js or similar (which produce garbage each frame) with the position here that WebVR must never have any per-frame allocations.

ssylvan commented 7 years ago

It's probably not very likely that all (or even most) WebVR apps will run without any steady state allocations, but there's an argument to be made that the API should avoid making the situation worse, to give the app more head room.

AlbertoElias commented 7 years ago

I think that if all code examples and major libraries do it properly, we'll be advocating a performant way of handling VRFrameData while also having an API that's consistent with the rest of the platform.

RafaelCintron commented 7 years ago

I agree with Sebastian on reducing memory allocations as much as possible.

Recently, the WebGL working group was asked to change multiple functions in the WebGL 2.0 API (bufferData, bufferSubData, texImage2D, etc.) to make it easier for callers of the API to avoid creating garbage in the form of ArrayBufferViews. This feedback came from the Emscripten folks, people who care about performance. So when it comes to other graphics APIs in the web platform, there is an increasing bias towards reducing or eliminating memory allocation, both internally in the browser as well as in the caller.
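As a rough illustration of those overloads (a sketch, assuming a WebGL2 context `gl` and an already-bound buffer):

```js
const scratch = new Float32Array(64 * 1024);   // long-lived scratch array, refilled each frame

function upload(gl, floatCount) {
  // Older pattern: gl.bufferSubData(gl.ARRAY_BUFFER, 0, scratch.subarray(0, floatCount));
  // subarray() allocates a new ArrayBufferView object per call.
  // The WebGL 2.0 srcOffset/length overload uploads a sub-range of the existing view:
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, scratch, 0, floatCount);
}
```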

Obviously, we're not going to eliminate every memory allocation under the sun but the more we can improve, the better.

bfgeek commented 7 years ago

cc/ @littledan @ajklein

So there are a few things here:

  1. Apps are most likely going to create garbage / have allocations each frame.
  2. The garbage / allocations each frame shouldn't be so egregious that the garbage collector has trouble keeping up.
  3. We should try and keep this API consistent with the rest of the platform.

We are arguing that this is a very small allocation and that it isn't worth diverging from the rest of the platform. If you have data that shows an allocation of this size is bad for apps, that would be interesting to see.

I think your argument is that this is worth diverging from the rest of the platform?

That said, what you are arguing for here is a world where (for example) new input events (e.g. pointermove) aren't delivered as event objects but via object populating, like in this example?

RafaelCintron commented 7 years ago

I am saying there have already been cases where creating small objects in graphics APIs like WebGL has forced us to add new API entry points to avoid these allocations. As we move towards WebVR applications with increasing computational and GC needs of their own, I would hate for us to be in the same boat as WebGL.

esprehn commented 7 years ago

It's not clear to me that the WebGL2 API avoiding the ArrayBufferView allocations makes sense either. I'm very concerned here that there's two very different platforms being built.

Modern API designs like Promises generate small amounts of garbage, for example the returned Promise and the result of then(). The argument here seems to be that the platform should avoid creating any garbage at all, which is arguing that we should add versions of event listeners that don't create garbage, stop using Promises and switch back to callbacks, and reverse many changes in the direction the whole platform has been moving over the past five-plus years, e.g. a form of Object.keys() which takes an existing array and fills it with the string keys instead of returning a new array.
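To make the analogy concrete, a purely hypothetical fill-in variant of Object.keys (this API does not exist):

```js
const obj = { x: 1, y: 2 };

// Hypothetical fill-in style: caller preallocates and reuses the array.
const keys = [];
Object.keysInto(obj, keys);          // does not exist; would fill `keys` in place

// Actual platform ergonomics: a fresh array is returned on every call.
const freshKeys = Object.keys(obj);
```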

I think we need to either make a principled decision that the entire platform needs to provide pre-allocated methods and then change Object.keys, stop using Promises, and so on, or we need to assume that the rest of the platform makes sense and make new APIs like WebGL2 and WebVR match the same ergonomics as all the other DOM APIs.

ssylvan commented 7 years ago

Games, graphics and in particular VR have much stricter performance requirements than other scenarios. For VR in particular, framerate issues can literally make people sick. If WebGL and WebVR make minor compromises on ergonomics to improve performance while less performance sensitive APIs do the opposite, I don't really see a problem with that. In fact, I think such an outcome would be desirable.

Having APIs that are tuned for their intended use cases isn't a bad thing.

littledan commented 7 years ago

I second @esprehn's points. Allocating small, short-lived objects is generally supposed to be a "good path" for JS engines due to generational GC (though that might not always work perfectly for platform-allocated objects in all browsers). If you have any benchmarks which suffer when allocating small, short-lived objects like this, it'd be great to send them in the direction of the V8 GC team, cc @mlippautz

bfgeek commented 7 years ago

https://github.com/w3ctag/spec-reviews/issues/106#issuecomment-257493565

ssylvan commented 7 years ago

I think we might be slightly talking past each other here, so just to clarify: I'm not suggesting that this one allocation will trigger some kind of pathological corner case or anything, but rather expressing concern that even the "good path" for typical web apps is actually problematic for VR due to higher frame rates (90 or even 120Hz) and higher stakes for missed frames (physical discomfort).

Some back-of-the-napkin math suggests that this object, and its associated VRPose, would weigh in at about 1KB or more (depending on 32-bit vs 64-bit etc.), which means that if you only have a few megs of young generation space, this one allocation alone would cause a GC (and thus a missed frame) several times a minute. I'd hate for an app developer to do everything right and eliminate all their main loop allocations only to discover that the "ceiling" for how long they can go without a missed frame is in the order of seconds, due to this one API.
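A rough reconstruction of that math (all numbers are illustrative assumptions, not measurements):

```js
const bytesPerFrame = 1024;               // assume ~1 KB for VRFrameData + VRPose + matrices
const framesPerSecond = 90;
const youngGenBytes = 2 * 1024 * 1024;    // assume a ~2 MB young generation

const secondsBetweenGCs = youngGenBytes / (bytesPerFrame * framesPerSecond);
console.log(secondsBetweenGCs);           // ≈ 23 s, i.e. two to three minor GCs per minute
                                          // from this one allocation alone
```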

Given the special performance circumstances for VR, and since WebGL already had to go back and make changes for this same reason, I think we should be a bit more specific about what we're actually "signing up for" by making this proposed change. @littledan do you have any back-of-the-envelope numbers for what "good path" performance means for your GC for desktop, mobile, etc.? All I can find online with cursory searching is that v8 targets 5ms, which would mean potentially 1 or more missed frames even at 60Hz, and much worse for 90 or 120Hz. Is that accurate? What kind of gen 0 heap sizes do you typically use?

littledan commented 7 years ago

We're working towards a 60fps goal, but really we're trying to get things "as good as possible" balancing several metrics (responsiveness vs memory usage being a big one). The parameters for GC are different on different platforms, depending on the memory size, etc.

Is it really possible to write any sort of WebVR thing which doesn't invoke GC at all? I find it hard to believe that you wouldn't be "causing GC several times a minute". Anyway, recent changes in V8 are exploring concepts like preemptively launching parts of full (not just young generation) GC even in the absence of tons of pressure, based on idle time or just fixed timers.

mlippautz commented 7 years ago

Can only repeat what @littledan already said. If you have a repro of something janky, even if it's just a prototype for a spec, let us know.

I cannot comment on the spec changes other than to say that we really suggest not using any sort of object pooling, as the runtime (compilers) will have a really hard time optimizing for it. E.g. escape analysis, write-barrier elimination, allocation folding, and other optimizations will not be applied.

Ever-changing GC characteristics: the young generation in V8 currently grows up to 8 MB when the allocation rate is high. Workloads that produce lots of dead memory are cheap from a GC perspective, as we only need to touch live memory (sub-ms young-generation collection). We also employ idle-time GC, which has especially helped WebGL workloads, where in some cases all GC is performed during idle time.

ssylvan commented 7 years ago

@littledan I don't think it's likely that apps will have no GC, but I don't think that's an argument for making the API generate garbage (see WebGL). The less garbage we generate, the more the app can generate. It seems like it should be possible to write an app that has no or very small/few steady state allocations, especially if the JS is generated from some tool like emscripten or a game engine. If something allocates when you click a button or something, it's not so bad because it's naturally rate limited. A couple of relatively large allocations at 90 or 120Hz adds up quick, however.

The fact that WebGL already had to go back and change their APIs to reduce generated garbage is a pretty strong data point IMO. We should learn from that exercise and not introduce garbage in our core per-frame APIs without some solid reasons for why we think the lessons learned with WebGL don't apply. Especially since VR has significantly higher performance constraints than typical WebGL apps.

ssylvan commented 7 years ago

FWIW, I just loaded up the first scene I could find on Sketchfab in the latest Chrome WebVR build and found on my machine:

  1. This barely runs at 90Hz. Just on the edge, fluctuating back and forth between ~11-14ms.
  2. When there's a minor GC it usually misses the 60Hz deadline too.
  3. Minor GCs take upwards of 2ms (not sure if I can get a global average easily, I just found a few in the timeline)
  4. The GC kicks in about twice a second.

This is using the existing API, which does not generate garbage directly in the getFrameData call, so the above is what you get from whatever ambient allocations chrome does internally, as well as what the app does.

There are a few conclusions you could draw:

  1. It's hard to hit VR framerates on the web.
  2. GC has a major role to play in determining whether or not you're going to miss a frame.
  3. At least this one app generates quite a lot of garbage on its own, regardless of what we do.

So you could either say "we're hosed either way, so there's no harm adding even more garbage", or you could say "wow, it's already pretty hard to do a good job performance-wise here, even on a relatively beefy desktop PC, so we should definitely avoid adding to the problem". So I'm not sure this helps us come to a conclusion either way.

Sketchfab may be a particularly unoptimized scenario and thus not representative, or it may be typical, hard to say. As I've mentioned before I'm inclined to get out of the way as much as possible so that if a developer does want to heavily optimize their app to get rid of GC pauses, we won't stand in their way.

mlippautz commented 7 years ago

Can you provide a trace file?

I ran the regular WebGL version of the scene on my Macbook on a recent Canary yesterday. The vast majority of GCs were triggered in idle time. These GCs are triggered by the scheduler and have 0 impact on latency. The more idle time we have, the more GCs we will usually do.

I also just downloaded the Windows Chrome WebVR build and ran a trace on the provided link. I do not know anything about the build but would assume that it is configured to try to run with 120Hz, even on a regular machine. I barely get to 60fps, but GC is not an issue there. I can also see the compositor trying to output a frame every 16ms, so I wonder what's actually going on there.

Repeating myself: applications producing short-lived objects find themselves in the sweet spot of the GC; this will be a fast path. I recommend not trading this sweet spot for a different approach that prevents compiler optimizations.

toji commented 7 years ago

@mlippautz: I'm assuming you weren't using VR hardware with the WebVR build? In the case that you're not actively presenting to a VR device the content will still run at the screen refresh rate (60Hz in most cases). In reality the performance in that mode should not be significantly different than top of tree Chrome.

ssylvan commented 7 years ago

I was using an Oculus yes, and the GCs were not during idle time, but rather in the middle of the rendering code. This presumably varies a bit by machine too.

EDIT: Oh, and I should say that it almost looks like it runs at 60Hz (the little vertical bars indicating a frame boundary were at 60Hz), so the timings above where I talk about frame rate are just me inferring from the frame times (I assumed maybe there was a misconfiguration or something, since Oculus wants 90Hz). I'll see if I can rerun the test later today (or more likely tomorrow) and save the trace this time.

EDIT2: Presumably if you're actually on a VR device all the rendering gets 2x as expensive? So that would change things.

DigiTec commented 7 years ago

We were doing some testing to see how much of your RAF frame budget you could use and still hit frame-rate. Could you use 16ms? Probably not, since you may not have a full 16ms depending on the delay of the RAF by the event loop, but maybe 12 would work, etc...

As part of this we were putting in dead loops that just timed out a sample update/render. We used performance.now(). This caused a frame hiccup every couple hundred frames due to GC, with NOTHING ELSE happening, because of the doubles allocated by performance.now().

Now, these doubles didn't leak (they were used within a single function, in a local-variable loop) and could potentially have been optimized away entirely. But they weren't. And the fact that a couple of megs of doubles, coming from a timing function which we expect people to use fairly heavily for their timings, was causing a frame skip every 200 frames is pretty startling.
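A hypothetical reconstruction of the kind of dead-loop test described above (not the actual test code):

```js
const BUDGET_MS = 12;   // slice of the rAF budget to burn

function onFrame() {
  const start = performance.now();
  while (performance.now() - start < BUDGET_MS) {
    // spin; each performance.now() result is a double the engine may box,
    // which is the garbage being discussed
  }
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);
```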

This means the fill-in-the-buffer model we are using is certainly helping out in some circumstances.

We are trying to find an avenue to share some of this code with the general group. It does bring up the potential that it is impossible with some architectures to achieve 60fps with no frame skips, ever, given the way they work. I don't know if the best possible fix here is to ensure that we get GC updates for this issue or that we keep our APIs as friendly as possible to current GC architectures.

mlippautz commented 7 years ago

@ssylvan Thanks, curious about the trace file.

@DigiTec: If you can provide a repro, please file a bug at crbug.com

curtisman commented 7 years ago

I work on Chakra's GC here. My two cents is that it is always good to avoid creating garbage.

We try hard to build heuristics so that we can avoid GC while script is running, and try to find idle time to do GC, although that doesn't always work because of the number of objects the script is creating. Even within the engine itself, we try not to allocate short-lived objects. Generally, the more garbage the script generates before idle time, the more likely it is that we will have to GC and interrupt at an inopportune time.

While the GC should be able to deal with some amount of garbage, you already have a very tight budget between frames in this scenario to do what the program's script needs to do. Why add more burden to the system with extra allocation and pressure on the GC?

ssylvan commented 7 years ago

Here's a trace from my machine, looks like the GC cost dropped a bit (typically 1.3ms now): https://1drv.ms/u/s!AjL69IOnjhAA4Z03cS_uQZ4AbNb0Eg

A lot of the GCs seem to happen at the start of a frame, before the RAFs but after the frame boundary (e.g. the first GC), and some smack in the middle of the frame (e.g. the second one).

RafaelCintron commented 7 years ago

If it weren't for the Emscripten folks, I would be inclined to let getFrameData allocate. However, the fact that they ran into trouble and convinced the WebGL working group to make this change gives me pause. The problem is no longer theoretical in nature as far as I am concerned.

As user agents, we should create as much headroom as possible for Javascript developers to be excellent at more than just demos or simple pages. I have tremendous respect for the people that build Javascript engines. However, when it comes to allocations for high usage APIs like getFrameData, I think we're better off adopting a "the fastest code is code that doesn't run" approach.

mlippautz commented 7 years ago

Alright, thanks for taking the trace. Did you include the V8 category when recording the trace?

If yes, then all I see is MinorGCs (=Scavenges) directly executed from tasks. We currently only schedule idle tasks for Scavenges. This means that the scheduler thought it might be a good idea to do a GC here.

If you didn't enable the V8 category specifically, could you take another trace?

ssylvan commented 7 years ago

I didn't enable anything in particular - I'm sorry I'm not super familiar with Chrome dev tools. I'll try to find time to get another trace at some point. E.g. the 2nd GC looks like it interrupted rendering work.

toji commented 7 years ago

@DigiTec: If I recall correctly we left this conversation at the F2F with the next steps of Oculus providing some traces or traceable examples to the Chrome team to better evaluate the impact of returning a new object each frame? Where are we with that?

bzbarsky commented 7 years ago

I'm quite confused by the GC arguments here. The one implementation of getFrameData that I've looked at (Mozilla's) doesn't allocate a new VRFrameData object, but does allocate new objects for all the properties if you touch them, so you're only avoiding generating garbage if you don't examine your returned VRFrameData at all.

Now maybe that's not how this is meant to be implemented? Per https://github.com/w3c/webvr/issues/197 the spec doesn't define how getFrameData works...

kearwood commented 7 years ago

I would like to clarify if this discussion relates only to WebVR 2.0 or if anyone intends on amending WebVR 1.1.

Could we ship with the existing WebIDL in 1.1 and apply any changes resulting from this discussion to WebVR 2.0?

RafaelCintron commented 7 years ago

I agree with @kearwood that we should address this issue one way or another as part of WebVR 2.0.

FWIW, Edge allocates all subobjects (matrices and VRPose) of VRFrameData when you construct it. Zero memory allocations happen inside of getFrameData.

RafaelCintron commented 7 years ago

I'll also add that I am OK with changing getFrameData to return a new object if others feel strongly about it. Since we're likely going to have getFrameData take parameters in the future, the returned object is going to have a more temporary status moving forward.

toji commented 7 years ago

After a lot of swaying between positions I'm with Rafael here. Sticking closer to the web platform norms feels better at this point. If this is something that presents an unreasonable burden for anyone, I would strongly encourage you to reply with traces/logs/graphs to demonstrate the specific issue, at which point it can be addressed more concretely.

littledan commented 7 years ago

@bzbarsky I don't think V8 has the same kind of lazy allocation infrastructure that SpiderMonkey does.

bzbarsky commented 7 years ago

SpiderMonkey doesn't have any interesting lazy allocation infrastructure I'm aware of, so I'm not sure what you mean....

toji commented 6 years ago

This was resolved a long time ago in WebXR (we're returning a new XRDevicePose when the equivalent function is queried). We still need to be sensitive to garbage creation, but I don't think keeping this issue open helps that cause. Closing now.