immersive-web / webxr

Repository for the WebXR Device API Specification.
https://immersive-web.github.io/webxr/

Provide statistics to help guide performance #1203

Open cabanier opened 3 years ago

cabanier commented 3 years ago

One of the hardest parts of WebXR is how to tune for good performance. It is hard for authors to gauge how much processing power is available to them, so they end up optimizing their code for the device they have on hand. We should give them the tools they need so they (or framework authors) can dynamically change parameters of their experience. We've added support for framebuffer scaling, fixed foveated rendering (FFR) and variable framerate, but unless an author increases the load until the device can no longer keep up producing frames, they can't figure out how much extra processing headroom the system has.
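
For reference, a minimal sketch of how an application can apply those three knobs today; fixedFoveation, supportedFrameRates and updateTargetFrameRate are only available in some browsers, and the canvas argument is assumed to come from the page:

  // Sketch: the existing tuning knobs, applied at session setup.
  async function startTunedSession(canvas) {
    const session = await navigator.xr.requestSession('immersive-vr');
    const gl = canvas.getContext('webgl2', { xrCompatible: true });

    // Framebuffer scaling: trade resolution for GPU headroom at layer creation.
    const layer = new XRWebGLLayer(session, gl, { framebufferScaleFactor: 0.8 });
    session.updateRenderState({ baseLayer: layer });

    // Fixed foveated rendering, where supported: cheaper shading at the periphery.
    if ('fixedFoveation' in layer) layer.fixedFoveation = 0.5;

    // Variable framerate, where supported: request the lowest advertised rate.
    if (session.supportedFrameRates && session.supportedFrameRates.length) {
      await session.updateTargetFrameRate(Math.min(...session.supportedFrameRates));
    }
    return session;
  }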

I looked at the Compute Pressure API but I'm unsure if that will help since it doesn't seem applicable to VR headsets.

As a browser developer, what can we provide authors to help them out in this area? Total time spent in JS and rendering? Total idle time between rAF calls?
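
As a strawman for what can already be measured from script, a minimal sketch that times the JS portion of each XR rAF callback and the idle gap between callbacks; onXRFrame and renderScene are illustrative names:

  // Sketch: per-frame JS time and the idle gap between rAF callbacks.
  let lastCallbackEnd = 0;

  function onXRFrame(time, frame) {
    frame.session.requestAnimationFrame(onXRFrame);
    const jsStart = performance.now();
    const idleMs = lastCallbackEnd ? jsStart - lastCallbackEnd : 0; // gap since the last callback returned

    renderScene(frame); // app-specific rendering, assumed to exist

    lastCallbackEnd = performance.now();
    const jsMs = lastCallbackEnd - jsStart; // JS + WebGL call time for this frame
    // Neither jsMs nor idleMs reveals GPU or compositor time, which is the gap discussed here.
  }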

Maksims commented 3 years ago

I believe the browser's performance audit already provides decent information on how much time is spent on JS and on rendering, and also gives an idea of the idle time between frames. Given the nature of the various platforms, there might be other processes running that the developer doesn't know about, which means the "available" idle time might actually be used by other processes.

VRAM has always been a major performance driver, but this is perhaps more WebGL related.

One thing it would definitely be great to know is how much time is spent by the underlying WebXR system, especially when additional features are in use, for example optical hand tracking, plane detection, light estimation, etc. Knowing how much the WebXR system itself consumes would let developers take it into account when budgeting the available hardware.

cabanier commented 3 years ago

By "browser Performance audit" do you mean profiling your site with the browser's developer tools? If so, that comes back to my point that authors will just optimize for the device they have on hand.

VRAM has always been a major performance driver, but this is perhaps more WebGL related.

Mobile GPUs work differently than desktop ones, which makes the story even more difficult. I'm unsure what we can do about that apart from telling authors they have to test on both. :-\

One thing it would definitely be great to know is how much time is spent by the underlying WebXR system, especially when additional features are in use.

At least on Quest, those run on different dedicated cores so they shouldn't interfere.

Idle time between rAF calls is something that we can report, and our OS also reports load on the CPU and GPU. If people are interested, I could expose these as an experiment to see if they can be used to tune performance.

I'm also wondering what authors do with WebGL experiences because the problems seem similar.

Maksims commented 3 years ago

I'm also wondering what authors do with WebGL experiences because the problems seem similar.

In PlayCanvas, we provide various profiling tools (Mini Stats, the launcher profiler, some internal logging) that expose:

  1. Render, update, and shadow times
  2. VRAM usage (internal counters), split by resource type: textures, buffers, framebuffers
  3. Shader compilation times (as they are synchronous in WebGL by default).
  4. Draw calls - this is important, as knowing your budget for the target platforms helps you develop for them, even when not developing on the target platform.
  5. Overdraw - we don't report that, but we have built custom profilers for specific projects to identify fragment-shader overdraw; this is very useful when optimizing heavy fragment shaders.
  6. More engine-related things: number of shaders, materials, shadow rendering, UI rendering, culling time, the geometry batcher, etc.

Things I personally believe are key to successful WebGL content (in general) for a broad, non-targeted audience:

  1. Loading times - faster is better: download only what is needed for the specific application state, bundle assets cleverly, and use atlases for small textures (UI, sprites, etc.).
  2. UX - keep it simple and accessible. Often an app looks great and even loads fast, but if the UX is bad, people simply go away. Fewer clicks/touches/interactions for more value is better. Hiding gameplay behind menus is a bad practice on the web.
  3. Garbage collection - this is often missed, but on the web, because of automatic GC, allocating resources every frame leads to GC stalls that can get out of hand quickly. Developers have to reuse resources, use object pools, and avoid allocating/destroying objects wherever possible (see the sketch after this list).
  4. Performance - this comes down to a few major things: draw calls, VRAM usage, GC, and shader complexity (a shader can either fail to fit on limited platforms like iOS, or simply be slow because of its complexity).
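
To illustrate point 3, a minimal object-pool sketch of the kind of reuse described above; the Pool class and update function below are illustrative, not PlayCanvas API:

  // Minimal object pool: reuse objects instead of allocating them every frame.
  class Pool {
    constructor(factory) {
      this.factory = factory;
      this.free = [];
    }
    acquire() {
      return this.free.pop() || this.factory();
    }
    release(obj) {
      this.free.push(obj);
    }
  }

  const vec3Pool = new Pool(() => ({ x: 0, y: 0, z: 0 }));

  function update() {
    const tmp = vec3Pool.acquire(); // no allocation after the pool warms up
    // ... per-frame math using tmp ...
    vec3Pool.release(tmp);          // return it rather than leaving it for the GC
  }
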
cabanier commented 3 years ago

Thank you! How do you know that the rendering takes too long? Do you look at the frame rate and see that the rAF starts slowing down? I'm specifically looking for a way to optimize render performance for experiences that already apply the best practices you listed.

klausw commented 3 years ago

cabanier@ wrote:

We've added support for framebuffer scaling, fixed foveated rendering (FFR) and variable framerate, but unless an author increases the load until the device can no longer keep up producing frames, they can't figure out how much extra processing headroom the system has.

For background, the dynamic viewport scaling feature lets UAs provide a recommendedViewportScale value as a hint to applications. In the Chromium implementation, that's based on an internal estimate of GPU utilization:

  // If nonzero, an estimate of how much of the available render time budget
  // was used for GPU rendering for the most recent measured frame. A value
  // above 1.0 means that the application is dropping frames due to GPU load,
  // and a value well below 1.0 means that GPU utilization is low. This is
  // intended to be used as input for renderer-side adaptive viewport sizing.
  // A value of zero means the ratio is unknown and must not be used.
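
For context, a minimal sketch of how an application consumes that hint through the dynamic viewport scaling API; pose, glLayer and gl come from the usual rAF boilerplate and are assumed to exist:

  // Inside the rAF callback: apply the UA's scale hint before rendering each view.
  for (const view of pose.views) {
    if (view.recommendedViewportScale) {
      view.requestViewportScale(view.recommendedViewportScale);
    }
    const viewport = glLayer.getViewport(view); // reflects the requested scale
    gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
    // ... draw the scene for this view ...
  }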

Would it be useful to provide such a value directly to applications? It would need to be appropriately quantized to ensure that it can't be abused to extract fine-grained timing information, and some care is needed to make the value useful across platforms if it's based on inexact heuristics. Maybe it would be more appropriate to use enums (UTILIZATION VERY_LOW/LOW/NOMINAL/HIGH/VERY_HIGH) instead of numeric values?

Applications can to some extent detect and deal with CPU bottlenecks by observing rAF timing, but that doesn't help if overall performance is limited by GPU performance, and as far as I know there's currently no good way to get this information apart from developer tools applied to a specific device.

cabanier commented 3 years ago

... Would it be useful to provide such a value directly to applications?

Yes, I'm leaning toward something like that. However, it shouldn't be just GPU or CPU load because that doesn't tell you enough.

For instance, currently the Oculus browser will execute the page's JavaScript and then wait until the frame is done rendering. In theory this could show up as a 50% CPU and 50% GPU load, tricking the author into believing that they can increase the frame rate or rendering complexity. Adding these numbers up also doesn't help because it won't deal with browsers that render asynchronously.

Maybe a percentage value that reflects how busy the browser was would be more appropriate (and easier to understand). If the browser's JS and rendering took up 12ms of the 16ms frame time, the percentage would be 75. For async browsers, it would count the time spent waiting for a free frame plus the JS time. A value greater than 100 would indicate that frames are dropping and the browser is overloaded.
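
As a worked version of that proposal (purely hypothetical; none of these names exist in the spec), the value could be derived roughly like this:

  // Hypothetical derivation of the proposed "busy" percentage.
  const frameBudgetMs = 1000 / (session.frameRate || 60);  // 16.7ms at 60Hz, 11.1ms at 90Hz
  const busyMs = jsTimeMs + renderOrWaitTimeMs;            // measured by the browser, not by the page
  const busyPercent = (busyMs / frameBudgetMs) * 100;      // e.g. 12ms of a 16ms budget -> 75
  // busyPercent > 100 means the browser can't keep up and frames are dropping.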

It would need to be appropriately quantized to ensure that it can't be abused to extract fine-grained timing information, and some care is needed to make the value useful across platforms if it's based on inexact heuristics. Maybe it would be more appropriate to use enums (UTILIZATION VERY_LOW/LOW/NOMINAL/HIGH/VERY_HIGH) instead of numeric values?

We can run some experiments to see how finely grained this value should be.

Applications can to some extent detect and deal with CPU bottlenecks by observing rAF timing, but that doesn't help if overall performance is limited by GPU performance,

Interesting. Would an overloaded GPU not slow down rAF timing?

as far as I know there's currently no good way to get this information apart from developer tools applied to a specific device.

Indeed. I've observed that developers hand-tune their applications for a particular device and then the experience is sub-par on other devices.

klausw commented 3 years ago

However, it shouldn't be just GPU or CPU load because that doesn't tell you enough.

For instance, currently the Oculus browser will execute the page's JavaScript and then wait until the frame is done rendering. In theory this could show up as a 50% CPU and 50% GPU load, tricking the author into believing that they can increase the frame rate or rendering complexity.

Adding these numbers up also doesn't help because it won't deal with browsers that render asynchronously.

Hm - I think this could fit into a common framework as long as the browser tells the app which steps happen in series and which happen in parallel. Basically, for the Oculus browser, it would say 50% CPU and 50% GPU load, but both of them go into the same time bucket, where the total must remain <100% to hit the target framerate. For Chrome AR, it would say that CPU and GPU load are separate time buckets, where an app could let both grow separately to just under 100% while still hitting framerate, though that's at the cost of an extra frame of latency. (Latency is reduced proportionally when not using up the full budget.)

If these are the only two cases, the performance metric API could simply expose an enum that says if the CPU time and rendering stages are asynchronous (parallelized) or not.

More generally, it could report a stage (bucket) number for each metric, but that may be overkill:

// Oculus
frame.perfMetrics: [
   {type: CPU_LOAD, value: 0.5, pipelineStage: 0}, 
   {type: RENDER_LOAD, value: 0.5, pipelineStage: 0}, 
]

// Chrome AR
frame.perfMetrics: [
   {type: CPU_LOAD, value: 0.5, pipelineStage: 0}, 
   {type: RENDER_LOAD, value: 0.7, pipelineStage: 1}, 
]

(As an aside, I'd lean towards using CPU/Render naming. The CPU/GPU pair tends to look too similar to my aging eyes, especially in fonts where C and G may differ by only a few pixels.)

Maybe a percentage value that reflects how busy the browser was would be more appropriate (and easier to understand).

I think this would lose some important information, i.e. would it be more helpful to reduce the object count or draw calls, or to reduce the pixel count or shader complexity?

Interesting. Would an overloaded GPU not slow down rAF timing?

Sorry, I was being sloppy with phrasing here. Yes, if an application is bottlenecked by excessive GPU load, the system will drop frames, and the app can detect that based on the interval between successive rAF calls. However, if the GPU is underutilized, there's no clear way to detect this other than trying to selectively increase GPU workload until the system starts dropping frames.

Conversely, if the bottleneck is CPU load, the app shouldn't use tuning methods that reduce GPU work; for example, reducing the pixel count by decreasing the viewport scale wouldn't have any benefit.
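
To make that trade-off concrete, a hypothetical consumer of the strawman metrics above; frame.perfMetrics and reduceDrawCalls do not exist, while requestViewportScale does:

  // Hypothetical: react differently to render-bound vs. CPU-bound frames.
  let viewportScale = 1.0;

  function adaptQuality(frame, view) {
    const cpu = frame.perfMetrics.find(m => m.type === 'CPU_LOAD');
    const render = frame.perfMetrics.find(m => m.type === 'RENDER_LOAD');

    if (render && render.value > 0.9) {
      // Render-bound: fewer pixels helps, fewer draw calls would not.
      viewportScale = Math.max(0.5, viewportScale * 0.9);
      view.requestViewportScale(viewportScale); // requestViewportScale is real; the metrics are not
    } else if (cpu && cpu.value > 0.9) {
      // CPU-bound: shrinking the viewport won't help; cut scene work instead.
      reduceDrawCalls(); // app-specific, assumed to exist
    }
  }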

cabanier commented 3 years ago

However, it shouldn't be just GPU or CPU load because that doesn't tell you enough. ...

... If these are the only two cases, the performance metric API could simply expose an enum that says if the CPU time and rendering stages are asynchronous (parallelized) or not.

More generally, it could report a stage (bucket) number for each metric, but that may be overkill:

// Oculus
frame.perfMetrics: [
   {type: CPU_LOAD, value: 0.5, pipelineStage: 0}, 
   {type: RENDER_LOAD, value: 0.5, pipelineStage: 0}, 
]

// Chrome AR
frame.perfMetrics: [
   {type: CPU_LOAD, value: 0.5, pipelineStage: 0}, 
   {type: RENDER_LOAD, value: 0.7, pipelineStage: 1}, 
]

Maybe we can simplify this even further. On Oculus, the two stages that take time are running the JavaScript and waiting for the frame to finish. On Chrome AR, the two stages that take time are waiting for a free frame and running the JavaScript.

"Running the JavaScript" we can call draw time; "waiting for the frame to finish" or "waiting for a free frame" we can call composite time.

For Chrome AR, composite time might be 0.

Maybe a percentage value that reflects how busy the browser was would be more appropriate (and easier to understand).

I think this would lose some important information, i.e. would it be more helpful to reduce the object count or draw calls, or to reduce the pixel count or shader complexity?

By going with 2 numbers, we'll gain this back.

Conversely, if the bottleneck is CPU load, the app shouldn't use tuning methods that reduce GPU work; for example, reducing the pixel count by decreasing the viewport scale wouldn't have any benefit.

True, although an author could calculate this by timing the rAF call.

cabanier commented 2 years ago

/tpac discuss how to provide feedback to guide performance

DRx3D commented 2 years ago

Would this be implemented as calls that allow a developer to tune the system, or as one or more events that report when certain thresholds are exceeded? The first case is better for building the system but may cause some overhead if used live. The second case (events) can be great as long as the thresholds are set correctly.

AdaRoseCannon commented 1 year ago

Did we ever come up with a mechanism for detecting late frames or that the site is consistently missing frames?

cabanier commented 1 year ago

Did we ever come up with a mechanism for detecting late frames or that the site is consistently missing frames?

We did not :-\
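
For what it's worth, a minimal sketch of the workaround available today: compare the interval between successive rAF timestamps against the expected frame duration (from XRSession.frameRate where exposed) and count callbacks that arrive late; onFrame is an illustrative name:

  // Sketch: approximate late-frame detection with today's API surface.
  let lastTimestamp = 0;
  let lateFrames = 0;

  function onFrame(time, frame) {
    frame.session.requestAnimationFrame(onFrame);

    // Expected frame duration; fall back to 72Hz if frameRate isn't exposed.
    const expectedMs = frame.session.frameRate ? 1000 / frame.session.frameRate : 1000 / 72;
    if (lastTimestamp && time - lastTimestamp > expectedMs * 1.5) {
      lateFrames++; // this callback arrived at least half a frame late
    }
    lastTimestamp = time;

    // ... render ...
  }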