webgpu / webgpufundamentals

https://webgpufundamentals.org
BSD 3-Clause "New" or "Revised" License

Possible issue or misunderstanding with timing performance #110

Closed FINDarkside closed 4 months ago

FINDarkside commented 4 months ago

Could be that I'm just misunderstanding something, but it seems to me like the timing helper class built here reports numbers that are way too small.

When I run this example with the max number of objects, it reports 0.8ms CPU and 0.3ms GPU, which sounds like it should easily be able to hit a stable 144fps. However, it runs at a somewhat stable 120fps. If I use Chrome performance profiling, it claims over 7ms of GPU time even though the stats only claim ~0.3ms.

Have I misunderstood what it's supposed to measure, or is there some kind of issue here? Clearly the numbers reported by Chrome profiling match my FPS while the in-app ones do not.

Here's an example screenshot to make it clear what I'm talking about:

greggman commented 4 months ago

The sample is limited by requestAnimationFrame, which is synced to your monitor's refresh rate.

If you stop using rAF you'll get a higher framerate (though in reality you'd just be wasting the user's energy/battery, since the display itself can't display that higher framerate).

https://jsgist.org/?src=3a36795c1f200cc07757c80bed33d1b0
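As a rough illustration of the difference (a minimal sketch, not the sample's actual code; `render` is a placeholder for its draw function): a rAF loop can never run faster than the display's refresh rate, while a loop that schedules itself through the event loop is not throttled that way.

```js
// rAF loop: the browser calls back at most once per display refresh (e.g. 144Hz).
function rafLoop(time) {
  render(time);                        // placeholder for the sample's draw code
  requestAnimationFrame(rafLoop);
}
requestAnimationFrame(rafLoop);

// Unthrottled loop: a MessageChannel posts a message back to itself as fast as the
// event loop allows (a nested setTimeout(fn, 0) would get clamped to ~4ms instead).
const channel = new MessageChannel();
channel.port1.onmessage = () => {
  render(performance.now());
  channel.port2.postMessage(null);
};
channel.port2.postMessage(null);
```

Running unthrottled is only useful for benchmarking; as noted above, the display can't show the extra frames anyway.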

FINDarkside commented 4 months ago

The sample is limited by requestAnimationFrame, which is synced to your monitor's refresh rate.

I get that. But it was not able to keep up with my monitor's refresh rate when I increased the number of objects, even though the stats panel claims it takes only ~0.3ms. With a lower object count it hits a stable 144fps. If you look at the numbers, they don't really make sense: it's supposedly 0.7ms CPU + 0.2ms GPU, yet for some reason there's a skipped 7ms frame?

The reason I started looking into this is my own "game" that I've integrated this into: when I increase the amount of stuff to draw enough, I fall down to 60fps while the in-game stats claim 2ms CPU time and 0.2ms GPU time. It could be that I just have a flawed implementation or something, but then I was able to reproduce it in the linked demo as well. I'm not exactly sure what's up, though, and I actually can't reproduce the numbers in my screenshot right now.

greggman commented 4 months ago

I see your point. The timing in the demo isn't the total time, it's the single render-pass time on the GPU. There's a bunch of other stuff happening that's not timed in the example: allocating a canvas texture (getCurrentTexture), issuing the commands to the GPU (whereas the timing is the execution time on the GPU), compositing it with the rest of the page, other bookkeeping, synchronisation for uploading the uniform data, etc...
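To make that concrete, here's a rough sketch (not the article's actual helper; the buffer handling is simplified, and `device`/`canvas` are assumed to already exist with the 'timestamp-query' feature enabled on the device) of what each number covers. The GPU time comes from two timestamps written at the start and end of the render pass, so it covers only that pass; the CPU time is just the JavaScript spent encoding and submitting. Neither number includes the GPU working through the submitted commands end to end, compositing, or waiting for vsync.

```js
const context = canvas.getContext('webgpu');
context.configure({ device, format: navigator.gpu.getPreferredCanvasFormat() });

const querySet = device.createQuerySet({ type: 'timestamp', count: 2 });
const resolveBuffer = device.createBuffer({
  size: 16, // two 8-byte timestamps
  usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
});
const resultBuffer = device.createBuffer({
  size: 16,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});
let resultPending = false;

function frame() {
  const cpuStart = performance.now();           // CPU timing: JS work this frame

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginRenderPass({
    colorAttachments: [{
      view: context.getCurrentTexture().createView(), // NOT covered by the GPU timing
      clearValue: [0, 0, 0, 1],
      loadOp: 'clear',
      storeOp: 'store',
    }],
    timestampWrites: {                          // GPU timing covers ONLY this pass
      querySet,
      beginningOfPassWriteIndex: 0,
      endOfPassWriteIndex: 1,
    },
  });
  // ... set pipeline, bind groups, issue draw calls ...
  pass.end();

  if (!resultPending) {
    encoder.resolveQuerySet(querySet, 0, 2, resolveBuffer, 0);
    encoder.copyBufferToBuffer(resolveBuffer, 0, resultBuffer, 0, 16);
  }
  device.queue.submit([encoder.finish()]);

  const cpuMs = performance.now() - cpuStart;   // encode + submit only: no GPU execution,
                                                // no compositing, no vsync wait

  if (!resultPending) {
    resultPending = true;
    resultBuffer.mapAsync(GPUMapMode.READ).then(() => {
      const [start, end] = new BigInt64Array(resultBuffer.getMappedRange());
      const gpuPassMs = Number(end - start) / 1e6;  // timestamps are in nanoseconds
      resultBuffer.unmap();
      resultPending = false;
      console.log(`cpu: ${cpuMs.toFixed(2)}ms, gpu pass: ${gpuPassMs.toFixed(2)}ms`);
    });
  }

  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```

A real helper would rotate several result buffers so no frame's reading is skipped, but the scope of what gets timed is the same.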

On my M1 Mac, if I uncheck useRAF in the demo above I get 1900fps with 10k objects. On different OSes there are different overheads.

You might also try https://perfetto.dev/ for more info

All that said, if you think there's a bug in Chrome, file it at crbug.com

FINDarkside commented 4 months ago

There's a bunch of other stuff happening that's not timed in the example: allocating a canvas texture (getCurrentTexture), issuing the commands to the GPU (whereas the timing is the execution time on the GPU), compositing it with the rest of the page, other bookkeeping, synchronisation for uploading the uniform data, etc...

Right, that makes sense. In my own game it seems like the problem is related to how I update uniforms, and probably some other stuff I do inefficiently, which doesn't really get measured in either the CPU or GPU timings. Thanks for the insights, I understand better what it's measuring now!
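For reference, a rough way to confirm that kind of unmeasured cost (the names below are placeholders, not the game's actual code) is to bracket the suspect CPU work with performance.now(), since the render-pass GPU timing will never include it:

```js
// Placeholder names: time the per-frame uniform update path on the CPU.
const t0 = performance.now();
for (const obj of objects) {
  updateUniforms(obj);               // e.g. matrix math + device.queue.writeBuffer(...)
}
console.log(`uniform updates: ${(performance.now() - t0).toFixed(2)}ms`);
```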