treeform / boxy

2D GPU rendering with a tiling atlas.
MIT License

CPU utilization #28

chancyk closed 2 years ago

chancyk commented 2 years ago

I've been playing around with a demo based on the basic_windy example. So far I'm only drawing about 80 objects which are only based on 5 images. The draw loop seems locked at 60fps but I'm getting full utilization of the CPU core. I'd expect it to be idle for most of the 16ms frame time.

Might this be some kind of polling issue, possibly in boxy or windy?

chancyk commented 2 years ago

So I dug into this a bit further and initially traced it to this line:

But after commenting out that line, the time just moved to the next OpenGL call, so it seems to be related to the first interaction with OpenGL, or just something I don't understand. : )

guzba commented 2 years ago

CPU usage + vsync + SwapBuffers on Windows is a deep rabbit hole: https://www.google.com/search?hl=en&q=vsync%20swapbuffers%20cpu%20usage

Do you see a specific difference when running basic_windy before and after your changes? I expect you don't and CPU usage is high in both cases. This is just a basic example and not trying to be optimal.

Making it optimal is complicated. One aspect you could consider is checking how much time you have before the next frame is possible and sleeping in increments approaching that. This will have issues with potential missed frames though, or with not knowing the monitor's refresh rate (60, 75, 90, 120, 144, 165 Hz; there are so many now). I don't know enough about what you're trying to do, but that is a starting point. You'll want to learn more about this yourself to really understand what is best for you.
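A rough sketch of the "sleep in increments approaching the next frame" idea, written in plain Python rather than against Boxy or Windy. Everything here is an assumption made for illustration: the names `sleep_until` and `run_frames`, the 2 ms safety margin, and the fixed 60 Hz refresh rate are all hypothetical, and `draw` stands in for whatever renders and swaps buffers.

```python
import time


def sleep_until(deadline, clock=time.perf_counter, sleep=time.sleep,
                margin=0.002):
    """Sleep in shrinking steps until `margin` seconds before `deadline`.

    Returns the number of sleep calls made. The margin (an assumed 2 ms)
    leaves room for OS timer inaccuracy so the frame is less likely to
    be missed.
    """
    calls = 0
    while True:
        remaining = deadline - clock()
        if remaining <= margin:
            break
        # Sleep for half of the slack, but never less than 0.5 ms so the
        # loop always makes progress.
        sleep(max((remaining - margin) / 2, 0.0005))
        calls += 1
    return calls


def run_frames(draw, frames, refresh_hz=60.0, clock=time.perf_counter,
               sleep=time.sleep):
    """Call `draw` once per frame, sleeping away most of the idle time."""
    frame_time = 1.0 / refresh_hz  # assumed refresh rate, not queried
    next_deadline = clock() + frame_time
    for _ in range(frames):
        draw()  # stand-in for rendering plus the buffer swap
        sleep_until(next_deadline, clock, sleep)
        # If we fell behind, resync to "now" instead of trying to catch up.
        next_deadline = max(next_deadline + frame_time, clock())
```

With vsync enabled, the buffer swap would absorb the last couple of milliseconds that the margin leaves over. The hard-coded refresh rate is exactly the weakness mentioned above: a real application would have to query it or measure it.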

(Another important aspect is not even calling swapBuffers unless you actually need to draw something new. Only games truly need to draw asap constantly. Most apps and other things can and should draw quite infrequently, only when something changes and the screen needs updating.)
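The "only draw when something changed" point can be sketched with a dirty flag. This is a minimal illustration in plain Python, not Boxy or Windy API; the `Redrawer` class and its method names are hypothetical, and `draw` again stands in for rendering plus the buffer swap.

```python
class Redrawer:
    """Skips drawing unless something has marked the screen as stale."""

    def __init__(self, draw):
        self._draw = draw
        self._dirty = True  # the first frame always needs drawing
        self.frames_drawn = 0

    def invalidate(self):
        """Call from input or state-change handlers that affect the screen."""
        self._dirty = True

    def tick(self):
        """Call once per loop iteration; returns True if a frame was drawn."""
        if not self._dirty:
            return False
        self._draw()
        self._dirty = False
        self.frames_drawn += 1
        return True
```

When `tick()` returns `False`, the loop can block on the windowing system's event wait (or simply sleep) instead of swapping buffers, which is where the CPU savings for non-game apps would come from.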

chancyk commented 2 years ago

Aha, your google-fu revealed much better results! Definitely lots of interesting conversations around the busy waiting and vsync. There were some claims that it's just Windows misreporting, but it does seem to be actual CPU usage based on Core Temp and my fans turning on.

Sleeping to wait for the next frame does work though, and drops the CPU usage down to 1% from 15% for the core!

Thanks for the insights. I'll drop a few relevant links below for posterity's sake. The reddit link has this comment which may be interesting to explore in the future:

For Mac, there’s CVDisplayLink, which gives you a way to set a callback for when a vblank event happens.

For iOS, there’s a similarly named (but fairly different class interface version) called CADisplayLink (https://developer.apple.com/documentation/quartzcore/cadisplaylink) which more or less does the same thing.

For Windows, based on my research so far it appears you’d have to hook into IDXGIOutput::WaitForVBlank (https://docs.microsoft.com/en-us/windows/desktop/api/dxgi/nf-dxgi-idxgioutput-waitforvblank), which is part of DirectX, but should be able to be used simultaneously with an OpenGL context.

https://stackoverflow.com/questions/21925313/100-cpu-utilization-when-using-vsync-opengl
https://stackoverflow.com/questions/62373762/high-cpu-usage-with-vsync-turned-on-in-opengl-and-sdl2-application
https://www.reddit.com/r/opengl/comments/aif3pb/how_to_properly_sleep_for_vsync/

guzba commented 2 years ago

This is one of those issues where if you know the magic keywords you do get really useful search results. It sounds like you took quite an adventure down the rabbit hole haha. Yeah, this is complicated and I'm certainly not an expert either. As we work to mature Windy and Boxy I do imagine this is an area where we may be able to be clever and make things even nicer "out of the box". We'll see how the repos continue to evolve.