not-fl3 / miniquad

Cross platform rendering in Rust
Apache License 2.0

MacOS CPU usage increased by 8x in v0.4.4 #470

Closed brettchalupa closed 1 month ago

brettchalupa commented 2 months ago

I noticed running a simple Macroquad example on MacOS 14.5 with an Apple M2 Pro chip was using a high amount of CPU. I checked out some previous versions of Macroquad and saw the performance was much better in older versions. Using git bisect, I landed on https://github.com/not-fl3/macroquad/commit/93b8af24fd99d8a8f2cd4e2446260a43df36e1d6 being the introduction of the high CPU usage, which contains a patch level upgrade for miniquad from 0.4.2 to 0.4.3 (which was yanked) and macroquad_macro from 0.1.7 to 0.1.8.

So I ran a git bisect on Miniquad, using cargo run --release --example triangle as my test for CPU usage.

Through the bisect, I landed on and confirmed that commit https://github.com/not-fl3/miniquad/commit/833799ec32883cdd7465a5f8ddc80e2203dbcbc4 increases CPU load on MacOS by about 8x. The regression is first noticeable in the v0.4.4 release of Miniquad.

Repro steps

  1. Clone the repo
  2. Checkout the commit just before: git checkout 833799ec32883cdd7465a5f8ddc80e2203dbcbc4^1
  3. Run cargo run --release --example triangle to see CPU performance baseline
  4. Checkout the high CPU usage commit: git checkout 833799ec32883cdd7465a5f8ddc80e2203dbcbc4
  5. Run cargo run --release --example triangle to see CPU performance degrade

Additional info

Happy to help test or debug, especially if access to MacOS is limited. @birhburh pinging you here too since you authored the commit. Let me know if I can be supportive in any way. Thanks!

birhburh commented 2 months ago

@brettchalupa, thanks! Yes, I checked, and this is also true for me on Intel. I'll look into this.

VanjaRo commented 1 month ago

It seems that changing the config field blocking_event_loop to true on macOS fixes the problem. Maybe the run method of the NSApplication class was using the same mechanism under the hood. Or maybe not. For the dynamic quad example, however, the problem persists, and there the blocking... config value is obviously not the solution.
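To illustrate why a blocking event loop can reduce idle CPU usage, here is a minimal sketch using only the Rust standard library. It is not miniquad's implementation: the Event type and the run_polling, run_blocking, and spawn_producer functions are hypothetical names made up for this example, and a channel stands in for the OS event queue.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Hypothetical event type for this sketch (not miniquad's).
#[derive(Debug, PartialEq)]
pub enum Event {
    Redraw,
    Quit,
}

/// Polling loop: never waits, so it keeps iterating while the queue is empty.
/// Returns (events handled, loop iterations).
pub fn run_polling(rx: &Receiver<Event>) -> (u32, u64) {
    let mut handled = 0;
    let mut iterations = 0u64;
    loop {
        iterations += 1;
        match rx.try_recv() {
            Ok(Event::Quit) => return (handled + 1, iterations),
            Ok(Event::Redraw) => handled += 1,
            // Nothing to do yet: spin again (this is what burns CPU).
            Err(TryRecvError::Empty) => continue,
            Err(TryRecvError::Disconnected) => return (handled, iterations),
        }
    }
}

/// Blocking loop: the thread sleeps inside `recv` until an event arrives,
/// so it performs exactly one iteration per event.
pub fn run_blocking(rx: &Receiver<Event>) -> (u32, u64) {
    let mut handled = 0;
    let mut iterations = 0u64;
    loop {
        iterations += 1;
        match rx.recv() {
            Ok(Event::Quit) => return (handled + 1, iterations),
            Ok(Event::Redraw) => handled += 1,
            Err(_) => return (handled, iterations),
        }
    }
}

/// Sends `redraws` Redraw events spaced `gap` apart, then a Quit event.
pub fn spawn_producer(redraws: u32, gap: Duration) -> Receiver<Event> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for _ in 0..redraws {
            thread::sleep(gap);
            let _ = tx.send(Event::Redraw);
        }
        let _ = tx.send(Event::Quit);
    });
    rx
}
```

Calling run_blocking(&spawn_producer(3, Duration::from_millis(5))) performs one iteration per event, while run_polling on the same input typically performs far more iterations while waiting. This also suggests why a blocking loop alone would not help an example that redraws every frame: there is always a pending redraw, so the loop never gets to sleep.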

VanjaRo commented 1 month ago

Another guess that appeared while searching for a similar event loop is potentially unlimited FPS for the application, which leads to many useless iterations.
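As a rough illustration of the unlimited-FPS guess, the sketch below caps a loop's frame rate by sleeping for whatever is left of each frame's time budget. It uses only the Rust standard library and is an assumption about how such a cap could look, not how miniquad or macOS handles it; remaining_frame_budget and run_capped are hypothetical names.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Time left in the current frame's budget: how long to sleep after a frame
/// that took `frame_time` when the loop is capped at `target_fps`.
pub fn remaining_frame_budget(frame_time: Duration, target_fps: u32) -> Duration {
    assert!(target_fps > 0, "target_fps must be positive");
    let budget = Duration::from_secs(1) / target_fps;
    // Saturates to zero when the frame already overran its budget.
    budget.saturating_sub(frame_time)
}

/// Calls `draw` `frames` times, sleeping after each call so the loop does not
/// exceed `target_fps`. Returns the total wall-clock time spent.
pub fn run_capped<F: FnMut()>(frames: u32, target_fps: u32, mut draw: F) -> Duration {
    let start = Instant::now();
    for _ in 0..frames {
        let frame_start = Instant::now();
        draw();
        thread::sleep(remaining_frame_budget(frame_start.elapsed(), target_fps));
    }
    start.elapsed()
}
```

For example, run_capped(5, 100, || { /* draw */ }) takes at least 50 ms in total, because each frame is padded out to its 10 ms budget. Without the sleep, the same loop would run as fast as the CPU allows. Syncing buffer swaps to the display refresh rate achieves a similar effect without an explicit sleep.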

birhburh commented 1 month ago

@VanjaRo, thanks for the link! I'll look into this approach if my solution doesn't work. Right now I'm trying to implement my own NSView with OpenGL support, as was done in GLFW. It uses the flushBuffer method, which should use less CPU because it syncs to the screen refresh rate: https://developer.apple.com/documentation/appkit/nsopenglcontext/1436211-flushbuffer?language=objc But I didn't spend much time on it during the week. And somehow a basic test with this approach redraws only the background and not the geometry, so I'm still debugging.

birhburh commented 1 month ago

https://github.com/birhburh/miniquad/tree/macos_prototype Everything should work now with the OpenGL/Metal backends: low CPU usage as before, even resize. If you can, please test it on ARM Apple laptops ;-) I'm not making a PR yet, though. I still need to fix the Metal backend working with macroquad (it again draws a quarter of the image), test the blocking event loop, and do some cleanup.

brettchalupa commented 1 month ago

@birhburh I tested out your macos_prototype branch on your fork on my M2 Pro chip, and when I run cargo run --release --example triangle there's a segfault crash:

Translated Report from crash

```
-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               triangle [47308]
Path:                  /Users/USER/*/triangle
Identifier:            triangle
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        zsh [47022]
User ID:               501

Date/Time:             2024-08-12 10:37:38.6855 -0400
OS Version:            macOS 14.5 (23F79)
Report Version:        12
Anonymous UUID:        A67C31B0-B597-DA45-EC6F-2A73A4792CC2

Sleep/Wake UUID:       B9E3C94B-1877-4E66-80F8-7C52E1748C9E

Time Awake Since Boot: 2600000 seconds
Time Since Wake:       6974 seconds

System Integrity Protection: enabled

Crashed Thread:        0  main  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000001
Exception Codes:       0x0000000000000001, 0x0000000000000001

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [47308]

VM Region Info: 0x1 is not in any region.  Bytes before following region: 4376936447
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->
      __TEXT                      104e2c000-104e88000    [  368K] r-x/r-x SM=COW  /Users/USER/*/triangle

Thread 0 Crashed::  main  Dispatch queue: com.apple.main-thread
0   libobjc.A.dylib         0x184eb7fb4 objc_retain + 8
1   Foundation              0x1864be0f0 -[NSCFTimer initWithFireDate:interval:target:selector:userInfo:repeats:] + 184
2   Foundation              0x1864bdf04 +[NSTimer(NSTimer) timerWithTimeInterval:target:selector:userInfo:repeats:] + 104
3   triangle                0x104e32e8c 0x104e2c000 + 28300
4   triangle                0x104e30550 0x104e2c000 + 17744
5   triangle                0x104e34d38 0x104e2c000 + 36152
6   dyld                    0x184f060e0 start + 2360

Thread 1:
0   libsystem_pthread.dylib 0x185289d20 start_wqthread + 0

Thread 2:
0   libsystem_pthread.dylib 0x185289d20 start_wqthread + 0

Thread 0 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000001   x1: 0x1800600002199d17   x2: 0x0000000000000020   x3: 0x0000000000000001
    x4: 0x0000000000000005   x5: 0x0000000020200000   x6: 0x0000000000000001   x7: 0x0000000000000960
    x8: 0x0000000186def000   x9: 0x0000600002f88000  x10: 0x0000000000000700  x11: 0x0000000000000020
   x12: 0x0000000000000001  x13: 0x00000000fffffe38  x14: 0x00000000000007fb  x15: 0x00000000a00e3ffb
   x16: 0x0000000184eb7fac  x17: 0x00000001eebf13b0  x18: 0x0000000000000000  x19: 0x0000000000000001
   x20: 0x0000000000000001  x21: 0x00000001e59f6690  x22: 0x509b000129127650  x23: 0xb8c37c692394fd82
   x24: 0x0000600002f88700  x25: 0x0000000000000000  x26: 0x0000000000000001  x27: 0x000000016afcd048
   x28: 0x0000600002fa87e8   fp: 0x000000016afc7ba0   lr: 0x00000001864be0f0
    sp: 0x000000016afc7b30   pc: 0x0000000184eb7fb4 cpsr: 0x00001000
   far: 0x0000000000000001  esr: 0x92000006 (Data Abort) byte read Translation fault

Binary Images:
       0x107cec000 -        0x107d57fff com.apple.AppleMetalOpenGLRenderer (1.0) /System/Library/Extensions/AppleMetalOpenGLRenderer.bundle/Contents/MacOS/AppleMetalOpenGLRenderer
       0x1051bc000 -        0x1051c7fff libobjc-trampolines.dylib (*) <9381bd6d-84a5-3c72-b3b8-88428afa4782> /usr/lib/libobjc-trampolines.dylib
       0x104e2c000 -        0x104e87fff triangle (*) /Users/USER/*/triangle
       0x184eb0000 -        0x184effd83 libobjc.A.dylib (*) /usr/lib/libobjc.A.dylib
       0x186445000 -        0x1870a2fff com.apple.Foundation (6.9) <99e0292d-7873-3968-9c9c-5955638689a5> /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
       0x184f00000 -        0x184f88a17 dyld (*) <37bbc384-0755-31c7-a808-0ed49e44dd8e> /usr/lib/dyld
               0x0 - 0xffffffffffffffff ??? (*) <00000000-0000-0000-0000-000000000000> ???
       0x185288000 -        0x185294fff libsystem_pthread.dylib (*) <386b0fc1-7873-3328-8e71-43269fd1b2c7> /usr/lib/system/libsystem_pthread.dylib

External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 19
    thread_create: 0
    thread_set_state: 7

VM Region Summary:
ReadOnly portion of Libraries: Total=930.4M resident=0K(0%) swapped_out_or_unallocated=930.4M(100%)
Writable regions: Total=1.1G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.1G(100%)

                                VIRTUAL   REGION
REGION TYPE                        SIZE    COUNT (non-coalesced)
===========                     =======  =======
Accelerate framework               256K        2
Activity Tracing                   256K        1
CG image                           144K        1
ColorSync                          576K       28
CoreAnimation                       16K        1
CoreGraphics                        16K        1
Foundation                          16K        1
Kernel Alloc Once                   32K        1
MALLOC                             1.1G       47
MALLOC guard page                  192K       12
STACK GUARD                         32K        2
Stack                             9248K        3
Stack Guard                       56.0M        1
VM_ALLOCATE                        560K       13
__AUTH                            1119K      229
__AUTH_CONST                      18.4M      396
__CTF                               824        1
__DATA                            6148K      384
__DATA_CONST                      20.4M      401
__DATA_DIRTY                      1079K      136
__FONT_DATA                        2352        1
__GLSLBUILTINS                    5174K        1
__LINKEDIT                       533.1M        4
__OBJC_RO                         71.9M        1
__OBJC_RW                         2199K        1
__TEXT                           397.3M      414
dyld private memory                272K        2
mapped file                       75.2M       17
shared memory                      864K       14
===========                     =======  =======
TOTAL                              2.3G     2116
```

Let me know if there's more useful info to provide.

Edit: I'm also seeing a segfault on the latest miniquad master, which has this PR merged: https://github.com/not-fl3/miniquad/pull/475#issuecomment-2284181822 (commit: 30b4e17ece36d93988e65d8e57227c21f62b4002)

brettchalupa commented 1 month ago

@birhburh given your PRs have addressed things, is this good to close or is there more to be done here? Awesome work!

birhburh commented 1 month ago

@brettchalupa, thanks, I think it can be closed.

not-fl3 commented 1 month ago

:tada:

Great job @birhburh !