gorbit99 / olcPGEX_Gamepad

Cross platform (Windows + Linux) Gamepad API for the Pixel Game Engine (http://onelonecoder.com/)

Poor linux performance due to updating gamepads every frame #4

Closed PythonJedi closed 3 years ago

PythonJedi commented 3 years ago

I decided to start playing with PGE this week, and gamepad support is a must-have for these Halo-trained hands. However, when I ran the example code for this extension, I got 20-22 FPS, which is concerning for something that's not doing any heavy calculation. The pixel game engine's colored noise example runs happily at 700+ FPS.

Long story short, I finally noticed that on Windows, updateGamepads() is a no-op (not sure how that works, don't really care), but on Linux updateGamepads() enumerates each input device by ID and tries to open it as a gamepad. While that does ensure we find all the gamepads as soon as they're connected, that much I/O activity is a massive slowdown, especially since every mouse and keyboard, as well as any uninitialized gamepad, is opened and asked whether it's a gamepad on every frame.

I tested this hypothesis by making updateGamepads() a public method, not calling it in OnBeforeUserUpdate() and instead calling it from the example right before selectWithButton() if the user hasn't selected a gamepad yet, but skipping it once a gamepad has been selected. Oddly enough, before a gamepad gets selected, the FPS is actually at 35 with this change, even though the same logic is happening, theoretically. Once the gamepad is selected, the application will either hit the refresh rate (60FPS for me), or with vsync_mode=0, 1000+ FPS.
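A minimal sketch of that workaround, for illustration only. `ScanGate` is a hypothetical helper, not part of olcPGEX_Gamepad, and the retry interval is an extra assumption that goes beyond what was tested above (the test simply skipped the scan once a gamepad was selected). The idea is to ask a cheap in-memory gate whether the expensive device scan should run this frame.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of olcPGEX_Gamepad): decides whether the
// expensive device scan should run on the current frame.
class ScanGate {
public:
    using Clock = std::chrono::steady_clock;

    // interval: minimum time between two scans while no gamepad is selected.
    // This throttle is an assumption added for the sketch.
    explicit ScanGate(std::chrono::milliseconds interval) : interval_(interval) {
        assert(interval_.count() >= 0);
    }

    // Returns true when a scan is due. Never requests a scan once a gamepad
    // has been selected, which mirrors the manual test described above.
    bool shouldScan(bool gamepadSelected, Clock::time_point now) {
        if (gamepadSelected) return false;
        if (hasScanned_ && now - lastScan_ < interval_) return false;
        lastScan_ = now;
        hasScanned_ = true;
        return true;
    }

private:
    std::chrono::milliseconds interval_;
    Clock::time_point lastScan_{};
    bool hasScanned_ = false;
};
```

In a frame loop this could be used as `if (gate.shouldScan(selected, ScanGate::Clock::now())) updateGamepads();`, assuming updateGamepads() has been made callable from user code as described above. The obvious drawback is that a gamepad plugged in after selection is not noticed.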

I'm not afraid of digging into the libinput guts (or iwatch, perhaps?) to find a better way to automatically detect gamepads being plugged in, but I've got other things I want to play with and I know the input system on linux is a massive rabbit hole, so I'm just going to mess about with my sloppy fix for the time being.

gorbit99 commented 3 years ago

Make sure you compile with -O3 :)

Just in case this is an issue, I'm currently pushing the emscripten capable version of the library, in case there's some odd change I have on local that fixes this, but ye, debug mode performance is bad. 38 fps vs 2500 with any sort of optimizations enabled.

PythonJedi commented 3 years ago

I'm sorry, what kind of linux system are you testing on? WSL?

"Just compile with -O3" 1) doesn't work (gain of 20 FPS, sure, but that still barely gets me to 50 FPS, not multiple hundreds like I showed is possible), and 2) is somewhat insulting. I apologize for being so blunt, but I clearly stated that the performance issue here is that the gamepad extension is doing unnecessary I/O every frame, and demonstrated that fact by skipping the I/O when unnecessary and getting the kind of performance expected for such a simple program.

I reiterate: enumerating a directory is slow, opening a file is slow, reading, writing, and closing files is slow, all because we have to context-switch into the kernel for every operation and possibly transfer data between them, and compiler optimizations can't do a thing to help with that. It's best to not do I/O unnecessarily, yet this extension does all of those operations at least once per frame as long as there's something other than a gamepad that udev decides is an "input", which can be anything from keyboards and mice to laptop lid switches. I guess I'll fix it 'properly' myself.

gorbit99 commented 3 years ago

Just a quick question out of pure curiosity, do you have your system on an HDD or an SSD?

I certainly am trying my application on native linux and frankly I can't find a single difference between your pull request and my original version in the fps department, but my system is on an SSD, so that alone could be causing the difference.

Moros1138 commented 3 years ago

I can confirm the issue, had similar results, although compiling with -O3 did have a bigger impact for my system. However, when I tried the proposed changes it was like night and day in terms of performance.

What was 20 to 30 fps without optimization flags became around 1500 fps with -O3.

When I tested the inotify version, my results were around 1500 fps without optimization flags, and that doubled to around 3000 fps with -O3.

There are definite performance benefits to the inotify approach.
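For readers who have not used inotify, here is a rough sketch of the general technique, not the code from the pull request. `DeviceWatcher` is a hypothetical name, and watching a directory such as /dev/input is an assumption about where device nodes appear. The watch is registered once, and each frame does only a non-blocking read() that returns immediately when nothing has changed, so devices are opened only when a new node shows up.

```cpp
#include <sys/inotify.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (Linux only): reports files created in a directory
// without re-enumerating the directory on every call.
class DeviceWatcher {
public:
    explicit DeviceWatcher(const std::string& dir) {
        fd_ = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
        if (fd_ >= 0) wd_ = inotify_add_watch(fd_, dir.c_str(), IN_CREATE | IN_DELETE);
        if (!ok()) std::perror("inotify setup failed");
    }
    ~DeviceWatcher() {
        if (fd_ >= 0) close(fd_);  // closing the fd also drops its watches
    }
    DeviceWatcher(const DeviceWatcher&) = delete;
    DeviceWatcher& operator=(const DeviceWatcher&) = delete;

    bool ok() const { return fd_ >= 0 && wd_ >= 0; }

    // Non-blocking: returns the names created since the previous call.
    std::vector<std::string> pollCreated() {
        std::vector<std::string> created;
        if (!ok()) return created;
        alignas(inotify_event) char buf[4096];
        for (;;) {
            ssize_t n = read(fd_, buf, sizeof buf);
            if (n <= 0) break;  // typically EAGAIN: no events pending
            for (char* p = buf; p < buf + n;) {
                assert(p + sizeof(inotify_event) <= buf + n);
                auto* ev = reinterpret_cast<inotify_event*>(p);
                if ((ev->mask & IN_CREATE) && ev->len > 0) created.emplace_back(ev->name);
                p += sizeof(inotify_event) + ev->len;  // events are variable length
            }
        }
        return created;
    }

private:
    int fd_ = -1;
    int wd_ = -1;
};
```

A caller would construct one `DeviceWatcher` at startup, call `pollCreated()` once per frame, and try to open as a gamepad only the names it returns. A real implementation would likely also need one initial scan for devices that were already connected and some handling for IN_DELETE; both are left out here.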

And now for some system info:

Xubuntu 20.04 running in a QEMU VM, on an UNRAID host machine.

My Host Machine:

Intel Core i7 7700K, clocked at 4.2 GHz
32GB DDR4 2133 MHz RAM
NVIDIA GeForce GTX1060 with 6GB GDDR5
Intel Onboard Graphics

NVMe SSD from Samsung EVO
Several SSD in a cache pool
Several spinning disk drives for data storage array

My Virtual Machine

Intel Core i7 7700K, 4.2 GHz (6 cores allocated)
16GB DDR4 2133 MHz RAM allocated
NVIDIA GeForce GTX1060 with 6GB GDDR5 (passthrough)
NVMe SSD from Samsung EVO (passthrough)

gorbit99 commented 3 years ago

Merged the PR, should be fixed