jc4p opened 8 years ago
Currently I have a Task in DirectXManager that is constantly polling for GPU updates, that's here. When this always-looping task gets a valid frame and reads it, it calls a delegate on the form, which does the processing on the Bitmap.
Right now this is wasteful: DirectXManager routinely emits events that I ignore (the !isProcessing check in the if statement), and it's recording and emitting frames from the screen at ~90fps in my local testing. That's a lot of wasted GPU and CPU operations.
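Roughly, the shape today is something like this (a sketch for illustration only; TryGetFrame, OnFrameCaptured, and isProcessing are placeholder names, not the actual code):

```csharp
using System.Drawing;
using System.Threading.Tasks;

// Producer: spins forever, grabbing ~90 frames a second.
Task.Run(() =>
{
    while (running)
    {
        Bitmap frame;
        if (directXManager.TryGetFrame(out frame))
            OnFrameCaptured(frame); // delegate into the form
    }
});

// Consumer (on the form): most frames get thrown away right here.
void OnFrameCaptured(Bitmap frame)
{
    if (!isProcessing)
    {
        isProcessing = true;
        ProcessBitmap(frame); // OCR etc.
        isProcessing = false;
    }
}
```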
I would rather have it so that once we get a successful frame grab, we don't grab another until processing is done. This might reduce fidelity a bit when OCR fails (if we're grabbing at 90fps, missing a couple of frames doesn't lose any data, but if we're grabbing sparsely it might), but I'm okay with that. The problem is, how should it be implemented?
The codebase I initially wrote was actually using a BlockingCollection to rate-limit and only do screengrabs when the processing pool had space for them. That worked pretty well; it's a nice, simple solution. I could replace my delegate/callback system with a ConcurrentQueue that both DirectXManager and the Form can access, but I'm not quite sure what the drawbacks there are.
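For reference, the bounded-collection version looks roughly like this (the capacity and the TryGetFrame/ProcessBitmap names are illustrative, not the real code):

```csharp
using System.Collections.Concurrent;
using System.Drawing;
using System.Threading.Tasks;

// Bounded to a single frame: Add() blocks the producer until the consumer
// has taken the previous frame, which is the rate limit.
var frames = new BlockingCollection<Bitmap>(boundedCapacity: 1);

// Producer (DirectXManager side)
Task.Run(() =>
{
    while (!frames.IsAddingCompleted)
    {
        Bitmap frame;
        if (directXManager.TryGetFrame(out frame))
            frames.Add(frame);
    }
});

// Consumer (form side): GetConsumingEnumerable() blocks until a frame shows up.
Task.Run(() =>
{
    foreach (var frame in frames.GetConsumingEnumerable())
        ProcessBitmap(frame);
});
```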
The "try to grab a frame from the screen, and if you can translate it to a Bitmap quickly, bring it to me" step that DirectXManager does sounds like a good candidate for an async method. I could simplify all of this to an infinitely looping method in the consumer which calls that method, processes, maybe sleeps for a bit, then loops.
The possible drawback here is that I don't understand the async concurrency model very well; I don't know how easily a situation like "oh, that attempt failed after a 10s timeout on the GPU side, but maybe it'll work next round" would be handled. Previous async/await experiences have led to a lot of race conditions and a lot of headaches with inner exceptions.
I kind of want to just go with ConcurrentQueue because I know it will work and I can write the code, but I also would like to learn more about async/await and this seems like a good opportunity. Are there any benefits/drawbacks I'm missing?
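If I did go the async route, I'm imagining the consumer loop would look something like this (GrabFrameAsync is a hypothetical wrapper around the frame grab, not an existing method):

```csharp
using System;
using System.Drawing;
using System.Threading;
using System.Threading.Tasks;

async Task CaptureLoopAsync(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        try
        {
            // Hypothetical: wraps "grab a frame, convert to Bitmap" in one awaitable call.
            Bitmap frame = await directXManager.GrabFrameAsync(token);
            if (frame != null)
                ProcessBitmap(frame);
        }
        catch (TimeoutException)
        {
            // The GPU-side grab timed out (e.g. after 10s); just try again next round.
        }

        await Task.Delay(100, token); // breathe a little between grabs
    }
}
```

The appealing part is that a failed grab just falls through to the next iteration, with no shared state to race on.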
Took the blocking collection route, first pass: https://github.com/jc4p/lol-mechanics-tracker/commit/1b0f2ef9cfaf3bdaae7d58eb05169c7817f1de5b
It's still noticeable when recording though, even if I throw that delay (currently 2s) up super high. If I run with v-sync on, the game goes from 60fps to ~45fps when I start tracking. Task Manager reports a lot of CPU usage from DWM itself.
Either some of my DirectX callback code is directly interfering with the actual LoL DirectX threads (maybe I need to clone the data before I queue it?), or just having the Desktop Duplication API attached is inherently CPU-heavy. I need to benchmark against a stubbed-out version of my code (doing nothing at all in the callbacks), then compare that version against the equivalent in C++ (which I already have from initial testing).
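If cloning turns out to be the fix, I think the copy would look roughly like this (a sketch only; it assumes `screenResource` is the resource handed back by AcquireNextFrame and `device` is my D3D11 device):

```csharp
using SharpDX.Direct3D11;

// Copy the acquired desktop image into a CPU-readable staging texture so the
// duplicated frame can be released immediately instead of held during processing.
using (var desktopTexture = screenResource.QueryInterface<Texture2D>())
{
    var desc = desktopTexture.Description;
    desc.Usage = ResourceUsage.Staging;
    desc.CpuAccessFlags = CpuAccessFlags.Read;
    desc.BindFlags = BindFlags.None;
    desc.OptionFlags = ResourceOptionFlags.None;

    var staging = new Texture2D(device, desc);
    device.ImmediateContext.CopyResource(desktopTexture, staging);
    // ...queue `staging` (or a Bitmap built from it) for processing,
    // then release the duplication frame right away...
}
```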
I nulled out all the C# code and left just the SharpDX logic for acquiring a frame and immediately releasing it; it still causes LoL to drop to ~35-40fps on my machine.
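For the record, the stripped-down loop was essentially this (a sketch rather than the exact code; the adapter/output indices and timeout are assumptions):

```csharp
using SharpDX;
using SharpDX.DXGI;
using D3D11Device = SharpDX.Direct3D11.Device;

var factory = new Factory1();
var adapter = factory.GetAdapter1(0);              // first GPU
var device = new D3D11Device(adapter);
using (var output = adapter.GetOutput(0))          // first monitor
using (var output1 = output.QueryInterface<Output1>())
{
    var duplication = output1.DuplicateOutput(device);

    while (true)
    {
        OutputDuplicateFrameInformation frameInfo;
        SharpDX.DXGI.Resource screenResource;
        try
        {
            duplication.AcquireNextFrame(500, out frameInfo, out screenResource);
        }
        catch (SharpDXException e) when (e.ResultCode.Code == ResultCode.WaitTimeout.Result.Code)
        {
            continue; // no new frame within the timeout
        }

        // Do nothing with the frame -- release it immediately.
        screenResource.Dispose();
        duplication.ReleaseFrame();
    }
}
```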
Tested out a C++ version based on this initialization which just acquires frames in an infinite loop and does nothing before releasing them; still seeing 35-40fps frame drops in LoL.
I need to figure out if I'm doing something wrong with the DXGI setup or if this just isn't going to work for me.
For reference, OBS's code for the same: https://github.com/jp9000/obs-studio/blob/master/libobs-d3d11/d3d11-duplicator.cpp#L173 -- I don't know how to force OBS into using DWM recording to compare against it as a local benchmark, though. It looks like OBS might default to something else (a Direct3D9 API) and fall back to DXGI if that's not around: https://github.com/jp9000/obs-studio/blob/1e056fd7ecd1ab5474dad05c5b641a6e86efe0f2/plugins/win-capture/graphics-hook/graphics-hook.c#L275
I found this other DXGI Streamer and ran it, nulling out the code between acquiring a frame and releasing it: same ~35-40fps in LoL when I run it, and the same high CPU usage from DWM in Task Manager.
Alright, I finally did what I should've done to begin with and tested out some different capture modes in OBS. Using 64-bit OBS, if I preview or record with "Monitor Capture" I see the same DWM CPU usage in Task Manager and an FPS drop, although it feels a lot less choppy than the others I've tested, and LoL sticks right at 30fps instead of jumping around. If I preview or record with "Game Capture" I get 60fps gameplay in LoL alongside seemingly good video output from OBS. Using OBS's Game Capture and watching Task Manager, OBS itself seems to be doing a lot of CPU work and DWM isn't doing anything.
https://jp9000.github.io/OBS/general/whatcapture.html says "Monitor Capture" is supposed to be fast and low CPU in Windows 8. Weird.
I went through all my pending Windows updates, manually unchecking the ones that show up on Windows 10 nag lists on blogs, and now my code and 64-bit OBS in "Monitor Capture" both only drop LoL to ~50fps from 60fps. It's definitely a much smoother experience too.
I'm pretty sure OBS's "Game Capture" is hooking into Direct3D/OpenGL and intercepting the render calls, which seems like a lot more work/risk. :/
I started working on Direct3D9/Direct3D11 injection, but running EasyHook to do the injecting immediately makes Overwatch freak out. I'm a bit worried that if I go down this path I won't be able to use the same lib for the apps I want to make for Overwatch/other games without being banned. :/
Right now there are two big questionable tactics when it comes to the CS tracking's performance.
Out of these, the second is a premature optimization at this stage; the first is a must-fix.
Todo for 0.1:
Punted till post 0.1: