philkr / gamehook_gtav

GTA V plugin for gamehook
BSD 2-Clause "Simplified" License

slow capture due to file writing #6

Closed el3ment closed 6 years ago

el3ment commented 6 years ago

Although the bone matrices are one source of slowdown during capture, the other source is the readTarget function at the end of the frame, which I assume is forcing a GPU sync. In other work, I usually keep a pool of staging resources that I copy to during the Present call and then Map a few frames later, to avoid forcing the GPU to sync. I want to apply that here, but the way you have organized the D3D11 hook makes that difficult: it's hard to access the ID3D11Texture2D objects of the targets, and it's not easy to get an ID3D11Device pointer or an ID3D11DeviceContext pointer. It feels like you have thought about these problems, since you have staging textures in rendertarget.cpp, but it's pretty opaque and I'm having a hard time determining whether every target we are saving is being handled properly.

I tried creating a pool for each target (so "final", "final0", "final1", etc.) and then calling copyTarget(from=flow, to=flow0), but I get the error "Cannot copy output to custom render targets".

Any ideas on how to achieve this?
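The staging-pool pattern described in this comment can be sketched without any D3D11 dependency. The following is a minimal simulation, not code from gamehook: `StagingSlot`, `StagingRing`, and `demo_read_order` are hypothetical names, and plain byte vectors stand in for staging textures. In a real hook, the "copy" step would correspond to `ID3D11DeviceContext::CopyResource` into a staging texture, and the "read" step to `Map`, a memcpy, and `Unmap`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one staging texture. In real D3D11 code this would
// be an ID3D11Texture2D created with D3D11_USAGE_STAGING and CPU read access.
struct StagingSlot {
    std::vector<unsigned char> pixels;  // contents "copied" from the render target
    long frame = -1;                    // frame the copy was issued in; -1 means empty
};

// Ring of staging slots. A frame copied at time f is read back at time
// f + slots, so the CPU never waits on a copy the GPU has only just queued.
class StagingRing {
public:
    explicit StagingRing(std::size_t slots) : slots_(slots) {
        assert(slots > 0);
    }

    // Call once per frame (for example from the Present hook).
    // Returns true and fills *ready if an old frame became readable.
    bool submit(long frame, const std::vector<unsigned char>& target,
                StagingSlot* ready) {
        StagingSlot& slot = slots_[next_];
        const bool has_old = slot.frame >= 0;
        if (has_old) {
            *ready = slot;       // real code: Map + memcpy + Unmap
        }
        slot.pixels = target;    // real code: CopyResource(staging, target)
        slot.frame = frame;
        next_ = (next_ + 1) % slots_.size();
        return has_old;
    }

private:
    std::vector<StagingSlot> slots_;
    std::size_t next_ = 0;
};

// Feeds `frames` one-byte frames through a ring and returns the frame
// indices in the order in which they become readable.
std::vector<long> demo_read_order(std::size_t slots, long frames) {
    StagingRing ring(slots);
    std::vector<long> order;
    StagingSlot ready;
    for (long f = 0; f < frames; ++f) {
        std::vector<unsigned char> target(1, static_cast<unsigned char>(f));
        if (ring.submit(f, target, &ready)) {
            order.push_back(ready.frame);
        }
    }
    return order;
}
```

With three slots, frame 0 only becomes readable while frame 3 is being submitted, so the last `slots` frames of a capture are still in flight when the loop ends and would need a final drain pass. Whether this actually removes the stall depends on the driver and on how many frames of latency the capture can tolerate.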

philkr commented 6 years ago

What's currently in there is the fastest of 3 implementations that I tried to speed this up. I never tried delaying reading beyond the current frame, as I was hoping to use this interface to play the game at some point. An earlier version of the capture plugin did delay reading for almost one frame, but I didn't see any performance gains.

The main reason you need a stage_tex_ is that DX11 doesn't allow you to read most resources directly on the CPU (to avoid GPU sync). If you just want to see whether it would be faster, you can try making stage_tex_ an array, then copy and map different resources. This will give you a general sense of whether it's faster, but will surely not produce a useful output.

If it's much faster I'll think about ways to incorporate this in the API. The main thing that makes a clean implementation of this tricky is that Map and Unmap need to be called from the main thread and cannot be delayed through threading...
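Taking the constraint above as given (Map and Unmap stay on the main thread), one possible arrangement is to do only the Map, memcpy, and Unmap there and hand the resulting CPU buffer to a worker thread for the slow file write. The sketch below is an assumption about how that could look, not part of gamehook: `FrameWriter` and `demo_total_bytes` are hypothetical names, and the "write" is replaced by a byte counter so the example stays self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical writer: the render thread pushes CPU copies of frames and a
// single worker thread consumes them. No D3D11 call happens off the render thread.
class FrameWriter {
public:
    FrameWriter() : worker_(&FrameWriter::run, this) {}
    ~FrameWriter() { finish(); }

    // Render thread: call right after Map + memcpy + Unmap.
    void push(std::vector<unsigned char> pixels) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(pixels));
        }
        ready_.notify_one();
    }

    // Waits until every queued frame has been consumed, then stops the worker.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    // Only meaningful once finish() has returned.
    std::size_t bytes_written() const {
        assert(!worker_.joinable());
        return bytes_written_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) return;  // done_ is set and nothing is left
            std::vector<unsigned char> pixels = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            // Placeholder for the slow part (encoding, writing to disk, ...).
            bytes_written_ += pixels.size();
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::vector<unsigned char>> queue_;
    bool done_ = false;
    std::size_t bytes_written_ = 0;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Pushes `frames` buffers of `frame_size` bytes and returns the total consumed.
std::size_t demo_total_bytes(int frames, std::size_t frame_size) {
    FrameWriter writer;
    for (int f = 0; f < frames; ++f) {
        writer.push(std::vector<unsigned char>(frame_size, 0));
    }
    writer.finish();
    return writer.bytes_written();
}
```

This only moves the file writing off the main thread; it does not remove the GPU sync that Map itself can cause. The queue is also unbounded here, so a real capture would probably want a size limit to keep memory in check when the disk falls behind.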

philkr commented 6 years ago

Closing this for now; please reopen if you have an idea how to improve capturing.