NickThissen closed this issue 1 year ago
One thing I forgot: I realize the more usual way to buffer is to just keep two staging textures and cycle between them. One is being used in the background by the GPU for the copy, while the other should be "ready to go" so I can access its data. Each iteration (when a frame arrives) I swap them over.
However, I did not manage to make this work, because I don't understand how I can guarantee that the texture will be ready by the time the next frame arrives. And if it isn't ready, then I have no data at all. I started experimenting with 3 or even 4 textures in the buffer, but in the end I decided that keeping a list of arbitrary size (while always removing the unused ones) was the better choice...
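The readiness problem can be seen in a toy model, assuming each GPU copy takes a fixed number of frames before it is CPU-visible (the function name and all numbers here are illustrative, not measurements from the sample):

```python
# Toy model of the readiness problem: each staging texture becomes readable
# `latency` frames after a copy into it is started. With a fixed rotation of
# two textures, the "other" texture may still be in flight when the next
# frame arrives; a deeper pool absorbs the latency.
def frames_lost(pool_size, latency, frames=100):
    ready_at = [0] * pool_size  # frame index at which each texture is readable
    lost = 0
    nxt = 0                     # next texture in the fixed rotation
    for frame in range(frames):
        tex = nxt
        nxt = (nxt + 1) % pool_size
        if ready_at[tex] > frame:
            lost += 1           # previous copy into this texture not done yet
            continue
        ready_at[tex] = frame + latency  # start a new copy into it
    return lost

print(frames_lost(pool_size=2, latency=3))  # -> 50: two buffers can't hide the latency
print(frames_lost(pool_size=4, latency=3))  # -> 0: a deeper pool can
```

This matches the observation above: if the copy takes longer than one frame, a strict two-texture swap hits textures that aren't ready yet, while a larger pool (or a variable-size list) does not.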
This is more of a D3D11 usage question, so you'll get better answers from that community. The DirectX folks have set up a Discord server you can join: https://devblogs.microsoft.com/directx/hello-discord/
As for your question, I would use a collection of staging textures with two concurrent queues, moving the encoder/sender to another thread. I'll refer to the two queues as the 'free' queue and the 'busy' queue. When you receive a frame, pull a staging texture off the free queue and copy the frame into it. Then put it on the busy queue.
Later, the encoder/sender thread will pull off of the busy queue, map the texture, and then send the bytes across the wire. When it's done, it'll put that texture back onto the free queue.
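A minimal sketch of this free/busy two-queue scheme, modeled in Python with plain objects standing in for the staging textures (the real code would do CopyResource, Map, and Unmap where noted); all names here are illustrative:

```python
# Capture thread pulls from the free queue and pushes to the busy queue;
# the encoder/sender thread does the reverse. Both queues are thread-safe.
import queue
import threading

class StagingTexture:
    def __init__(self, tex_id):
        self.tex_id = tex_id
        self.frame = None

free_q, busy_q = queue.Queue(), queue.Queue()
for i in range(3):                  # small fixed pool of staging textures
    free_q.put(StagingTexture(i))

sent = []
N_FRAMES = 10

def encoder_sender():
    for _ in range(N_FRAMES):
        tex = busy_q.get()          # stands in for Map + send over the wire
        sent.append(tex.frame)
        free_q.put(tex)             # stands in for Unmap + recycle

t = threading.Thread(target=encoder_sender)
t.start()

for frame in range(N_FRAMES):       # capture side: one iteration per frame
    tex = free_q.get()              # blocks if every texture is still busy
    tex.frame = frame               # stands in for CopyResource into tex
    busy_q.put(tex)

t.join()
print(sent)                         # frames reach the sender in order
```

Because both queues are FIFO and each texture is returned to the free queue only after it is consumed, capture never reads a texture the sender is still using.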
But this will all be contingent on what scenario you're after, and how you encode and send data. I highly recommend identifying metrics you can measure and profiling your application.
I'm going to close this issue, as it's outside the scope of these samples. Good luck!
Hi guys, any progress?
Hello,
I am trying to get access to the byte data of the screen capture textures from the WPF Screen Capture sample, similar to #78. I am running the screen capture in a continuous loop, receiving every frame (144 frames per second if my monitor refresh rate is 144 Hz). My goal is to send this 144 Hz "video" (image data) over the network. I use NDI to send the frames, and all I need is a memory address where the data is stored (and that the CPU can access).
I have previously solved this problem with the help of some people here via the following steps: Every time a new frame arrives:
Some sample code:
Create the staging texture:
MapSubresource and sending the data:
I guess making a new staging texture for every frame is unnecessary, but I found no measurable performance difference when I re-used one.
This works great, but I believe it is still not optimal because of the MapFlags.None. My understanding is that this makes the call wait until the GPU has finished copying the data into the CPU-accessible staging texture. While that does not take a massive amount of time, it still causes an unnecessary delay: the CPU is "busy" (doing nothing) while it waits for the copy to finish. This drives CPU usage up, and other applications start slowing down because of it.
My goal is to achieve the same performance (144 Hz sending) but with lower CPU usage. I believe the key is to use MapFlags.DoNotWait instead. This makes the MapSubresource call return immediately, so the CPU is not left waiting. However, the data isn't available yet, so I have to do something else to get at it some time later.
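One way to picture the DoNotWait approach is to keep the in-flight copies in a queue and, each frame, poll the oldest one, consuming it only once its copy has finished. A toy model (here `try_map` stands in for MapSubresource with MapFlags.DoNotWait, which fails with DXGI_ERROR_WAS_STILL_DRAWING while the GPU copy is still in flight; the latency number is illustrative):

```python
# Poll-don't-block: each frame, start a new copy, then drain every queued
# copy that has completed, oldest first. Nothing ever waits on the GPU.
from collections import deque

def try_map(tex, now):
    """Stand-in for Map(..., DoNotWait): data only after the copy completes."""
    return tex["data"] if now >= tex["ready_at"] else None

COPY_LATENCY = 2                    # frames until a copy is CPU-visible
in_flight = deque()
delivered = []

for frame in range(10):
    # Kick off a copy of this frame into a fresh/recycled staging texture.
    in_flight.append({"data": frame, "ready_at": frame + COPY_LATENCY})
    # Drain every texture whose copy has completed, oldest first.
    while in_flight and try_map(in_flight[0], frame) is not None:
        delivered.append(in_flight.popleft()["data"])

print(delivered)                    # frames come out in order, a couple late
```

The price of never blocking is latency: frames arrive at the consumer a couple of frame intervals after capture, and a few are still in flight when the loop ends.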
What I came up with is some kind of (probably terrible) buffering system. The logic would be as follows:
While this works decently well, I don't see a great improvement in CPU usage yet. A bigger problem is that I am no longer calling UnmapSubresource on any of these textures. The moment I try it (anywhere), I get an 'access denied' error. However, it seems to run OK without UnmapSubresource at all; I don't see any memory build-up.
I'm sure I am still doing something wrong and this can be optimized better. Does anyone have any tips on how I can achieve it?
One potentially important note: I am omitting the "_sender" code here, but it essentially keeps a queue of frames and sends them off at the desired frame rate on a background thread (which blocks for the appropriate amount of time to hit the target rate). Steps are the following:
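A hedged sketch of the kind of pacing loop such a sender thread might use; since the actual "_sender" code is omitted above, `paced_sender`, its parameters, and the absolute-schedule approach are my assumptions, not the sample's code:

```python
# Sleep until the next send slot computed from an absolute schedule, so the
# time spent sending does not accumulate drift across frames.
import time
from queue import Empty

def paced_sender(frame_queue, fps, send, stop):
    interval = 1.0 / fps
    next_slot = time.monotonic()
    while not stop.is_set():
        delay = next_slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)            # block until this frame's slot
        try:
            send(frame_queue.get_nowait())
        except Empty:
            pass                         # no new frame; skip this slot
        next_slot += interval            # schedule the next slot absolutely
```

Advancing `next_slot` by a fixed interval (rather than sleeping a fixed amount after each send) keeps the long-run rate at the target fps even when individual sends take variable time.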