keijiro / KlakSpout

Spout plugin for Unity
The Unlicense

DX12 Support #69

Ohmnivore closed 3 years ago

Ohmnivore commented 3 years ago

Features

Overview

DX11 Sender

Same as the current version in terms of graphic API calls.

DX11 Receiver

Now performs an extra CopyResource operation that the current version doesn't.

To get rid of it, the DX11 and DX12 code paths would need to be split on the C# side, which I decided against for simplicity's sake. It's still on the table, though.

DX12 Sender & Receiver

Same as the DX11 code, with a few differences:

References Used

Temporary Render Targets

To support temporary RTs, the PR does two things:

  1. Maintains an eviction cache to approximate texture lifetimes, since the plugin has no way of knowing when a temporary RT is actually released.
  2. Uses IssuePluginCustomBlit to pass textures from Unity to the plugin so that the native texture pointers of temporary RTs can be resolved. We can't obtain the native texture pointers of CommandBuffer temporary RTs on the C# side.

If temporary RT support in DX12 is deemed unnecessary, both of these can be removed, which would simplify the plugin somewhat. That option has a gotcha of its own, though (I can elaborate).

Eviction Cache

If Unity releases a texture before our cache evicts it, Unity logs a scary warning (only in the log file, not in the console) because the texture's ref count is still nonzero:

d3d12: releasing a resource which is still being referenced. It will be leaked.

That's fine - our cache will evict and fully release the texture moments later.

The cache could be made more robust to handle these cases (e.g. by subscribing to play, stop, and scene-loaded events and evicting all of its contents).

The warning message would also go away if DX12 temporary RT support is removed from the PR.
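
As a rough illustration of the eviction cache idea described above, here is a minimal C++ sketch. It is not the PR's implementation: the class name, the frame-count policy, and the release callback are all hypothetical, chosen only to show how a texture's lifetime can be approximated by "not seen for N frames", plus an evict-all hook for play/stop/scene-load events.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical sketch, not the code from this PR.
// Tracks the last frame in which each texture pointer was seen and
// releases entries that have been idle for too long.
class EvictionCache {
public:
    using Key = const void*;                 // stands in for a native texture pointer
    using Release = std::function<void(Key)>; // stands in for dropping our reference

    EvictionCache(std::uint64_t max_idle_frames, Release release)
        : max_idle_frames_(max_idle_frames), release_(std::move(release)) {}

    // Record that the texture was used during the given frame.
    void touch(Key texture, std::uint64_t frame) { last_used_[texture] = frame; }

    // Release every texture idle for more than max_idle_frames_ frames.
    void evict_stale(std::uint64_t current_frame) {
        for (auto it = last_used_.begin(); it != last_used_.end();) {
            const bool stale = current_frame > it->second &&
                               current_frame - it->second > max_idle_frames_;
            if (stale) {
                release_(it->first);
                it = last_used_.erase(it);
            } else {
                ++it;
            }
        }
    }

    // Release everything at once, e.g. on play/stop or scene load.
    void evict_all() {
        for (const auto& entry : last_used_) release_(entry.first);
        last_used_.clear();
    }

    std::size_t size() const { return last_used_.size(); }

private:
    std::uint64_t max_idle_frames_;
    Release release_;
    std::unordered_map<Key, std::uint64_t> last_used_;
};
```

With a policy like this, a texture that Unity has already released stays referenced by the cache until the idle threshold passes, which is the window in which the warning above would be logged. Calling something like `evict_all()` from the editor and scene events would close that window for those cases.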

Tested

Performance Tests

Render Thread stats are from the Stats overlay. GPU stats are from running the Profiler with the GPU Usage module (and reading the total GPU frame time).

Stress scene

Disabled scene reloading to get a stable reading.

DX11 (master branch)    7.2 ms Render Thread    10.0 ms GPU
DX11 (this PR)          7.5 ms Render Thread    10.0 ms GPU
DX12 (this PR)         18.2 ms Render Thread    20.0 ms GPU

With scene reloading left enabled, I estimate that scene load time is roughly the same for DX11 (master branch) and DX11 (this PR), while DX12 (this PR) is two to three times slower.

Quad scene

Measured with the receiver object receiving a 1080p texture from another app.

DX11 (master branch)    2.0 ms Render Thread    1.2 ms GPU
DX11 (this PR)          2.0 ms Render Thread    1.5 ms GPU
DX12 (this PR)          2.0 ms Render Thread    3.0 ms GPU

Conclusion

I haven't been able to figure out why the DX12 path is roughly 2x slower. The D3D11on12 layer certainly has a cost, but I don't know if that explains everything.
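
For reference, the slowdown factors implied by the tables above can be worked out with a small helper. This is only arithmetic on the reported numbers. The function name is made up for illustration, and the baseline used is DX11 (this PR).

```cpp
// Hypothetical helper: ratio of a measurement to its baseline.
// A result of 2.0 means the measured path took twice as long.
constexpr double slowdown(double measured_ms, double baseline_ms) {
    return measured_ms / baseline_ms;
}

// DX12 (this PR) relative to DX11 (this PR), values copied from the tables.
constexpr double kStressRenderThread = slowdown(18.2, 7.5);  // stress scene, Render Thread
constexpr double kStressGpu          = slowdown(20.0, 10.0); // stress scene, GPU
constexpr double kQuadRenderThread   = slowdown(2.0, 2.0);   // quad scene, Render Thread
constexpr double kQuadGpu            = slowdown(3.0, 1.5);   // quad scene, GPU
```

By these numbers the GPU time doubles in both scenes, the stress scene's Render Thread time grows by about 2.4x, and the quad scene's Render Thread time is unchanged.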