phr00t / FocusEngine

Focus Game Engine. This is Stride/Xenko fast-tracked for Phr00t's Software games. Improvements over the original focus on Vulkan support, PC platforms, VR, performance & ease. Cherry-picks commits from other forks as needed.
MIT License

Poor Vulkan Performance #58

Closed: phr00t closed this issue 5 years ago

phr00t commented 5 years ago

Vulkan is supposed to be fast and lightweight, but comparing the same Xenko application built for DirectX and Vulkan -- Vulkan is significantly slower. The moment multiple objects appear on the screen in a project I'm working on, the frame rate drops significantly with Vulkan. I don't see the same frame rate drops with DirectX 11.

tebjan commented 5 years ago

you don't get any additional performance for free with vulkan. this is a common misconception. if you set it up like a dx11 pipeline, it's also the same performance. getting faster rendering with vulkan requires a deep understanding of the API and of the drawing you want to do, and then optimizing it for multithreading (multiple command buffers). i think i saw that there was a multithreading optimization for VR in xenko, but i haven't seen that in other places. so for the most part of the rendering there is no real performance advantage to using vulkan, except (and that's a big one) that it works on all platforms.

phr00t commented 5 years ago

@tebjan I wasn't expecting performance improvements "for free" with Vulkan, I was expecting the implementation wouldn't be set up like DirectX 11, since it isn't DirectX 11. Unfortunately, Xenko isn't even getting DirectX performance with Vulkan -- but much worse. I made too many assumptions about the Vulkan Xenko implementation... I thought it just had a few bugs that needed fixing, but the problems may go much deeper.

phr00t commented 5 years ago

In a strange turn of events, running through very similar frames in DirectX and Vulkan shows the Vulkan frame actually rendered faster: DirectX took ~1800 µs to render a frame, while Vulkan took ~1200 µs. Both DirectX and Vulkan started at 120 FPS, but as the object count grew, Vulkan dropped to ~55 FPS while DirectX stayed around 115 FPS when the above per-frame timings were taken. A 1200 µs frame would allow roughly 830 FPS, so something other than raw frame time must be causing the FPS drop...

phr00t commented 5 years ago

My sample project is using only about 13% of my GTX 1070 and about 27% of the CPU. No single core ever goes above about 50%, even counting total system usage.

The Vulkan API isn't giving enough work to the GPU or CPU, and/or is spending too much time sitting idle....

phr00t commented 5 years ago

In my test application, over 50% of time is spent waiting on this lock:

https://github.com/phr00t/xenko/blob/master/sources/engine/Xenko.Rendering/Rendering/Lights/LightSpotGroupRenderer.cs#L400

This slow spot doesn't seem specific to either graphics API, so fixing it should improve performance for both.

phr00t commented 5 years ago

Identified a specific problem with Vulkan performance -- this line:

https://github.com/phr00t/xenko/blob/master/sources/engine/Xenko.Graphics/Vulkan/CommandList.Vulkan.cs#L267

GraphicsDevice.DescriptorPools.GetObject() alone is taking up ~36% of CPU processing time in my test program. Most of that time is spent waiting on this lock:

https://github.com/phr00t/xenko/blob/master/sources/engine/Xenko.Graphics/Vulkan/GraphicsDevice.Vulkan.cs#L671
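One common way to relieve this kind of lock contention is to let each thread refill a small private cache in batches, so the shared lock is taken once per batch rather than once per object. The sketch below is only an illustration of that general idea: `PooledItem`, `LockedPool`, and `ThreadCachedPool` are hypothetical names, not Xenko types, and this is not necessarily what the engine or any later fix does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a pooled resource such as a descriptor pool.
sealed class PooledItem { }

// Shared pool in which every call takes one global lock (the contended pattern).
sealed class LockedPool
{
    private readonly object gate = new object();
    private readonly Stack<PooledItem> free = new Stack<PooledItem>();
    public int LockAcquisitions; // only modified while holding the lock

    // Moves `count` items into `destination` under a single lock acquisition.
    public void GetBatch(Stack<PooledItem> destination, int count)
    {
        lock (gate)
        {
            LockAcquisitions++;
            for (int i = 0; i < count; i++)
                destination.Push(free.Count > 0 ? free.Pop() : new PooledItem());
        }
    }
}

// Per-thread front end: a thread touches the shared lock only when its cache is empty.
sealed class ThreadCachedPool
{
    private readonly LockedPool shared;
    private readonly int batchSize;
    private readonly ThreadLocal<Stack<PooledItem>> local =
        new ThreadLocal<Stack<PooledItem>>(() => new Stack<PooledItem>());

    public ThreadCachedPool(LockedPool shared, int batchSize)
    {
        this.shared = shared;
        this.batchSize = batchSize;
    }

    public PooledItem Get()
    {
        Stack<PooledItem> cache = local.Value;
        if (cache.Count == 0)
            shared.GetBatch(cache, batchSize);
        return cache.Pop();
    }
}

static class PoolDemo
{
    // Runs `workers` parallel loops of `getsPerWorker` acquisitions and
    // returns how many times the shared lock was taken.
    public static int CountLocks(int workers, int getsPerWorker, int batchSize)
    {
        var shared = new LockedPool();
        var cached = new ThreadCachedPool(shared, batchSize);
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < getsPerWorker; i++)
                cached.Get();
        });
        return shared.LockAcquisitions;
    }

    static void Main()
    {
        Console.WriteLine(CountLocks(4, 1000, 1));  // 4000: one lock per object
        Console.WriteLine(CountLocks(4, 1000, 50)); // 80: one lock per batch of 50
    }
}
```

The trade-off is that cached objects sit idle on threads that stop asking for them, so the batch size has to be balanced against memory use.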

phr00t commented 5 years ago

https://github.com/phr00t/xenko/commit/de94aabe6ef0ad7e1aef30e46a507587d86a0e5a

Helps a ton to fix the GetObject() slowdown.

Setting the following two options helps further, and I will set these by default in a future commit:

game.IsDrawDesynchronized = true;
game.IsFixedTimeStep = false;

However, performance still lags behind DirectX 11... still investigating...

phr00t commented 5 years ago

I've noticed that RenderDoc, which I've been using to track performance, significantly slows down only the Vulkan rendering path. This skews the results, showing Vulkan running much slower than it actually does. Vulkan actually seems to be running pretty well when I track FPS via a custom counter.
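For readers who want to measure without a capture tool attached, a custom counter can be as small as the sketch below. It is an assumption about what such a counter might look like, not the one used in this thread; `FpsCounter` and its members are hypothetical names. Time is passed in explicitly so the class can be driven by any clock.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Averages the frame rate over a fixed time window.
sealed class FpsCounter
{
    private readonly double windowSeconds;
    private double windowStart = double.NaN;
    private int framesInWindow;

    public double LastFps { get; private set; }

    public FpsCounter(double windowSeconds = 1.0)
    {
        this.windowSeconds = windowSeconds;
    }

    // Call once per frame with the current time in seconds.
    // Returns true when LastFps has just been updated.
    public bool Tick(double nowSeconds)
    {
        if (double.IsNaN(windowStart))
        {
            windowStart = nowSeconds; // first call only starts the window
            return false;
        }

        framesInWindow++;
        double elapsed = nowSeconds - windowStart;
        if (elapsed < windowSeconds)
            return false;

        LastFps = framesInWindow / elapsed;
        windowStart = nowSeconds;
        framesInWindow = 0;
        return true;
    }
}

static class FpsDemo
{
    static void Main()
    {
        var counter = new FpsCounter(windowSeconds: 0.5);
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed.TotalSeconds < 2.0)
        {
            Thread.Sleep(8); // stand-in for rendering one frame
            if (counter.Tick(clock.Elapsed.TotalSeconds))
                Console.WriteLine($"{counter.LastFps:F1} FPS");
        }
    }
}
```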

However, DirectX seems to hit my panel's 120 Hz refresh rate, while Vulkan seems locked at 60 Hz. With DirectX I see significant drops from 120 FPS (to about 85 FPS) when objects enter the scene, while Vulkan stays pegged at 60 FPS.

phr00t commented 5 years ago

I was able to "unlock" Vulkan from 60 FPS by switching the swap chain's present mode to MAILBOX (instead of FIFO). Unfortunately, frame rates still don't reach DirectX's.
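In Vulkan, FIFO presentation queues images and shows one per vertical blank, while MAILBOX keeps only the newest pending image, so the renderer is not throttled by a full queue. FIFO is the only mode the specification guarantees, so a swap chain that prefers MAILBOX needs a fallback. The sketch below shows that selection logic in isolation; the `PresentMode` enum is a local stand-in whose values mirror `VkPresentModeKHR`, and `PresentModeChooser` is a hypothetical helper, not Xenko code. In a real application the supported list would come from `vkGetPhysicalDeviceSurfacePresentModesKHR`.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for illustration; numeric values mirror VkPresentModeKHR.
enum PresentMode
{
    Immediate = 0,   // VK_PRESENT_MODE_IMMEDIATE_KHR: no vsync, may tear
    Mailbox = 1,     // VK_PRESENT_MODE_MAILBOX_KHR: newest image replaces the pending one
    Fifo = 2,        // VK_PRESENT_MODE_FIFO_KHR: vsync queue, always supported
    FifoRelaxed = 3  // VK_PRESENT_MODE_FIFO_RELAXED_KHR
}

static class PresentModeChooser
{
    // Returns the first preferred mode that the surface supports,
    // falling back to FIFO when none of them is available.
    public static PresentMode Choose(IEnumerable<PresentMode> supported,
                                     params PresentMode[] preferred)
    {
        var available = new HashSet<PresentMode>(supported);
        foreach (PresentMode mode in preferred)
        {
            if (available.Contains(mode))
                return mode;
        }
        return PresentMode.Fifo;
    }
}

static class PresentModeDemo
{
    static void Main()
    {
        var withMailbox = new[] { PresentMode.Fifo, PresentMode.Mailbox };
        var fifoOnly = new[] { PresentMode.Fifo };

        // Prints "Mailbox"
        Console.WriteLine(PresentModeChooser.Choose(withMailbox, PresentMode.Mailbox, PresentMode.Immediate));
        // Prints "Fifo"
        Console.WriteLine(PresentModeChooser.Choose(fifoOnly, PresentMode.Mailbox, PresentMode.Immediate));
    }
}
```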

I strongly suspect the Vulkan backend issues more parallel "tasks" for the Xenko.Threading system to process, which gums it up. DirectX doesn't do as much parallel work, so it doesn't load the Xenko.Threading system as much, leading to less overhead. Either Xenko.Threading needs to be streamlined, or Vulkan needs to use another, stripped-down threading system.

When Vulkan API is used, much more time is spent in ProcessWorkItems, yet less time is spent in Draw calls.

phr00t commented 5 years ago

This commit significantly improves the threading system: https://github.com/phr00t/xenko/commit/f7eeb20efc6272d0f2d8bba938809420b1704c74

... which helps everything, including DirectX. DirectX still appears to be a bit faster, but Vulkan no longer seems to be held back significantly by the threading system.

phr00t commented 5 years ago

This is still a significant issue.

In my latest (simple) test case, Vulkan was getting ~87 FPS, while DirectX was getting ~130 FPS with the same scene.

phr00t commented 5 years ago

Deep analysis is showing Vulkan spends significantly more time in EndDraw functions than DirectX... investigating...

EDIT: This seems to be caused by a lot of time spent in Vulkan Present(), while next to no time is spent in DirectX Present().

phr00t commented 5 years ago

Asking for help on the Vulkan reddit thread: https://www.reddit.com/r/vulkan/comments/bif0hi/slow_vkqueuepresentkhr_performance/

phr00t commented 5 years ago

Fixed with https://github.com/phr00t/xenko/commit/4e1b502baae7cadd33f86ce2bc40b0c3ff92a8f3 (put Present in its own thread)
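The general shape of "Present in its own thread" can be sketched as a dedicated worker that drains a small bounded queue, so the render thread hands a finished frame off and continues instead of blocking inside the present call. This is a minimal illustration under assumptions, not the code from the commit above: `PresentThread` is a hypothetical class, and the `present` delegate stands in for the real swap chain call. In actual Vulkan code, access to the queue used by `vkQueuePresentKHR` must also be externally synchronized with any other thread submitting to it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Runs a blocking "present" call on a dedicated thread.
sealed class PresentThread : IDisposable
{
    // Capacity 1 lets the render thread run at most one frame ahead.
    private readonly BlockingCollection<int> pending =
        new BlockingCollection<int>(boundedCapacity: 1);
    private readonly Thread worker;

    public PresentThread(Action<int> present)
    {
        worker = new Thread(() =>
        {
            // Blocks until a frame arrives; ends after CompleteAdding().
            foreach (int frameIndex in pending.GetConsumingEnumerable())
                present(frameIndex);
        })
        { IsBackground = true, Name = "Present" };
        worker.Start();
    }

    // Called by the render thread; blocks only if the previous frame is still queued.
    public void Enqueue(int frameIndex) => pending.Add(frameIndex);

    // Presents any queued frame, then stops the worker.
    public void Dispose()
    {
        pending.CompleteAdding();
        worker.Join();
        pending.Dispose();
    }
}

static class PresentThreadDemo
{
    // Simulates `frameCount` frames with a slow present call and
    // returns the order in which frames were presented.
    public static List<int> Run(int frameCount)
    {
        var presented = new List<int>(); // written only by the present thread
        using (var presenter = new PresentThread(frame =>
        {
            Thread.Sleep(5); // stand-in for a slow Present()
            presented.Add(frame);
        }))
        {
            for (int frame = 0; frame < frameCount; frame++)
                presenter.Enqueue(frame); // render thread keeps going
        }
        return presented; // safe to read: the worker has been joined
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run(5))); // 0,1,2,3,4
    }
}
```

The bounded queue is a deliberate choice: an unbounded one would let the render thread race arbitrarily far ahead of what is actually on screen.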