Closed by K0bin 2 years ago
Given that Mantle apparently also doesn't have uniform buffers, I'm not sure if this is worth it at all. Even if everything worked correctly, it would be way slower than DXVK on Nvidia hardware.
We'd have to patch shaders to turn buffers into uniform buffers when binding small, correctly aligned ranges of memory (no idea if that would even work) and I don't think anyone wants to implement that.
Any chance to get this revisited? My only (half recent) AMD card doesn't support the newer required Vulkan extensions.
I see your point regarding performance and raise you mine: preservation :books:
It's a very invasive and ugly change and I don't think @libcg is terribly excited about it (which I can't blame him for).
@sehraf You might want to look into NimeZ drivers if you're on Windows. Or use RADV on Linux, it's the only driver currently recommended for GRVK.
@K0bin It's a whole lot of code that I can't test or maintain. Would be good to revisit when GRVK is mature. I appreciate the effort btw.
WSI Changes
General fixes
Nvidia workarounds
Running BF4 with GRVK ran into the following limitations:
So in order to make it all work, the PR does the following:
We expose the following heaps:
The host visible heaps basically work the same as before. grAllocMemory will allocate a block of memory (using VMA in this case) that will get bound to an object later.
Device local heaps, on the other hand, do not allocate in grAllocMemory. Instead, we allocate in grBindObjectMemory, once we know what kind of object it is, so that we can use the right memory type. It also brings back the old behavior of lazily creating Vulkan buffers for GrGpuMemory objects, because otherwise memory usage would effectively double. A buffer is still created in grAllocMemory for host visible heaps and when running on AMD GPUs.
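To make the split between the two heap kinds easier to follow, here is a minimal sketch of the decision in plain C. It does not use the real GRVK or Vulkan types: `GpuMemorySketch`, `allocMemorySketch`, `bindObjectMemorySketch` and `ensureBufferSketch` are hypothetical names for illustration only. It also assumes that the AMD path allocates eagerly together with its buffer, and that the lazy buffer is created on first use; neither detail is spelled out above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a GrGpuMemory object; not the GRVK struct. */
typedef struct {
    bool hostVisible; /* heap kind requested by the application */
    bool allocated;   /* backing memory exists */
    bool hasBuffer;   /* a Vulkan buffer covering the memory exists */
} GpuMemorySketch;

/* Allocation time: host visible heaps allocate and create the buffer
 * right away. Assumption: AMD GPUs take the same eager path. */
static GpuMemorySketch allocMemorySketch(bool hostVisible, bool isAmdGpu)
{
    bool eager = hostVisible || isAmdGpu;
    GpuMemorySketch mem = {
        .hostVisible = hostVisible,
        .allocated = eager,
        .hasBuffer = eager,
    };
    return mem;
}

/* Bind time: a deferred (device local) allocation happens here, once the
 * object kind, and therefore the right memory type, is known. */
static void bindObjectMemorySketch(GpuMemorySketch* mem)
{
    if (!mem->allocated) {
        mem->allocated = true;
    }
}

/* Lazy buffer creation (assumed to happen on first use): returns true
 * only when this call actually created the buffer. */
static bool ensureBufferSketch(GpuMemorySketch* mem)
{
    if (mem->hasBuffer || !mem->allocated) {
        return false;
    }
    mem->hasBuffer = true;
    return true;
}
```

In this model, a device local allocation on Nvidia stays empty until the first bind, so an image bound to it never pays for a buffer it does not need.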
When binding an image with the workaround active (that is, it has both an image and a buffer), we do one of two things depending on the requested heap. If the heap is host visible, we bind the buffer to the memory, allocate a new chunk of memory specifically for the image (this one can even be device local), and bind the image to that. If the heap is not host visible, we just allocate memory for the image and destroy the buffer that was created for it, effectively turning it into a regular device local optimal image.
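The two cases for workaround images can be sketched as a small decision function. Again, this is only an illustration under stated assumptions: `WorkaroundImageSketch`, `ImageBindAction` and `bindWorkaroundImageSketch` are hypothetical names, and the real code would issue the corresponding Vulkan and VMA calls instead of flipping flags.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; not part of GRVK. */
typedef enum {
    BIND_BUFFER_PLUS_DEDICATED_IMAGE, /* host visible heap */
    BIND_IMAGE_ONLY                   /* heap is not host visible */
} ImageBindAction;

/* Hypothetical model of an image created with the workaround active. */
typedef struct {
    bool hasBuffer;            /* buffer created alongside the image */
    bool bufferBoundToHeap;    /* buffer bound to the application's memory */
    bool imageHasOwnMemory;    /* image backed by a separate allocation */
} WorkaroundImageSketch;

static ImageBindAction bindWorkaroundImageSketch(WorkaroundImageSketch* img,
                                                 bool heapHostVisible)
{
    /* Both paths allocate memory specifically for the image. */
    img->imageHasOwnMemory = true;

    if (heapHostVisible) {
        /* Keep the buffer and bind it to the requested memory. */
        img->bufferBoundToHeap = true;
        return BIND_BUFFER_PLUS_DEDICATED_IMAGE;
    }

    /* Not host visible: the buffer is no longer needed, so drop it and
     * treat the object as a regular device local optimal image. */
    img->hasBuffer = false;
    img->bufferBoundToHeap = false;
    return BIND_IMAGE_ONLY;
}
```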