Closed astrotuna201 closed 4 years ago
Hi there, I cannot provide a full answer right now, as I am still acquiring a macOS setup to trace these cross-platform/architecture discrepancies down. However, as you already noticed, different cards come with different memory types. The Vulkan specification states that there must be at least one memory type with both VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, and one with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT.
So, in principle, vuda cannot assume that a memory type with the VK_MEMORY_PROPERTY_HOST_CACHED_BIT exists. That is why vuda always tries to find a fallback candidate with HOST_VISIBLE+HOST_COHERENT. However, in the current implementation this fails, because vudaFindMemoryType() throws a runtime error instead of returning -1. The fetching function findMemoryType_Cached() therefore never gets to try the fallback candidate. I will make sure that vuda allows this. This was an unfortunate bug introduced with the embedded kernel sample update.
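The fallback described above can be sketched as follows. This is a minimal illustration, not vuda's actual code: the names `MemoryType`, `findMemoryType`, and `findCachedOrFallback` are hypothetical, the search reports failure through `std::optional` rather than -1, and the preferred flag set (HOST_VISIBLE, HOST_COHERENT, and HOST_CACHED together) is an assumption based on the discussion in this thread. The flag constants use the bit values that the Vulkan specification assigns to VkMemoryPropertyFlagBits, so no Vulkan headers are needed.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Bit values of VkMemoryPropertyFlagBits as given in the Vulkan specification.
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kHostCached   = 0x8;

// Hypothetical stand-in for VkMemoryType; only the property flags matter here.
struct MemoryType {
    std::uint32_t propertyFlags;
};

// Returns the index of the first memory type that is allowed by typeBits and
// carries every requested flag, or std::nullopt if none matches.
// Crucially, it does not throw, so the caller can try another flag set.
std::optional<std::uint32_t> findMemoryType(const std::vector<MemoryType>& types,
                                            std::uint32_t typeBits,
                                            std::uint32_t required)
{
    for (std::uint32_t i = 0; i < types.size() && i < 32; ++i) {
        const bool allowed = (typeBits & (1u << i)) != 0;
        if (allowed && (types[i].propertyFlags & required) == required) {
            return i;
        }
    }
    return std::nullopt;
}

// Assumed preference: a cached, coherent, host-visible type first, then the
// HOST_VISIBLE + HOST_COHERENT combination as the fallback candidate.
std::optional<std::uint32_t> findCachedOrFallback(const std::vector<MemoryType>& types,
                                                  std::uint32_t typeBits)
{
    if (auto idx = findMemoryType(types, typeBits,
                                  kHostVisible | kHostCoherent | kHostCached)) {
        return idx;
    }
    return findMemoryType(types, typeBits, kHostVisible | kHostCoherent);
}
```

Only when both searches come back empty would the caller have a reason to raise an error.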
However, defining eCachedProperties = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT should allow the samples to execute (though possibly incorrectly). The thing is, if the memory type does not have the coherent flag, host memory management commands need to be in place to flush or invalidate memory ranges so that writes become visible to the device or the host, respectively. This might be why you see the transfers failing when redefining eCachedProperties and eCachedInternalProperties.
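In Vulkan, these host commands are vkFlushMappedMemoryRanges (after host writes, before the device reads) and vkInvalidateMappedMemoryRanges (after device writes, before the host reads). Both take VkMappedMemoryRange structures whose offset must be a multiple of the device limit nonCoherentAtomSize, and whose size must either be a multiple of that limit or reach the end of the allocation. The sketch below shows only that range arithmetic; `Range` and `alignToAtom` are hypothetical helpers for illustration and are not part of vuda or Vulkan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical plain struct mirroring the offset/size pair of VkMappedMemoryRange.
struct Range {
    std::uint64_t offset;
    std::uint64_t size;
};

// Expands [offset, offset + size) so that it satisfies the alignment rules for
// flushing or invalidating non-coherent memory:
//   - the start is rounded down to a multiple of atomSize,
//   - the end is rounded up to a multiple of atomSize, but clamped to the
//     end of the allocation.
// Preconditions (assumed, not checked): atomSize > 0 and
// offset + size <= allocationSize.
Range alignToAtom(std::uint64_t offset, std::uint64_t size,
                  std::uint64_t atomSize, std::uint64_t allocationSize)
{
    const std::uint64_t begin = (offset / atomSize) * atomSize;
    const std::uint64_t roundedEnd =
        ((offset + size + atomSize - 1) / atomSize) * atomSize;
    const std::uint64_t end = std::min(roundedEnd, allocationSize);
    return Range{begin, end - begin};
}
```

The resulting offset and size would then be copied into a VkMappedMemoryRange before calling the flush or invalidate command on the mapped allocation.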
The memory allocation with the VK_MEMORY_PROPERTY_HOST_CACHED_BIT is used for pageable transfers from device to host in order to reach speeds comparable with CUDA (on NVIDIA architecture at least). You can test this on your Linux system, with and without the flag. Also, the weird thing is that I seem to remember that the RX Vega's 16 GiB Heap 1 had the HOST_VISIBLE, HOST_COHERENT, and HOST_CACHED flags at some point. This might be different on macOS or might have changed.
Appreciate the feedback.
Should be fixed now. Let me know.
Sorry, this is still not working. The available memory types are still:
memoryHeaps: count = 2
    memoryHeaps[0]:
        size   = 34342961152 (0x7ff000000) (31.98 GiB)
        budget = 34342961152 (0x7ff000000) (31.98 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
    memoryHeaps[1]:
        size   = 206158430208 (0x3000000000) (192.00 GiB)
        budget = 132674641920 (0x1ee4066000) (123.56 GiB)
        usage  = 25579520 (0x01865000) (24.39 MiB)
        flags: count = 0
            None
memoryTypes: count = 3
    memoryTypes[0]:
        heapIndex     = 0
        propertyFlags = 0x0001: count = 1
            MEMORY_PROPERTY_DEVICE_LOCAL_BIT
        usable for:
            IMAGE_TILING_OPTIMAL: color images, FORMAT_D16_UNORM, FORMAT_D32_SFLOAT, FORMAT_S8_UINT, FORMAT_D24_UNORM_S8_UINT, FORMAT_D32_SFLOAT_S8_UINT
            IMAGE_TILING_LINEAR: None
    memoryTypes[1]:
        heapIndex     = 1
        propertyFlags = 0x0006: count = 2
            MEMORY_PROPERTY_HOST_VISIBLE_BIT
            MEMORY_PROPERTY_HOST_COHERENT_BIT
        usable for:
            IMAGE_TILING_OPTIMAL: None
            IMAGE_TILING_LINEAR: None
    memoryTypes[2]:
        heapIndex     = 0
        propertyFlags = 0x000b: count = 3
            MEMORY_PROPERTY_DEVICE_LOCAL_BIT
            MEMORY_PROPERTY_HOST_VISIBLE_BIT
            MEMORY_PROPERTY_HOST_CACHED_BIT
        usable for:
            IMAGE_TILING_OPTIMAL: color images
            IMAGE_TILING_LINEAR: None
Hence the only HOST_CACHED memory type available is not HOST_COHERENT, so allocations with eCachedProperties or eCachedInternalProperties still fail at logical device.inl:233
host_cached_node_internal* dstptr = m_cachedBuffers.get_buffer(size, m_allocator);
in cachedbuffer.hpp:33
m_ptrMemBlock = allocator.allocate(vk::MemoryPropertyFlags(memoryPropertiesFlags::eCachedInternalProperties), size);
Perhaps the comment linked below indicates how to fix this? It requires memory flush/invalidate calls.
https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/693#issuecomment-463961366
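One way to handle a device like the one reported here is to accept a cached type that lacks HOST_COHERENT and record that it needs explicit flush/invalidate calls. The following is a rough sketch under assumptions: `CachedChoice` and `chooseReadbackType` are hypothetical names, the preference order is illustrative rather than vuda's actual policy, and the flag constants repeat the bit values from the Vulkan specification so the example compiles without Vulkan headers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Bit values of VkMemoryPropertyFlagBits as given in the Vulkan specification.
constexpr std::uint32_t kHostVisibleBit  = 0x2;
constexpr std::uint32_t kHostCoherentBit = 0x4;
constexpr std::uint32_t kHostCachedBit   = 0x8;

// Hypothetical result type: which memory type to use for device-to-host
// staging, and whether the host must flush/invalidate mapped ranges itself.
struct CachedChoice {
    std::uint32_t typeIndex;
    bool needsManualSync;
};

// typeFlags[i] holds the propertyFlags of memory type i.
// Assumed preference order: cached + coherent, then cached only (manual
// synchronisation required), then coherent only (no caching benefit).
std::optional<CachedChoice> chooseReadbackType(const std::vector<std::uint32_t>& typeFlags)
{
    const std::array<std::uint32_t, 3> preferences = {
        kHostVisibleBit | kHostCachedBit | kHostCoherentBit,
        kHostVisibleBit | kHostCachedBit,
        kHostVisibleBit | kHostCoherentBit,
    };
    for (std::uint32_t wanted : preferences) {
        for (std::uint32_t i = 0; i < typeFlags.size(); ++i) {
            if ((typeFlags[i] & wanted) == wanted) {
                // Without HOST_COHERENT, host reads must be preceded by an
                // invalidate and host writes followed by a flush.
                const bool manual = (typeFlags[i] & kHostCoherentBit) == 0;
                return CachedChoice{i, manual};
            }
        }
    }
    return std::nullopt;
}
```

With the three property-flag values from the listing above (0x0001, 0x0006, 0x000b), this sketch would select memory type 2 and mark it as requiring manual synchronisation.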
Hi again and thank you for pursuing the error.
I was finally able to get a real repro case running. The correct invalidate and flushes should be in place now for the HOST_CACHED (non-coherent) memory type and the samples should compile.
Please let me know. Cheers.
Thanks, that seems to fix compilation! In the bandwidthtest.cpp file I had to adjust the memory allocation like the one used in events_and_bandwith.cpp to pass the memory copy test, though. Many thanks!
on MoltenVK & macOS (Vulkan Instance Version: 1.2.131),
Vuda in types.h currently defines
The sample codes are running fine on a Linux machine with an NVIDIA Quadro 6000 RTX, which provides a suitable memory type:
but for the two devices I can test the sample code with, the returned possible memoryTypes are (1) AMD Radeon R9 M370X (and similar for AMD Vega II):
i.e., I can have CACHED, or COHERENT but not both. However, this is required by Vuda in inc/state/memoryallocator.h:
by referencing eCachedProperties. The result is throw std::runtime_error("vuda: failed to find suitable memory type !"); as the memory requirement is not fulfilled.
The bandwidth sample code compiles if I re-define
but then the output is
Is this an incomplete or incorrect implementation in MoltenVK/macOS, or a bug in Vuda?