vosen / ZLUDA

CUDA on non-NVIDIA GPUs
https://vosen.github.io/ZLUDA/
Apache License 2.0

DLSS Support Outline #87

Closed Weather-OS closed 1 month ago

Weather-OS commented 7 months ago

What would it realistically take to implement support for DLSS within this project? Could we make use of DXVK-NVAPI for such a feature?

This is a wonderful project and I’m looking forward to contributing some of my own work to it.

Mushoz commented 7 months ago

I am very curious about this question as well. Especially since the README contains the following text:

Realistically, it's now abandoned and will only possibly receive updates to run workloads I am personally interested in (DLSS).

Being able to run DLSS on AMD hardware would be a game-changer for sure, and I am curious what it will take to get there. Amazing project, and I am hoping this won't die now that funding has stopped.

Weather-OS commented 7 months ago

Attempt no.1 (Base ZLUDA build)

Details: DXVK-NVAPI requires patches in order to bypass AMD GPU detection; the rest of the libraries work fine. The GPU arch is set to NV_GPU_ARCHITECTURE_TU100.

Observation: The games fail at NVSDK_NGX_D3D12_GetFeatureRequirements, although preliminary initialization does succeed. Two examples are provided (Minecraft RTX, R&C Rift Apart).


vosen commented 7 months ago

I did not take notes the last time I poked around there, so this might not be 100% accurate, but:

opcod3 commented 6 months ago

Regarding the Vulkan interop, it seems HIP should support it, although I'm not sure whether a kernel can be bound to an arbitrary command list as you describe. See the example here

D3D12 interop should also be available on Windows. There are no HIP samples, but the recently released Orochi 2.0 does contain a DX12 sample: here

opcod3 commented 6 months ago

I ran some tests and want to report my findings, which are very bare-bones.

I tried to run the DLSS sample that was linked above on an AMD GPU on Linux. The first issue is that Vulkan requires a few extensions that AMD GPUs do not have:

I patched the source to bypass those checks, along with some other changes:

After this I finally got the sample running; sadly, I did not get any calls to NVAPI. This is likely because NGX does a feature check and refuses to initialize on unsupported GPUs. At this point I have some questions:

I think there are three options:

Update:

Compiling Mesa from source was surprisingly easy. I updated RADV to report that it supports the two extensions mentioned above. I also figured out that on Linux the architecture check is done using libnvidia-ml instead of NVAPI, so I updated the ZLUDA implementation of that library to report my card as Turing.

This got me further, but DLSS support is still not being reported. I suspect NVSDK_NGX_VULKAN_GetFeatureRequirements is what's blocking things. I'll try to find a way to enable more logging in NGX; maybe it will shed some light on what's going wrong.

Update 2:

After running the demo with logging enabled via the __NGX_LOG_LEVEL=10 environment variable, I was able to pinpoint why the application is failing.

These are the relevant snippets:

[tid:124644461002368][LoadVkEntrypoints:95] Failed to load VK device function: vkCreateCuModuleNVX
[tid:124644461002368][LoadVkEntrypoints:95] Failed to load VK device function: vkCreateCuFunctionNVX
[tid:124644461002368][LoadVkEntrypoints:95] Failed to load VK device function: vkDestroyCuModuleNVX
[tid:124644461002368][LoadVkEntrypoints:95] Failed to load VK device function: vkDestroyCuFunctionNVX
[tid:124644461002368][LoadVkEntrypoints:95] Failed to load VK device function: vkCmdCuLaunchKernelNVX
[tid:124644461002368][LoadVkEntrypoints:95] Failed to load VK device function: vkGetImageViewHandleNVX
[tid:124644461002368][LoadVkEntrypoints:95] Failed to load VK device function: vkGetImageViewAddressNVX
[tid:124644461002368][CreateKernel:388] Error: vkCreateCuModuleNVX() and/or vkCreateCuFunctionNVX() functions are not available. Request VK_NVX_binary_import extension

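The missing entrypoints in a log like the one above can be extracted mechanically; a small hypothetical helper for triage, not part of any tool mentioned in this thread:

```python
import re

def missing_vk_functions(log_text: str) -> list[str]:
    """Collect the Vulkan device functions NGX failed to load,
    preserving order and dropping duplicates."""
    pattern = re.compile(r"Failed to load VK device function: (\w+)")
    seen: dict[str, None] = {}
    for match in pattern.finditer(log_text):
        seen[match.group(1)] = None
    return list(seen)

sample = """\
[tid:1][LoadVkEntrypoints:95] Failed to load VK device function: vkCreateCuModuleNVX
[tid:1][LoadVkEntrypoints:95] Failed to load VK device function: vkCmdCuLaunchKernelNVX
"""
print(missing_vk_functions(sample))
# ['vkCreateCuModuleNVX', 'vkCmdCuLaunchKernelNVX']
```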
This brings us to a point where we have to implement these functions in RADV. Before doing that, I thought it would be wise to test the Vulkan/HIP interop example I linked above. Sadly, after compiling, the program crashes with the following error:

 hipVulkan: /usr/src/debug/hip-runtime-amd/clr-rocm-6.0.2/rocclr/device/rocm/rocdevice.hpp:241: virtual bool roc::NullDevice::importExtSemaphore(void**, const amd::Os::FileDesc&, amd::ExternalSemaphoreHandleType): Assertion `false && "ShouldNotReachHere()"' failed.

It appears hipImportExternalSemaphore and a few other functions are not implemented in the ROCm CLR backend, only in the PAL one, which I was unable to easily compile, so that will be for another day. I opened an issue about it: https://github.com/ROCm/clr/issues/72

It is likely the HIP/Vulkan interop functions require a special build of rocm-clr and AMDVLK drivers instead of RADV. I guess the next step will be attempting to get that working, followed by compiling a custom version of AMDVLK that supports the required NVX extensions.

Windows might also be a better platform to target, as I bet the interop will work there. But I have no idea whether there is a way to inject extensions into the Vulkan/DX12 implementation.

opcod3 commented 4 months ago

I managed to get a Vulkan layer "working"; at least I can get calls passed from the DLSS example to ZLUDA. I now have the PTX dumped.

I had to implement/stub the following instructions:

I am now missing the following:

@vosen My question now is about cvt with f16x2. What do you think is the best way to implement it, as it has three arguments instead of the usual two for cvt?

Is it better to create a new struct like this:

pub struct Arg3Cvt<P: ArgParams> {
    pub dst: P::Operand,
    pub src1: P::Operand,
    pub src2: Option<P::Operand>,
}

and substitute it as the argument for all versions of cvt? Or is it better to make a new cvtHalf instruction? Or is there a third, better way? I'm still learning Rust.

vosen commented 4 months ago

Best to decide depending on what sort of LLVM bitcode we need to emit for it. ast::CvtDetails already splits conversion cases into separate variants (IntFromInt, FloatFromFloat, IntFromFloat, FloatFromInt), which are handled differently. One more distinct variant is going to fit in fine with the rest of the code. On the other hand, adding a distinct ast::Instruction variant just for cvt.rn.f16x2.f32 is fine too: there is no reason for the other cvt variants to care about a src2 that will always be None, and there's already precedent for this sort of modeling in the codebase (e.g. `Atom` vs. `AtomCas`).

Both solutions are good, choose whichever turns out to be more convenient.

kr1spy2115 commented 1 month ago

So after the rollback, is it still possible to keep working on DLSS, or is it a distant goal now?

Weather-OS commented 1 month ago

So after the rollback, is it still possible to keep working on DLSS, or is it a distant goal now?

It’s pretty much a pipe dream for the foreseeable future.