Implement compatibility for external LLVM-IL

AlexDemydenko commented 5 months ago

The compiler has an API to compile LLVM-IL code directly. The idea of clspv is to use the semi-compiled sources (from cl to IL) for future compilation with other parameters or to compile with a different combination of the semi-compiled sources. In this case all IL-sources are generated by the internal (LLVM) clspv FrontEnd.

The clvk has mode CLVK_ENABLE_SPIRV_IL. In this case the driver prepares the IL using SPIRV-LLVM-Translator.

The problem with "use-native-builtins". The SPIRV-LLVM-Translator doesn't use this option and generate LLVM-IL using it native instructions instead of to using the built-ins from clspv--.bc.

The proposal is to implement a new Pass that should intercept instructions that not in the list of "use-native-builtins" and replace them with the one of built-ins from clspv--.bc.

This pass could implemented for all external IL or directly to support the SPIRV-LLVM-Translator only.

AlexDemydenko commented 5 months ago

This is a related PR from which it all began, just for the record. https://github.com/google/clspv/pull/1249

rjodinchr commented 5 months ago

I think this would have to be done before linking with the builtins library as we link in lazy mode to be efficient (both time-wise and size-wise).

We would need to parse each instruction of the module in CompileModule, before LinkBuiltinLibrary to replace the instruction that should be mapped to an emulated builtin call.

Then NativeMathPass would take care of using the native or emulated implementation depending on what is requested by the driver.

alan-baker commented 5 months ago

Can you provide an example of a translation from the spirv-llvm-translator you wish to replace? Would the replacement be directly using a SPIR-V instruction (or extended instruction) or an emulation? If it is an emulation, would it be different than libclc?

AlexDemydenko commented 5 months ago

All files in frem.zip

There 2 compilations: 1) frem_float.cl to CL_frem_emulated.spv 2) source_frem_float.spv to result_frem_float.spv The results in CL_frem_emulated.spv & result_frem_float.spv should be equal.

FYI. fmod function is equal to OpFRem instuction in spirv and the OpFMod instruction is similar but not equal.

In this case, I want to replace the OpFMod instruction with the fmod function and take the body of this instruction from libclc.

alan-baker commented 5 months ago

If I understand correctly if you compile directly from the spirv-llvm-translator you end up OpFMod in the SPIR-V because the translator maps directly to the llvm instruction and clspv maps that to the GLSL std450 instruction. Instead you want to replace fmod with the libclc implementation which we have as the mangled _Z4fmodff function.

If that's correct then it sounds like almost like a reverse of the native math pass. Specify a list of functions you wish to emulate using the libclc implementation. That seems reasonable if the source is a single instruction. It's probably a little more complicated if you need to replace a sequence of instructions, but maybe the pattern matcher LLVM uses in InstCombine could be used for such cases.

I could image an early pass in the flow where that does this before we discard the unused libclc functions. Is it the case that the libclc functions are all you need? That you don't require a different emulation than what clspv normally uses?

rjodinchr commented 5 months ago

Native math pass do not discard that much the unused libclc functios anymore. Since we link in lazy the module, we only get what we need from there. So this will have to be a pass done before linking with the builtin library.

alan-baker commented 5 months ago

Native math pass do not discard that much the unused libclc functios anymore. Since we link in lazy the module, we only get what we need from there. So this will have to be a pass done before linking with the builtin library.

That's unfortunate and not ideal as we don't currently run any passes prior to linking.

AlexDemydenko commented 5 months ago

The OpFMod instruction is only an example. Maybe it is a bad example. I have deal with a fork of the clspv witch is somewhat outdated. But the idea is the same for all other functions. For example, the spirv_new test from the OpenCL-CTS test suite contains tests for: OpFAdd, OpFSub, OpFMul, OpFDiv, OpFMod, OpFRem. And this is only the math section. The test has other test cases.

AlexDemydenko commented 5 months ago

The test list you can find here: test_conformance/spirv/math_brute_force.zip test_conformance/spirv_new/spirv_asm/

rjodinchr commented 5 months ago

Maybe it would make sense to implement the translation directly in the LLVM-SPIRV-TRANSLATOR? I could understand that we are not the only ones who prefer to see this OpFMod mapped to the libclc implementation of fmod and not directly to the LLVM IR operator. At least for OpenCL SPIR-V.

alan-baker commented 5 months ago

The OpFMod instruction is only an example. Maybe it is a bad example. I have deal with a fork of the clspv witch is somewhat outdated. But the idea is the same for all other functions. For example, the spirv_new test from the OpenCL-CTS test suite contains tests for: OpFAdd, OpFSub, OpFMul, OpFDiv, OpFMod, OpFRem. And this is only the math section. The test has other test cases.

I would suggest documenting what the various translations need to be then. You can also compare Vulkan's accuracy to OpenCL's accuracy. I wouldn't expect + or - to need special translations. In any event, before finalizing any design it would be good to know the full extent of changes that are necessary.

AlexDemydenko commented 5 months ago

Maybe it would make sense to implement the translation directly in the LLVM-SPIRV-TRANSLATOR? I could understand that we are not the only ones who prefer to see this OpFMod mapped to the libclc implementation of fmod and not directly to the LLVM IR operator. At least for OpenCL SPIR-V.

The same instructions have different precision requirements based on different memory models (Vulka/OpenCL). Take a look: Vulkan SPIRV Precision & OpenCL SPIRVPrecision. LLVM-SPIRV-TRANSLATOR is only responsible for OpenCL kernels (based on the description from the project). Why it should handle cases when the driver for some reasons uses vulkan versions of the instructions instead of OpenCL versions in the OpenCL kernel?

Even if it does. Where is the guaranty what they would use a solution identical to the solution in clspv? The problem is that results are not identical. CTS tests don't check that one of the results is not good enough. Both versions can produce results with valid precision, but if the are not identical, then the test will fails.

AlexDemydenko commented 5 months ago

The OpFMod instruction is only an example. Maybe it is a bad example. I have deal with a fork of the clspv witch is somewhat outdated. But the idea is the same for all other functions. For example, the spirv_new test from the OpenCL-CTS test suite contains tests for: OpFAdd, OpFSub, OpFMul, OpFDiv, OpFMod, OpFRem. And this is only the math section. The test has other test cases.

I would suggest documenting what the various translations need to be then. You can also compare Vulkan's accuracy to OpenCL's accuracy. I wouldn't expect + or - to need special translations. In any event, before finalizing any design it would be good to know the full extent of changes that are necessary.

The idea is to support only builtin functions from libclc. I think this library with math functions is there to solve the same problem. I don't want to do more delta between spirv and Opencl-C sources. I want only reduce it. In this case, I only need to use what exists for the Opencl-C sources.

rjodinchr commented 5 months ago

Maybe it would make sense to implement the translation directly in the LLVM-SPIRV-TRANSLATOR? I could understand that we are not the only ones who prefer to see this OpFMod mapped to the libclc implementation of fmod and not directly to the LLVM IR operator. At least for OpenCL SPIR-V.

The same instructions have different precision requirements based on different memory models (Vulka/OpenCL). Take a look: Vulkan SPIRV Precision & OpenCL SPIRVPrecision. LLVM-SPIRV-TRANSLATOR is only responsible for OpenCL kernels (based on the description from the project). Why it should handle cases when the driver for some reasons uses vulkan versions of the instructions instead of OpenCL versions in the OpenCL kernel?

Even if it does. Where is the guaranty what they would use a solution identical to the solution in clspv? The problem is that results are not identical. CTS tests don't check that one of the results is not good enough. Both versions can produce results with valid precision, but if the are not identical, then the test will fails.

My thought here is that when the LLVM-SPIRV-TRANSLATOR maps OpFMod to the LLVM IR fmod, we kind of lose the precision requirement. To compare it with clang, clang would not generate the IR fmod from a fmod call in an OpenCL-C kernel, because clang knows that precision wise it's not enough for OpenCL.

But this is just an idea. It might be easier architecture-wise to do that in the LLVM-SPIRV-TRANSLATOR, but this is just a guess, an idea to consider. Maybe it's much harder to do it in LLVM-SPIRV-TRANSLATOR and we should focus on clspv. I'm just saying we should consider it.

AlexDemydenko commented 5 months ago

What I found about the LLVM IR fmod/frem instruction. https://llvm.org/docs/LangRef.html#frem-instruction

My opinion is this: if the math instruction doesn't have any fast-math flags then it is the OpenCL version. if the math instruction labeled fast flag then a function from the graphics pipeline.

I will do an issue for LLVM-SPIRV-TRANSLATOR in parallels.

google / clspv

Implement compatibility for external LLVM-IL #1325