feature/ocl: Improve Linux CL/GL sharing support

smunaut commented 10 months ago

This PR is aimed at drastically improving the support for the CL/GL sharing extension on linux. The current support is not really usable as it only supports a few texture format, and only on EGL contexts. It is also pretty buggy since it requires the texture to be bound when placing the CL call to share it which is just plain wrong and will not work in many applications. This new version makes used of the "official" interop extension from MESA which is available for GLX and EGL contexts, allows sharing of buffers and not just texture and supports many more formats. This is still far from being a fully compliant / full featured version of the extension, but it's a big step forward in my opinion and allows to run some real applications. I've tested gr-fosphor (SDR spectrum display) and Davinci Resolve as examples. Both of theses don't work without theses improvements.

Resolves: #659 Resolves: #667

Signed-off-by: Sylvain Munaut tnt@246tNt.com

k1gen commented 10 months ago

I have tested this on my machine, and the app I wanted to work (DaVinci Resolve) - worked flawlessly on UHD620!

smunaut commented 10 months ago

So any chance someone from intel can review / comment or just general advise on what is required to get this merged ?

smunaut commented 10 months ago

Thanks for the review, I'll try to address these next week.

smunaut commented 9 months ago

@JablonskiMateusz Ok, so it took me a while longer to finally get to it, but I have addressed the comments above, rebased over master (and also tested only the 23.30 release) and seems to all be fine.

smunaut commented 9 months ago

@JablonskiMateusz Responding here for the libGL thing because for some reason, github doesn't me put a reply to that specific comment :shrug:

(1) Because for whatever reason the user is free to have loaded (through whatever mechanism) any GL library and possibly not the first one that would be returned by a normal libdl lookup of libGL.so.0. Then there is also the EGL/GLX thing where we have no idea a-priori which one the user will want to use and we don't want to force loading the wrong one in the same address space, so it's easier to lookup what the user chose by looking at already available symbols.

(3) In the initGLFunctions it's not an issue because this is called when creating the CL context but at that point the user must have given a reference to the current GL context in the parameter list which means that GL must have been loaded already.

In the isEnabled case (to decide wether we show the extension or not), it's less clear. All the functionality of the extensions depend on a GL context being present first so to me it make sense to advertise it only if GL is loaded. But we could just as well just return true all the time and leave it up to the user to actually make sure it gets a valid GL context in time before trying to actually make use of it.

JablonskiMateusz commented 9 months ago

@JablonskiMateusz Responding here for the libGL thing because for some reason, github doesn't me put a reply to that specific comment 🤷

(1) Because for whatever reason the user is free to have loaded (through whatever mechanism) any GL library and possibly not the first one that would be returned by a normal libdl lookup of libGL.so.0. Then there is also the EGL/GLX thing where we have no idea a-priori which one the user will want to use and we don't want to force loading the wrong one in the same address space, so it's easier to lookup what the user chose by looking at already available symbols.

(3) In the initGLFunctions it's not an issue because this is called when creating the CL context but at that point the user must have given a reference to the current GL context in the parameter list which means that GL must have been loaded already.

In the isEnabled case (to decide wether we show the extension or not), it's less clear. All the functionality of the extensions depend on a GL context being present first so to me it make sense to advertise it only if GL is loaded. But we could just as well just return true all the time and leave it up to the user to actually make sure it gets a valid GL context in time before trying to actually make use of it.

We cannot assume that GL is already loaded. Where does that requirement come from? We need to try to open gl library in order to expose the extension - that's our baseline. If GL library was already loaded then we will just increase ref count instead of loading or reloading it. It will be used in the same form as was already loaded.

smunaut commented 9 months ago

But we don't know which GL library ... it's not our choice.

It's be 100% valid for a user to do dlopen("/path/to/some/custom/libGL.so.1234", RTLD_GLOBAL); to load a GL library that's not the system default one ... So we can't hard code any names or path anywhere.

If you don't want to assume GL is already loaded to return the extension, then I would return it all the time. Which from reading the extension spec is perfectly valid since all that's needed is that there is some way/combination of GL context/device for which we could support it.

eero-t commented 9 months ago

Btw. @smunaut have you looked (e.g. latrace'd) what Nvidia and AMD OpenCL drivers do?

smunaut commented 9 months ago

@eero-t

AMD looksup symbols in the global scope without loading anything it self, you can see it here :

https://github.com/ROCm-Developer-Tools/clr/blob/5914ac3c6e9b3848023a7fa25e19e560b1c38541/rocclr/device/rocm/rocglinterop.cpp#L71C38-L71C69

(Now, here they lookup MesaGLInteropGLXExportObject directly which is "the old way" of doing it because until recently that was the only way since glxGetProcAddress wouldn't support that symbol but it was shown that this direct method didn't work when using EGL + GLVND so mesa added those interop symbols to the normal GetProcAddress ... but it's the same idea, they search in the global scope what's already there and don't load anything themselves).

The AMD implementation always advertises the extension, it's statically present in the list of supported extension and it doesn't do any checks before it returns it AFAICT. ( https://github.com/ROCm-Developer-Tools/clr/blob/5914ac3c6e9b3848023a7fa25e19e560b1c38541/rocclr/device/device.hpp#L204 )

In Mesa's rusticl they also just search the global scope : https://gitlab.freedesktop.org/mesa/mesa/-/commit/02f7d3640021f23d54022c6fbdf1304c44672033#cb331a47f57adfd7e2565e461a25af9ec00ff8e6_0_32 They call dlsym with a NULL first argument to just lookup in what's already loaded rather than any form of explicit linking. However contrary to AMD's they do not return the extension if GL is not loaded already.

As for NVIDIA, I'm not sure, no sources I can look at ... I'd need to try it.

JablonskiMateusz commented 8 months ago

I think we may self load in this case. Extension can be exposed always but to work with sharing GL library needs to be loaded.

smunaut commented 7 months ago

Rebased to master
Removed the isGLSharingEnabled check and always advertise extension
Adapted to the final version of the mesa interop interface that is now part of git master on the mesa repo.

JablonskiMateusz commented 7 months ago

@smunaut please adjust commit msg to https://github.com/intel/compute-runtime/blob/master/CONTRIBUTING.md#commit-message-guidelines I think squash to single commit would be needed

smunaut commented 7 months ago

Is squashing really needed ? It makes the diff really hard to read and also means I can't put explanation of why a particular change was done that way in the commit message since it's all globbed up into a giant unreadable diff.

All the commits would belong in feature/ocl AFAICT.

JablonskiMateusz commented 7 months ago

it will be squashed in the end but we can do this on our side. Let's keep it here without squashing. However, I got some errors when applying the PR:

../../../opencl/source/sharings/gl/linux/gl_buffer_linux.cpp:225:51: error: no matching constructor for initialization of 'NEO::Gmm' graphicsAllocation->setDefaultGmm(new Gmm(helper, ^ ~~~ ../../../shared/source/gmm_helper/gmm.h:45:5: note: candidate constructor not viable: requires 7 arguments, but 8 were provided Gmm(GmmHelper gmmHelper, const void alignedPtr, size_t alignedSize, size_t alignment, ^ ../../../shared/source/gmm_helper/gmm.h:44:5: note: candidate constructor not viable: requires 4 arguments, but 8 were provided Gmm(GmmHelper gmmHelper, ImageInfo &inputOutputImgInfo, const StorageInfo &storageInfo, bool preferCompressed); ^ ../../../shared/source/gmm_helper/gmm.h:48:5: note: candidate constructor not viable: requires 3 arguments, but 8 were provided Gmm(GmmHelper gmmHelper, GMM_RESOURCE_INFO inputGmm, bool openingHandle); ^ ../../../shared/source/gmm_helper/gmm.h:47:5: note: candidate constructor not viable: requires 2 arguments, but 8 were provided Gmm(GmmHelper gmmHelper, GMM_RESOURCE_INFO *inputGmm); ^ ../../../shared/source/gmm_helper/gmm.h:40:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 8 were provided class Gmm { ^ ../../../shared/source/gmm_helper/gmm.h:43:5: note: candidate constructor not viable: requires 0 arguments, but 8 were provided Gmm() = delete; ^ 1 error generated.

Also, when fixed the issue locally I got test failures:

97x TEST_F(GlSharingTextureTests, givenMockGlWhen2dGlTextureIsCreatedThenMemObjectHasGlHandler) { 98x cl_int retVal = CL_INVALID_VALUE; 99x 100x glSharing->uploadDataToTextureInfo(textureId); 101x 102t> auto glTexture = GlTexture::createSharedGlTexture(clContext.get(), (cl_mem_flags)0, GL_TEXTURE_2D, 0, textureId, &retVal); 103x ASSERT_NE(nullptr, glTexture);

46x texIn.version = 2; 47x texIn.target = getBaseTargetType(target); 48x texIn.obj = texture; 49x texIn.miplevel = miplevel; 50x 51x switch (flags) { 52x case CL_MEM_READ_ONLY: 53x texIn.access = MESA_GLINTEROP_ACCESS_READ_ONLY; 54x break; 55x case CL_MEM_WRITE_ONLY: 56x texIn.access = MESA_GLINTEROP_ACCESS_WRITE_ONLY; 57x break; 58x case CL_MEM_READ_WRITE: 59x texIn.access = MESA_GLINTEROP_ACCESS_READ_WRITE; 60x break; 61x default: 62t> errorCode.set(CL_INVALID_VALUE); 63x return nullptr; 64x }

Could you please ensure that this code is working after rebase?

smunaut commented 5 months ago

I'm doing what I can to test but again, I'm trying to rebase to master and build here but I just can't ... master just doesn't build and it ends with ocloc segfaulting currently, so there isn't much testing I can do.

I can build 23.48.27912.11 so currently that's the most recent thing I could rebase and test onto. Both 23.52.28202.13 and master don't build for me.

smunaut commented 5 months ago

Ok, I managed to build master on my machine (seems there is a requirement for some specific version of dependencies that are not checked by the build system).

So I rebased and checked it build. I added a couple of small fixes and removed non-applicable broken tests.

Running make run_adlp_0_ocl_tests passes without error. I can't run all the tests because some of them fail on my machine even without this patch and the testing stops at the first failure ... But this specific test target should check everything that is affected by this patch set.

I didn't change the commit message since if you squash on your side you'll have to make a new one anyway so at that point you can use the feature/ocl prefix for it at that time.

eero-t commented 5 months ago

What were the specific dependency versions that build system did not check?

(I normally take latest tagged releases of GMMlib, SPIRV-*, vc-intrinsics, IGC, L0, compute-runtime, and build them with Ubuntu using system LLVM 14 packages: clang, libopencl-clang, llvm-spirv, libllvmspirvlib, and haven't had any problems. As I'm just driver user, not developer, I haven't tried building master though...)

smunaut commented 5 months ago

I was trying to build both master and 23.52.28202.13 while having :

intel-graphics-compiler-1.0.15368.3
intel-vc-intrinsics-0.13.0

And cmake accepted those fine and it started building but it ended with ocloc segfaulting.

After I upgraded to :

intel-graphics-compiler-1.0.15770.7
dev-libs/intel-vc-intrinsics-0.16.0

then it worked and I was able to build.

smunaut commented 4 months ago

Just rebased on top of master again. Checked that it builds, works in a coupld of applications, and that make run_adlp_0_ocl_tests works too.

smunaut commented 4 months ago

:wave:

JablonskiMateusz commented 4 months ago

Hi @smunaut I'll try merge it but I need additional input from your side. Please update PR title and PR description to meet our commit msg restrictions https://github.com/intel/compute-runtime/blob/master/CONTRIBUTING.md#commit-message-guidelines

smunaut commented 4 months ago

@JablonskiMateusz I've updated the PR title and description to match those guidelines. I didn't update the commits themselves since AFAIR you said you would squash them all together and I assume the final commit title/description will be taken from the PR.

smunaut commented 4 months ago

Merged outside of github.

Thanks @JablonskiMateusz

intel / compute-runtime

feature/ocl: Improve Linux CL/GL sharing support #673