intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License
1.1k stars 229 forks source link

CL/GL sharing support more texture formats #667

Closed smunaut closed 4 months ago

smunaut commented 11 months ago

The support implemented in #166 has some limitations on the supported texture format. I have been discussing in that (now closed) issue the possibility to support more formats but I thought it'd be better to open a new issue rather than keep talking in a closed one.

I've so far removed the quick check that limit the "create memory" to the currently supported texture format and what I noticed is that the call then hangs if you use unsupported textures.

I dug a bit into that and noticed two things :

smunaut commented 11 months ago

Ok, so digging even further :

kallaballa commented 11 months ago

Wow. great work! Today I'll jump in and follow your path. In general (afaik) Mesa/EGL is currently in heavy development. Maybe reaching out to Mesa might be a good idea.

kallaballa commented 11 months ago

I'll use GL_RGBA32F for starters with my code just to see if it breaks the same way. But we need a common example to have a chance. Do you have one?

smunaut commented 11 months ago

Proposed fix for the locking issue btw : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24487/diffs?commit_id=fe844bae481b4f2369ab78537d2a800817973389

As for example, I use the following code. It doesn't really do anything except just trying to create the texture / cl mem object and nothing else ATM.

main.c.txt

kallaballa commented 11 months ago

Thx!

kallaballa commented 11 months ago

I think i found the problem. __DRI_IMAGE_FORMAT_NONE collides with an actual format. commenting those 2 lines fixes that: https://github.com/Mesa3D/mesa/blob/957009978ef6d7121fc0d710d03bc20097d4d46b/src/gallium/frontends/dri/dri_helpers.c#L679-L680

Now the cl_mem is succesfully mapped.

kallaballa commented 11 months ago

Anyway that doesn't make a complete patch. I'll create an issue at mesa to maybe get an opinion from someone with the bigger picture.

smunaut commented 11 months ago

Not sure what you mean with __DRI_IMAGE_FORMAT_NONE colliding. I don't see any other image format with the same code.

Here's the flow I traced :

So I guess we'd need to introduce new DRI image formats as the first step.

smunaut commented 11 months ago

Apparently going through DRI images and the export dma buf api is the wrong way to do things for interop according to DRI devs I talked to ...

MesaGLInteropEGLExportObject seems to be the way to implement it AFAICT. It also seems to return a dmabuf but directly from a GL object handle without any intermediate EGLImage step.

kallaballa commented 11 months ago

Hmmm. interesting! Gonna be back at it in a couple of days

smunaut commented 11 months ago

Implemented a quick PoC replacing eglExportDMABUFImageMESA with MesaGLInteropEGLExportObject and it's doing something :sweat_smile:

And it almost looks right : https://i.imgur.com/SmtjQm2.jpg vs https://i.imgur.com/KPFMzAs.jpg

kallaballa commented 11 months ago

very excellent! sorry i can't help right now. sick in bed

smunaut commented 11 months ago

No worries, take care !

FWIW I pushed my current PoC https://github.com/smunaut/compute-runtime/tree/clgl

Note that it depends on some mesa patches https://gitlab.freedesktop.org/246tnt/mesa/-/commits/clgl-export

Those symbols are normally used internally. They can be accessed just using dlsym() looking for them at the global scope but not in all cases. If you have GLVND (vendor neutral dispatcher) and EGL, then that combination doesn't work. So after talking in the #dri-devel channel, the best option seemed to be to expose those function through the classic GetProcAddress mechanism and that's what those mesa patches do ...

smunaut commented 11 months ago

Found the bug above btw and it's unrelated to the change.

The extension is calling glGetTexLevelParameteriv to collect data about the texture. but that assumes that the texture is currently the one being bound in the GL state which is not necessarily the case ...

kallaballa commented 11 months ago

No worries, take care !

FWIW I pushed my current PoC https://github.com/smunaut/compute-runtime/tree/clgl

That looks really promising!

Note that it depends on some mesa patches https://gitlab.freedesktop.org/246tnt/mesa/-/commits/clgl-export

Those symbols are normally used internally. They can be accessed just using dlsym() looking for them at the global scope but not in all cases. If you have GLVND (vendor neutral dispatcher) and EGL, then that combination doesn't work. So after talking in the #dri-devel channel, the best option seemed to be to expose those function through the classic GetProcAddress mechanism and that's what those mesa patches do ...

kallaballa commented 11 months ago

Found the bug above btw and it's unrelated to the change.

The extension is calling glGetTexLevelParameteriv to collect data about the texture. but that assumes that the texture is currently the one being bound in the GL state which is not necessarily the case ...

I have trouble following that last part. Could you point out where? I'd like to fix that if you haven't already.

kallaballa commented 11 months ago

Note that it depends on some mesa patches https://gitlab.freedesktop.org/246tnt/mesa/-/commits/clgl-export

Any idea when this is planned to be merged? I am just asking because the branch has a over 600 commits :)

smunaut commented 11 months ago

The issue is there : https://github.com/intel/compute-runtime/blob/master/opencl/source/sharings/gl/linux/gl_texture_linux.cpp#L57

This can't really call any GL functions because it should be independent of the current GL bind state and modifying it (like binding the texture, querying, restoring) is also very dodgy (I think GL could be doing other stuff in other threads).

The internal format is not an issue, it's returned by the interop API so we can get it from there, but I haven't found any way to get the texture size ...

I've been comparing this code with the windows side of the code, and in that one, the allocation is not grabbed the same way and doesn't seem to need the size. I've been trying to make the linux side more like the windows one but without success, and it might come from the different implementation of the underlying memory manager. I'd love an intel engineer to chime in and explain why the linux side was written like that in the first place. the 'reusesharedhandle' param is also different vs the win side and I'm not sure why either. Same with the "closeSharedHandle" which don't see to be on the win side either.

I've been looking at ROCm a bit too and in there the amd mesa driver actually returns some extra info through the gl interop call AFAICT (there is provision in that interop method to return "vendor" specific data), so we could also have the iris driver return more info from the gl side directly.

The mesa branch only has 3 commits (the 3 latest). It might appear like it has more because it's based off the 23.1.5 tag since I wanted to install system-wide locally and my system has 23.1.5 installed.

I'm not sure when this can be merged or even if this will be merged in that state, the functions might end up with other names. I just opened a draft MR asking for feedback : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24573

kallaballa commented 11 months ago

maybe @JacekDanecki could give us a little nodge in the right direction? :)

smunaut commented 11 months ago

Ok, I pushed a new version of my branch that doesn't have those issues anymore. I also did a lot of cleanup removing dead/buggy code that wouldn't work on linux anyway pending the reimplementation for linux ( things for GL_BUFFER sharing and synchronization primitives).

This also makes things a whole lot less dependent on EGL which is nice to make it compatible with GLX.

You need my mesa branch to run it obviously.

kallaballa commented 10 months ago

You need my mesa branch to run it obviously.

It has been merged?

smunaut commented 10 months ago

Most of it yes, but the v2 of the interop interface hasn't been merged yet. It's not a patch I wrote, it's part of another MR that's being reviewed now, there was actually quite a bit of review/ack going on yesterday so I have good hope that this will me merged very soon.

In the mean time, you still need to use my 23.2-resolve branch ( https://gitlab.freedesktop.org/246tnt/mesa/-/tree/23.2-resolve?ref_type=heads ).

kallaballa commented 10 months ago

Thx for the info and your awesome work :)