Expose new CUDA APIs for sharing the same memory between processes

Deneas commented 3 months ago

Hello, this pull request is the result of my experiments with CUDA IPC. It exposes the CUDA IPC API to allow sharing the same memory between different processes. The other part would be the Event API, but as far as I can tell, you already have that one exposed.

The biggest open task is memory management for the buffer mapped through IPC. My current plan is to create a class CudaIpcMemoryBuffer that mirrors CudaMemoryBuffer. The main difference I have found so far is in disposal: the IPC buffer should use the new method CloseIpcMemHandle.

Also, the way I see it, the actual exchange of the IPC handle between processes should be out of scope for this project.
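For illustration, here is a minimal sketch of how small such an out-of-scope exchange can be on the user's side. It is written in Python, treats the handle as 64 opaque bytes (the size CUDA defines as CUDA_IPC_HANDLE_SIZE), and pipes it to a child process over stdin. The names `send_handle_to_consumer` and `CONSUMER_SOURCE` are made up for this example, and the handle is a placeholder, not something obtained from a GPU. Any other transport (sockets, files, named pipes) would work just as well.

```python
import subprocess
import sys

HANDLE_SIZE = 64  # CUDA_IPC_HANDLE_SIZE: an IPC memory handle is 64 opaque bytes

# Hypothetical consumer: reads the handle from stdin and echoes it back as hex.
# A real consumer would pass these bytes to cudaIpcOpenMemHandle instead.
CONSUMER_SOURCE = """
import sys
handle = sys.stdin.buffer.read(64)
sys.stdout.write(handle.hex())
"""


def send_handle_to_consumer(handle: bytes) -> bytes:
    """Start a consumer process, pipe the handle to it, return what it received."""
    if len(handle) != HANDLE_SIZE:
        raise ValueError(f"expected {HANDLE_SIZE} bytes, got {len(handle)}")
    result = subprocess.run(
        [sys.executable, "-c", CONSUMER_SOURCE],
        input=handle,          # written to the child's stdin, then stdin is closed
        capture_output=True,
        check=True,            # raise if the consumer exits with an error
    )
    return bytes.fromhex(result.stdout.decode("ascii"))


if __name__ == "__main__":
    # Placeholder bytes; a real producer would obtain these from cudaIpcGetMemHandle.
    placeholder_handle = bytes(range(HANDLE_SIZE))
    received = send_handle_to_consumer(placeholder_handle)
    print("round trip ok:", received == placeholder_handle)
```

The same two-process shape (one process exports, a separately started process maps) is what any end-to-end check of the feature would need, since the handles do not work within a single process.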

Feel free to give feedback and to ask questions.

Deneas commented 3 months ago

Sorry for switching the draft status back and forth, I'm still getting used to this.

As for the pull request: the actual functionality for exposing and using memory through IPC is already done. What remains is how we expose it to end users. For now, I created a method MapRaw on the CudaAccelerator, based on the similar AllocateRaw method. This method is not strictly necessary; users could instead create the CudaIpcMemoryBuffer directly.

Lastly, I'm currently exploring how to write the tests for this new feature. Unfortunately, the IPC handles explicitly do not work within the same process, so for testing we would actually need two processes communicating with each other. An easy solution would be a separate new console project that is started by the tests, but I'm looking into whether a more "elegant" solution is possible.

Deneas commented 3 months ago

> In addition, this PR handles consuming an IPC buffer from another process, but not the "Get IPC Mem Handle" functionality. That would mean ILGPU could not share its buffer with another process. Is that correct?

No, I just missed creating a convenience method for doing so in CudaAccelerator. The functionality is mapped in the CudaAPI and available as GetIpcMemoryHandle.
I'm currently debating whether to call the convenience method simply GetIpcMemoryHandle or ExportMemoryForOtherProcesses. Which sounds better to you?

> With regard to unit testing, I'm not sure if that is even possible. Perhaps adding two sample projects, for the Producer and Consumer?

Yeah, while I think it would be doable as a test with a new project, adding a new sample seems the more idiomatic solution for ILGPU. I saw that CUDA has an example called SimpleIPC. I'll take that as a starting point and see where it takes me. Since I managed to make it work between .NET and Python, getting it to work between two .NET processes shouldn't be difficult 😄

Deneas commented 3 months ago

Alright, with the added sample and changes, this PR is feature-complete. All that remains is to settle naming and the ergonomics of the helper methods.

A quick summary of my new work:

I added the sample CudaIPC with two projects showing both exporting memory and mapping said memory.
I added GetIpcEventHandle and OpenIpcEventHandle for completeness (note: I intentionally did not add any convenience methods, as regular events didn't seem to have them either).
I removed the usage of InlineArray for the IPC handles and switched to raw arrays with byte*, because otherwise users might be forced to use C# 12.
I extended CudaDevice with HasIpcSupport, since there is a corresponding device attribute we can query.
I added the convenience methods GetIpcMemoryHandle and GetIpcMemoryHandle<TView> for MemoryBuffer and MemoryBuffer<TView> respectively.
I renamed MapRaw to MapFromIpcMemHandle and exposed the flags as a mandatory parameter.
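As a side note on the raw byte layout mentioned in the list above: a non-.NET consumer, such as the Python process mentioned earlier, can mirror the handle in the same way, as a fixed block of bytes. The sketch below assumes the layout from the CUDA headers, where the IPC memory handle is a struct with a single reserved array of CUDA_IPC_HANDLE_SIZE (64) bytes. The names `IpcMemHandle`, `handle_from_bytes` and `handle_to_bytes` are hypothetical and are not part of ILGPU or any CUDA binding.

```python
import ctypes

CUDA_IPC_HANDLE_SIZE = 64  # value of the constant in the CUDA headers


class IpcMemHandle(ctypes.Structure):
    """Opaque IPC handle: a fixed-size block of bytes with no meaningful fields."""

    _fields_ = [("reserved", ctypes.c_ubyte * CUDA_IPC_HANDLE_SIZE)]


def handle_from_bytes(raw: bytes) -> IpcMemHandle:
    """Build a handle struct from bytes received from the exporting process."""
    if len(raw) != ctypes.sizeof(IpcMemHandle):
        raise ValueError(
            f"expected {ctypes.sizeof(IpcMemHandle)} bytes, got {len(raw)}"
        )
    # Copies the bytes into a new struct, so the source buffer can be discarded.
    return IpcMemHandle.from_buffer_copy(raw)


def handle_to_bytes(handle: IpcMemHandle) -> bytes:
    """Serialize the handle struct back into plain bytes for transport."""
    return bytes(handle)
```

Because the struct is just bytes, no C# 12 or other language-specific feature is needed on either side; both processes only have to agree on the 64-byte size.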

As before, feel free to ask questions and give feedback.

m4rs-mt commented 1 month ago

Amazing work, thank you very much for your efforts. I'm pinging @MoFtZ to take a look so we can get this over the line.

m4rs-mt / ILGPU

Expose new CUDA APIs for sharing the same memory between processes #1235