Add a few functions for Cuda memory pool callings

zhongkaifu commented 9 months ago

Hi @kunzmi ,

My project https://github.com/zhongkaifu/Seq2SeqSharp depends on ManagedCuda. When I tried to allocate (and free) memory from the GPU memory pool, I couldn't find suitable functions, so I added two.

Thanks Zhongkai Fu

kunzmi commented 9 months ago

Hi, thanks for your contribution!

I was checking your diffs, and if I get you right, you are basically missing a call to cuMemFreeAsync, right? ManagedCuda tries to follow the IDisposable scheme for all unmanaged resources, such as device memory allocations. To free the memory of a CudaDeviceVariable<T> you'd need to call the Dispose() method.

With the async variants of alloc and free this got a bit ugly, as they must not interfere with .NET's garbage collector. My solution was that a CudaDeviceVariable<T> allocated with cuMemAllocFromPoolAsync is not set as the owner of the unmanaged resource. The garbage collector will thus never free the memory! To free the memory properly I still stuck to the Dispose() scheme and overloaded the method with Dispose(CudaStream stream), which then calls cuMemFreeAsync despite the CudaDeviceVariable not being flagged as owner. (The normal Dispose() calls are NOPs in case _isOwner==false.)

So, if I got you right, you actually just need to call CudaDeviceVariable<T>.Dispose(CudaStream stream). Or did I get you wrong?
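
The ownership scheme described above can be modeled with a short toy in Python. This is only a sketch of the idea, not ManagedCuda code: `FakeStream`, `PoolBackedBuffer`, `dispose`, and the integer handle are hypothetical stand-ins, and nothing here touches a GPU. It assumes that a pool-backed wrapper is created as non-owner, that a plain dispose is then a no-op, and that the stream overload queues the free regardless of the owner flag.

```python
class FakeStream:
    """Hypothetical stand-in for a CUDA stream: it only records queued frees."""

    def __init__(self):
        self.queued_frees = []


class PoolBackedBuffer:
    """Toy model of a device-memory wrapper allocated from a memory pool."""

    def __init__(self, handle):
        self.handle = handle      # stand-in for a raw device pointer
        self.is_owner = False     # pool allocations are not flagged as owner
        self.freed = False

    def dispose(self, stream=None):
        if self.freed:
            return                # disposing twice is harmless
        if stream is None:
            # Plain dispose: only releases memory the wrapper owns,
            # so for a pool-backed buffer this does nothing.
            if self.is_owner:
                self.freed = True
            return
        # Stream overload: queue the free on the stream even though
        # the wrapper is not flagged as owner.
        stream.queued_frees.append(self.handle)
        self.freed = True


stream = FakeStream()
buf = PoolBackedBuffer(handle=0x1000)
buf.dispose()         # no-op: the wrapper is not the owner
buf.dispose(stream)   # queues the free on the stream
```

In this model the garbage-collector path corresponds to the plain `dispose()` call, which is why it can never release pool memory behind the stream's back.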

Cheers, Michael

zhongkaifu commented 9 months ago

Hi @kunzmi

Thanks for these details. The reason I added this code is that my project uses CUdeviceptr rather than CudaDeviceVariable, and I manage object lifetimes and memory recycling myself. Here is the related code: https://github.com/zhongkaifu/Seq2SeqSharp/blob/master/TensorSharp.CUDA/ContextState/CudaMemoryPoolDeviceAllocator.cs

However, I couldn't find existing memory pool methods in ManagedCuda that take and return CUdeviceptr, so I implemented them. If ManagedCuda already has such methods, please let me know. :)

Thanks Zhongkai Fu

kunzmi commented 9 months ago

Ah, OK, now I understand your need. And you're right, the current implementation only provides access via CudaDeviceVariable<T> and not CUdeviceptr. I'll merge it as it doesn't break anything.

I only have to ask you if you are OK with the dual-license that I use now for ManagedCuda, meaning that you accept that I use the modifications also in the commercial license?

Cheers, Michael

zhongkaifu commented 9 months ago

Thanks @kunzmi .

I'm totally okay with the dual license. My project is under the BSD 3-Clause license. Is there anything I need to do on my side for the dual license, such as updating the license file?

Thanks Zhongkai Fu

kunzmi commented 9 months ago

Hey,

I wrote you a mail for the license ;)

Michael

zhongkaifu commented 9 months ago

Hi @kunzmi

Have you already sent the email to me (fuzhongkai@gmail.com)? I have not received it yet...

Thanks Zhongkai Fu

kunzmi commented 9 months ago

Hi @zhongkaifu

Yes, I sent the mail at the time of posting... Did you get it by now?

zhongkaifu commented 9 months ago

Ah, I got your email just now. I will reply to you later. :) Have a great day!

Thanks Zhongkai Fu

kunzmi / managedCuda

Add a few functions for Cuda memory pool callings #123