kunzmi / managedCuda

ManagedCUDA aims to provide easy integration of NVidia's CUDA into .NET applications written in C#, Visual Basic or any other .NET language.

Need a way to call cudaDeviceSynchronize() after CudaFFTPlanMany.Exec(...) or Texture is corrupted #108

Open TheWhiteAmbit opened 2 years ago

TheWhiteAmbit commented 2 years ago

I am using a batched FFT via CudaFFTPlanMany.Exec(...) on a CudaPitchedDeviceVariable<ManagedCuda.VectorTypes.float2>, which is then mapped and copied to a texture. Everything works great, except that occasionally a half-processed FFT result ends up in the texture. The native samples suggest calling cudaDeviceSynchronize() after the FFT batch and before proceeding.

I tried calling cudaContext.Synchronize() (after first making the only context current via cudaContext.SetCurrent()), but the glitches still occur. I am not sure whether synchronizing the context calls cudaDeviceSynchronize() or some other function. The only mention of cudaDeviceSynchronize I found was in the SetLimits(...) method of the managed CudaContext.

Could you please provide a mapping to call cudaDeviceSynchronize() if that is not possible via cudaContext.Synchronize()?

Great work by the way, thank you for making cuda so convenient!

kunzmi commented 2 years ago

Hi,

cudaContext.Synchronize() is the driver API equivalent of cudaDeviceSynchronize(), but you shouldn't actually need to synchronize at all. I guess you don't use streams here, so everything runs implicitly on the default stream (stream 0).

So why isn't it synchronized? In CUDA 11.0 (or 11.1? I don't remember exactly...), Nvidia changed the behavior of CUDA libraries like cuFFT with respect to context sharing. You likely create a standard context (aka ManagedCuda.CudaContext) in your application and then call cuFFT. Before CUDA 11, cuFFT would just take over the existing context and run normally. Now it seems to create something like half a new context: sometimes it works normally, sometimes it doesn't work at all, and sometimes it causes strange issues like this one. The fix, on the other hand, is simple: use a primary context. Simply replace your CudaContext with a PrimaryContext and it should work!

Instead of

CudaContext ctx = new CudaContext(deviceId);

use

CudaContext ctx = new PrimaryContext(deviceId);
ctx.SetCurrent(); //Important: call this from your processing CPU-thread as primary contexts are not bound initially!

If this doesn't fix the synchronization issue, it might still be that cuFFT runs internally on multiple streams and that you need to synchronize explicitly; I'd have to check the documentation for that, though. Either way, with the correct context established, a call to ctx.Synchronize() correctly synchronizes the entire GPU workload.

(The primary context is the type of context used by the CUDA runtime API and the CUDA libraries. It was made available through the driver API in some earlier CUDA version, but not from the beginning.)
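A minimal sketch of the suggestion above, assuming the ManagedCuda package and a CUDA device with ID 0. The hypothetical helper RunOnPrimaryContext (not part of ManagedCuda) creates the PrimaryContext on a dedicated worker thread, binds it there with SetCurrent(), runs the caller's GPU work and then calls Synchronize(). The work delegate stands in for the CudaFFTPlanMany.Exec(...) call and the texture copy, which are not shown here.

```csharp
using System;
using System.Threading;
using ManagedCuda;

// Hypothetical helper for illustration; not part of the ManagedCuda API.
static class PrimaryContextSketch
{
    // Runs 'work' on a dedicated CPU thread with the device's primary context bound to it.
    // Returns the exception thrown on the worker thread, or null on success.
    public static Exception RunOnPrimaryContext(int deviceId, Action<CudaContext> work)
    {
        Exception error = null;
        var worker = new Thread(() =>
        {
            try
            {
                // PrimaryContext wraps the device's primary context, i.e. the context
                // the CUDA runtime API and libraries such as cuFFT use.
                using (CudaContext ctx = new PrimaryContext(deviceId))
                {
                    // Primary contexts are not bound to any thread initially,
                    // so bind it on the thread that does the processing.
                    ctx.SetCurrent();

                    // Assumption: FFT batch (CudaFFTPlanMany.Exec(...)) and texture copy go here.
                    work(ctx);

                    // Per the thread, not required for default-stream work; blocks until
                    // all outstanding work in this context has finished.
                    ctx.Synchronize();
                }
            }
            catch (Exception ex)
            {
                error = ex;
            }
        });
        worker.Start();
        worker.Join(); // Join also makes 'error' safely visible to the calling thread
        return error;
    }
}
```

Usage: `Exception err = PrimaryContextSketch.RunOnPrimaryContext(0, ctx => { /* FFT batch and texture copy */ });`. The helper returns the worker-thread exception instead of letting it escape the thread, so a failure can be inspected on the calling thread.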

TheWhiteAmbit commented 2 years ago

Hey,

thank you for the fast reply. You are correct, I am not using streams here. I will try your suggestions and report back whether the texture is still visibly broken or whether that fixes everything, so others will know.

As far as I can tell, (managed) CuFFT runs on the context created by ManagedCuda without needing to be told about the context explicitly. But it will not run (as far as I remember, it threw an exception) if no context has been created by ManagedCuda beforehand.

CU, your project is so very helpful!

TheWhiteAmbit commented 2 years ago

Well, as far as I can tell by now, your hint to use PrimaryContext works. So I guess this issue can be closed, and others can find it when they run into the same problem. No explicit synchronization is needed at all; that was just my guess as to why it wasn't working as intended, because the original C++ samples synchronize after each FFT batch execution.