kunzmi / managedCuda

ManagedCUDA aims at an easy integration of NVIDIA's CUDA in .NET applications written in C#, Visual Basic or any other .NET language.

Specifying nested streams depth limit using cudaLimitDevRuntimeSyncDepth #81

velaris1 closed this issue 4 years ago

velaris1 commented 4 years ago

Hi,

I'm running a kernel that leverages nested kernel launches through streams. It seems that the current max depth supported by my device (1080 Ti) is 16, whereas the documentation specifies 24 as being the limit:

"The overall maximum nesting depth is limited to 24, but practically speaking the real limit will be the amount of memory required by the system for each new level (see Memory Footprint above). Any launch which would result in a kernel at a deeper level than the maximum will fail. (...)"

It goes on to say that:

"This maximum synchronization depth (and hence reserved storage) may be controlled by calling cudaDeviceSetLimit() and specifying cudaLimitDevRuntimeSyncDepth."

I'd like to try setting a new depth of 24, hoping that my limit is not memory-bound.

I could not find a way to configure this option through ManagedCuda. Did I miss it?

Thanks

Edit: I was using an old version of ManagedCuda dating back to 2017 (ManagedCuda-80 on NuGet).

velaris1 commented 4 years ago

Closing the issue for now. It seems I was using an old version of the library (ManagedCuda); upgrading to ManagedCuda 10.0 seems to have fixed it. I can now have a nested depth of 24.

kunzmi commented 4 years ago

CudaContext.SetLimit() is what you were looking for. On the other hand, I don't see how a different managedCuda version would influence that behaviour, given that all it does is call the same native library. But as long as it works...
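For reference, the native call that the quoted documentation describes could look roughly like the sketch below. It uses the CUDA runtime API directly rather than ManagedCuda, and it assumes a toolkit from around the time of this issue (CUDA 10.x); newer toolkits changed how device-side synchronization works, so the limit may behave differently there. As far as I understand the documentation, the limit has to be set before launching any kernel that relies on it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t depth = 0;

    // Read the current synchronization depth limit of the device runtime.
    cudaError_t err = cudaDeviceGetLimit(&depth, cudaLimitDevRuntimeSyncDepth);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("sync depth before: %zu\n", depth);

    // Request a synchronization depth of 24 (the value asked about in this issue).
    // The call can fail, for example if the reserved memory cannot be allocated.
    err = cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, 24);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the limit back to see what the runtime actually accepted.
    err = cudaDeviceGetLimit(&depth, cudaLimitDevRuntimeSyncDepth);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("sync depth after: %zu\n", depth);
    return 0;
}
```

Note that this only changes the synchronization depth; it does not raise the nesting depth itself.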

velaris1 commented 4 years ago

Thank you for the reply. I definitely should have checked the methods on the CudaContext, my bad.

Not sure either. I'll double-check using the old version and confirm whether that's really what's affecting the nesting. Regardless, thank you for your excellent work.

velaris1 commented 4 years ago

I'm stupid. I confused the nesting depth with the synchronization depth. It looks like the nesting depth is not set through API calls, only the synchronization depth is. My issue was related to nesting, not synchronization.

I should have paid more attention to this part of the documentation:

"The synchronization depth is defined as the deepest level at which the program will explicitly synchronize on a child launch. Typically this is one less than the nesting depth of the program, but if the program does not need to call cudaDeviceSynchronize() at all levels then the synchronization depth might be substantially different to the nesting depth."

Maybe an update to the library somehow affected the nesting depth?
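To illustrate the difference between the two depths, here is a minimal sketch of a kernel that nests without ever synchronizing on its child. The kernel name `nest` and the `maxDepth` parameter are made up for this example. Each level launches one child and returns, so the nesting depth is 4 while no device-side synchronization happens at any level, which means cudaLimitDevRuntimeSyncDepth should not be the limiting factor here. It assumes a device with compute capability 3.5 or higher and a build with relocatable device code, e.g. `nvcc -arch=sm_61 -rdc=true example.cu -lcudadevrt` for a 1080 Ti.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each level launches exactly one child level
// and never waits for it on the device.
__global__ void nest(int depth, int maxDepth) {
    printf("running at nesting level %d\n", depth);
    if (depth + 1 < maxDepth) {
        // Child launch from device code (dynamic parallelism).
        // No device-side synchronization follows, so this level
        // does not contribute to the synchronization depth.
        nest<<<1, 1>>>(depth + 1, maxDepth);
    }
}

int main() {
    // Launch the top level; 4 nested levels in total (0..3).
    nest<<<1, 1>>>(0, 4);

    // Catch launch configuration errors from the host-side launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The host waits for the whole launch tree to finish.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Raising `maxDepth` in a sketch like this is one way to probe where nested launches start failing on a given device, independently of the synchronization depth limit.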