m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

[QUESTION]: How can I restore the program after a CUDA kernel error? #1268

Open delverOne25 opened 2 months ago

delverOne25 commented 2 months ago

Question

    private static void test(Index1D d, ArrayView view)
    {
        view[-d ] = 3/d; /// error
    }

    public static unsafe void Main(string[] args)
    {
        var ctx = Context.Create(c => c.AllAccelerators().EnableAlgorithms().Optimize(OptimizationLevel.O2).Inlining(InliningMode.Aggressive));
        var a = ctx.CreateCudaAccelerator(0);
        var ttt = a.LoadAutoGroupedKernel<Index1D, ArrayView>(test);
        var ccc = a.Allocate1D(10);
        try
        {
            ttt(a.DefaultStream, 1000, ccc.View);
            a.DefaultStream.Synchronize();
        }
        catch (AcceleratorException e)
        {
            CudaAPI.CurrentAPI.DestroyContext((a as CudaAccelerator).NativePtr);
            a = ctx.CreateCudaAccelerator(0); /// ILGPU.Runtime.Cuda.CudaException: "an illegal memory access was encountered"
        }

        a.Dispose();
        return;
    }

Environment

Additional context

No response

hez2010 commented 2 weeks ago

I don't think it's possible. In short, a CUDA device doesn't support any form of exception handling. If an error happens on the device side, there's no way to recover from it. The only solution is to prevent the error from happening before you launch the kernel on the device.
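
One way to follow that advice is to have each thread validate its own index inside the kernel, so an invalid access never reaches the device. The following is a minimal sketch, not a confirmed fix from the ILGPU maintainers. It assumes the buffer holds `int` elements (the question's element type is not visible) and that the intended write target is the thread's own index rather than `-d`. The names `GuardedKernelSketch` and `GuardedKernel` are hypothetical.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

public static class GuardedKernelSketch
{
    // Same computation as in the question, but every thread checks its
    // own index first, so neither an out-of-bounds write nor a division
    // by zero is executed on the device.
    // Assumption: element type int, write target is the thread index.
    private static void GuardedKernel(Index1D index, ArrayView<int> view)
    {
        int i = index; // Index1D converts implicitly to int
        if (i <= 0 || i >= view.Length)
            return; // skip invalid work items instead of faulting

        view[i] = 3 / i;
    }

    public static void Main()
    {
        using var context = Context.CreateDefault();
        using var accelerator = context
            .GetPreferredDevice(preferCPU: false)
            .CreateAccelerator(context);

        var kernel = accelerator
            .LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>>(GuardedKernel);

        // Copy a zero-initialized host array so skipped elements hold
        // a defined value.
        using var buffer = accelerator.Allocate1D(new int[10]);

        // Launch with more threads than elements, as in the question;
        // the guard makes the surplus threads return early.
        kernel(1000, buffer.View);
        accelerator.Synchronize();

        Console.WriteLine(string.Join(", ", buffer.GetAsArray1D()));
    }
}
```

The guard costs one comparison per thread. The alternative is to clamp the launch extent on the host, for example by launching with `(int)buffer.Length` instead of `1000`, which removes the surplus threads entirely but still leaves the division-by-zero check for index 0 to the kernel.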