m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

RTX 4090 runtime PTX JIT compilation failed #880

Closed (yuryGotham closed this issue 2 years ago)

yuryGotham commented 2 years ago

I'm getting a runtime error when trying to run any kernel on an RTX 4090. It happens even for an empty "return;" kernel, and the same code works fine on a 3090. ILGPU memory allocation and ILGPU CuBlas GEMM worked just fine on the 4090.

Error:

NVIDIA GeForce RTX 4090 [Type: Cuda, WarpSize: 32, MaxNumThreadsPerGroup: 1024, MemorySize: 25756696576]
11/4/2022 9:29:58 PM    Time spent to initialize GPU: 0.580s
Unhandled exception. ILGPU.Runtime.Cuda.CudaException: a PTX JIT compilation failed
   at ILGPU.Runtime.Cuda.CudaKernel..ctor(CudaAccelerator accelerator, PTXCompiledKernel kernel, MethodInfo launcher)
   at ILGPU.Runtime.Cuda.CudaAccelerator.CreateKernel(PTXCompiledKernel compiledKernel, MethodInfo launcher)
   at ILGPU.Runtime.KernelAccelerator`2.LoadKernelInternal(CompiledKernel kernel)
   at ILGPU.Runtime.Accelerator.LoadKernel(CompiledKernel kernel)
   at ILGPU.Runtime.Accelerator.DefaultKernelLoader.LoadKernel(Accelerator accelerator, CompiledKernel compiledKernel, KernelInfo& kernelInfo)
   at ILGPU.Runtime.Accelerator.<>c__DisplayClass157_0`1.<LoadGenericKernel>b__0(TKernelLoader& loader, KernelInfo& info)
   at ILGPU.Runtime.Accelerator.LoadKernel[TDelegate](MethodInfo method, KernelSpecialization specialization, KernelInfo& kernelInfo)
   at ILGPU.Runtime.KernelLoaders.LoadKernel[T1,T2,T3](Accelerator accelerator, Action`3 action, KernelInfo& kernelInfo)
   at ILGPU.Runtime.KernelLoaders.LoadKernel[T1,T2,T3](Accelerator accelerator, Action`3 action)
   at ILGPU.Runtime.Accelerator.GetOrLoadLauncher[TSource,TTarget,TLaunchLoader](TSource action)
   at Program.SetValuePlain(AcceleratorStream stream, ArrayView1D`2 vectorview, Int32 value) in C:\VS_tests\ConsoleApp1\ConsoleApp1\Program.cs:line 40

MoFtZ commented 2 years ago

Hi @yuryGotham. My guess is that the 4090 is using a newer PTX instruction set and architecture. It looks like ILGPU needs to update this list for each new SM architecture.

I've raised #881 to add the missing pieces.
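
For readers following along, here is a rough sketch of the kind of lookup the guess above implies: a table from SM architecture to the PTX ISA version that introduced it. It is written in Python purely for illustration and is not ILGPU's code. The names `MIN_PTX_ISA_FOR_SM`, `select_ptx_target` and `format_ptx_header` are hypothetical, and the table is a small subset taken from NVIDIA's PTX ISA release notes. If a table like this stops at sm_86, a device reporting sm_89 has no entry, which would be consistent with the failure reported here.

```python
# Illustrative subset: SM architecture -> PTX ISA version that introduced it,
# per NVIDIA's PTX ISA release notes. Hypothetical names; not ILGPU's table.
MIN_PTX_ISA_FOR_SM = {
    (7, 0): (6, 0),  # Volta
    (7, 5): (6, 3),  # Turing
    (8, 0): (7, 0),  # Ampere (A100)
    (8, 6): (7, 1),  # Ampere (e.g. RTX 3090)
    (8, 9): (7, 8),  # Ada Lovelace (e.g. RTX 4090)
}


def select_ptx_target(sm, known=MIN_PTX_ISA_FOR_SM):
    """Return (sm, ptx_isa) for a device with compute capability `sm`.

    Raises LookupError when the architecture is missing from the table,
    which is the situation a compiler hits when a new GPU generation ships.
    """
    if sm not in known:
        raise LookupError(f"no PTX ISA mapping for sm_{sm[0]}{sm[1]}")
    return sm, known[sm]


def format_ptx_header(sm, isa):
    """Build the directives that open a PTX module for the given target."""
    return (
        f".version {isa[0]}.{isa[1]}\n"
        f".target sm_{sm[0]}{sm[1]}\n"
        ".address_size 64"
    )


if __name__ == "__main__":
    sm, isa = select_ptx_target((8, 9))
    print(format_ptx_header(sm, isa))
```

Raising on an unknown architecture is only one possible design. A compiler could also fall back to the newest architecture it knows and rely on the driver to JIT the older PTX forward.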