m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

[BUG]: What is the kernel compilation error? #1257

Open delverOne25 opened 1 month ago

delverOne25 commented 1 month ago

Describe the bug

public static void testKernelSwap64(Index1D index, ArrayView<int> arr1) { } // OK

public static void testKernelSwap1024(Index1D index, ArrayView<int> arr1)
{
    int[] arr = new int[1024];
} // OK

public static void testKernelSwap4096(Index1D index, ArrayView<int> arr1)
{
    index *= 4096 * 8;
    if (index + 4096 * 8 >= arr1.Length)
        return;
    int[] arr = new int[4096 * 8];
    swap(index, arr1, arr, 4096 * 8);
}

accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>>(testKernelSwap4096); // error

// [MethodImpl(MethodImplOptions.NoInlining)]       // OK
[MethodImpl(MethodImplOptions.AggressiveInlining)]  // ERROR
static void swap(Index1D index, ArrayView<int> arr, int[] temp, int len)
{
    int len0 = index + len - 1;
    for (int i = 0; i < len; i++)
    {
        temp[i] = arr[index + i];
    }
    for (int i = 0; i < len; i++)
    {
        arr[index + i] = arr[len0 - i];
        arr[len0 - i] = temp[i];
    }
}

NVIDIA GeForce RTX 4060 [Type: Cuda, WarpSize: 32, MaxNumThreadsPerGroup: 1024, MemorySize: 8585216000]

Unhandled exception. ILGPU.InternalCompilerException: An internal compiler error has been detected
 ---> System.Collections.Generic.KeyNotFoundException: The given key 'arith.bin.Shl_2540: index_168026, const_2539 [None]' was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at ILGPU.IR.Analyses.AnalysisValueMapping`1.get_Item(Value key)
   at ILGPU.IR.Analyses.ValueFixPointAnalysis`2.ValueAnalysisContext.get_Item(Value valueNode)
   at ILGPU.IR.Analyses.ValueFixPointAnalysis`2.GenericValue[TContext](AnalysisValue`1 source, Value value, TContext context)
   at ILGPU.IR.Analyses.ValueFixPointAnalysis`2.Merge[TContext](AnalysisValue`1& source, Value value, TContext context)
   at ILGPU.IR.Analyses.ValueFixPointAnalysis`2.Update[TContext](Value node, TContext context)
   at ILGPU.IR.Analyses.ValueFixPointAnalysis`2.<Analyze>g__ProcessBlock|9_0[TOrder,TBlockDirection](BasicBlock block, <>c__DisplayClass9_0`2&)
   at ILGPU.IR.Analyses.ValueFixPointAnalysis`2.Analyze[TOrder,TBlockDirection](BasicBlockCollection`2& blocks, AnalysisValueMapping`1 valueMapping, AnalysisReturnValueMapping`1 returnMapping)

Environment

m4rs-mt commented 1 month ago

Hi @delverOne25, thanks for your bug report. I'll check whether the problem still exists in 2.0beta1.

m4rs-mt commented 1 month ago

Regardless of the analysis results, I strongly advise against allocating local arrays inside kernels, because they have a significant negative impact on runtime performance. Instead, consider a shared-memory approach in which multiple threads cooperate to swap the values, coordinated through synchronization primitives such as locks or atomic operations. Another option is LocalMemory.Allocate, which enforces proper thread-local memory management and may help work around this compilation issue. Note, however, that it might not completely eliminate the performance penalty.
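
For reference, the LocalMemory.Allocate suggestion could look roughly like the sketch below. It is untested against the compiler error reported here. The class name SwapKernels, the kernel name ReverseChunkKernel, and the constant ChunkLength are hypothetical names chosen for illustration, and the helper method is folded into the kernel body to keep the example short. The sketch assumes that the extent passed to LocalMemory.Allocate has to be a compile-time constant, and it reverses each chunk by reading back from the scratch copy instead of reusing the two-way swap loop from the report.

```csharp
using ILGPU;
using ILGPU.Runtime;

public static class SwapKernels
{
    // Hypothetical constant: number of elements each thread processes.
    // Matches the 4096 * 8 used in the report.
    private const int ChunkLength = 4096 * 8;

    public static void ReverseChunkKernel(Index1D index, ArrayView<int> data)
    {
        // First element of the chunk owned by this thread.
        int start = index.X * ChunkLength;
        if (start + ChunkLength > data.Length)
            return;

        // Thread-local scratch buffer instead of `new int[...]`.
        // Assumption: the extent must be known at kernel compile time.
        ArrayView<int> temp = LocalMemory.Allocate<int>(ChunkLength);

        // Copy the chunk into the scratch buffer.
        for (int i = 0; i < ChunkLength; i++)
            temp[i] = data[start + i];

        // Write the chunk back in reverse order.
        for (int i = 0; i < ChunkLength; i++)
            data[start + i] = temp[ChunkLength - 1 - i];
    }
}
```

Such a kernel would be loaded the same way as in the report, for example with accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>>(SwapKernels.ReverseChunkKernel). A scratch buffer of this size per thread is still expensive, so the performance concern raised above applies to this variant as well.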