m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

Passing Int128 as kernel parameter is not working #1145

Closed MoFtZ closed 6 months ago

MoFtZ commented 9 months ago

I expected the new Int128 data type to just work, even if not necessarily performant. However, I have found an unexpected issue.

The following kernel does not work on Cuda, and generates the wrong output:

using System;
using ILGPU;
using ILGPU.Runtime;

class Program
{
    static void MyKernel(Index1D index, ArrayView<Int128> dataView, Int128 constant)
    {
        dataView[index] = index.X + constant;
    }

    static void Main()
    {
        using var context = Context.CreateDefault();
        foreach (var device in context)
        {
            using var accelerator = device.CreateAccelerator(context);
            var kernel = accelerator.LoadAutoGroupedStreamKernel<
                Index1D, ArrayView<Int128>, Int128>(MyKernel);

            using var buffer = accelerator.Allocate1D<Int128>(1024);
            kernel((int)buffer.Length, buffer.View, 42);

            var data = buffer.GetAsArray1D();
            for (int i = 0, e = data.Length; i < e; ++i)
            {
                if (data[i] != 42 + i)
                    Console.WriteLine($"Error at element location {i}: {data[i]} found");
            }
        }
    }
}

Expected Output:

data[0] = { Lower = 42, Upper = 0 }
data[1] = { Lower = 43, Upper = 0 }
data[2] = { Lower = 44, Upper = 0 }
etc

Actual Output on Cuda:

data[0] = { Lower = 0, Upper = 42 }
data[1] = { Lower = 1, Upper = 42 }
data[2] = { Lower = 2, Upper = 42 }
etc
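
To make the difference between the two outputs concrete, here is a minimal CPU-only sketch (no ILGPU involved) that rebuilds both values for element 1 from their 64-bit halves. `HalvesSketch` and `Split` are hypothetical names introduced for illustration; the sketch assumes .NET 7 or later, where `System.Int128` is available.

```csharp
using System;

static class HalvesSketch
{
    // Splits a non-negative Int128 into its upper and lower 64-bit halves.
    public static (ulong Upper, ulong Lower) Split(Int128 value) =>
        ((ulong)(value >> 64), (ulong)value);

    public static void Main()
    {
        // What data[1] should hold: index 1 plus the constant 42.
        Int128 expected = (Int128)42 + 1;

        // What was reported on Cuda for data[1]: Upper = 42, Lower = 1.
        Int128 observed = new Int128(upper: 42, lower: 1);

        Console.WriteLine(Split(expected));      // (0, 43)
        Console.WriteLine(Split(observed));      // (42, 1)
        Console.WriteLine(observed == expected); // False
    }
}
```

In the observed value, the index lands in the lower half while the constant ends up in the upper half, so the two operands were never added in the same 64-bit lane.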

m4rs-mt commented 9 months ago

@MoFtZ Thanks for reporting this. A quick investigation suggests that 64-bit additions with carry are not mapped properly to the data structure.

MoFtZ commented 8 months ago

@m4rs-mt It currently looks like an issue with the kernel launcher's marshaling of the Int128 parameter. Int128 operations performed entirely within the kernel appear to work as expected; the issue only appears when the supplied kernel parameter is used.

MoFtZ commented 8 months ago

@m4rs-mt OK, so I have confirmed that this is definitely to do with the kernel parameter marshaling.

ILGPU is not taking into account any structure padding/alignment.
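
As a rough illustration of why padding matters here, the sketch below computes field offsets for a parameter block under two different alignment rules. It is not ILGPU's actual marshaling code: `LayoutSketch`, `AlignUp`, and `Offsets` are hypothetical names, and the field sizes and alignments (4 bytes for the index, 16 bytes with 8-byte alignment for the view, 16 bytes with 16-byte alignment for `Int128`) are assumptions chosen only to show the effect.

```csharp
using System;

static class LayoutSketch
{
    // Rounds offset up to the next multiple of alignment (a power of two).
    public static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) & ~(alignment - 1);

    // Lays fields out in order and returns each field's byte offset.
    // Each field's alignment is capped at maxAlignment.
    public static int[] Offsets((int Size, int Alignment)[] fields, int maxAlignment)
    {
        var offsets = new int[fields.Length];
        int cursor = 0;
        for (int i = 0; i < fields.Length; ++i)
        {
            int alignment = Math.Min(fields[i].Alignment, maxAlignment);
            cursor = AlignUp(cursor, alignment);
            offsets[i] = cursor;
            cursor += fields[i].Size;
        }
        return offsets;
    }

    public static void Main()
    {
        // Assumed (size, alignment) pairs: index, view, Int128 constant.
        var fields = new (int Size, int Alignment)[] { (4, 4), (16, 8), (16, 16) };

        // One side honors 16-byte alignment, the other caps alignment at 8 bytes.
        Console.WriteLine(string.Join(", ", Offsets(fields, 16))); // 0, 8, 32
        Console.WriteLine(string.Join(", ", Offsets(fields, 8)));  // 0, 8, 24
    }
}
```

Under these assumptions the two rules place the `Int128` field 8 bytes apart. If the host and the device disagreed in that way, one side would read a 64-bit half from the slot where the other side wrote something else, which is at least the same kind of symptom as the swapped `Lower`/`Upper` values reported above.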