Closed MoFtZ closed 5 months ago
@m4rs-mt Just a thought... this works for the top-level structure, but what about nested structures? I don't think it will work there.
@m4rs-mt OK, this is definitely a problem.
Placing `Int128` inside another struct, and passing `MyStruct` as a kernel parameter, will also cause alignment/padding issues.
public struct MyStruct
{
    public byte X;
    public Int128 Y;
}
`Y` is aligned at Offset 16 in .NET, but Cuda expects it at Offset 8.
@m4rs-mt Found another issue - if the kernel parameter is a structure containing a nested structure, ILGPU will flatten all the fields. Flattening can change the byte offsets of fields, which means that we can no longer use the .NET representation of the variable when using `fixed`.
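To make the two comments above concrete, here is a minimal sketch that measures field offsets as the .NET runtime lays them out. `MyStruct` is the struct from the comment; `Inner`, `Outer`, `Flattened` and the `LayoutProbe` helper are hypothetical names introduced only for illustration, and none of this is ILGPU code. It assumes .NET 7 or later (for `Int128`).

```csharp
using System;
using System.Runtime.CompilerServices;

public struct MyStruct { public byte X; public Int128 Y; }

// Hypothetical types, used only to illustrate what flattening does.
public struct Inner { public byte B; public int C; }
public struct Outer { public byte A; public Inner I; }
public struct Flattened { public byte A; public byte B; public int C; }

public static class LayoutProbe
{
    // Byte distance from the start of a struct to one of its fields,
    // as laid out in memory by the .NET runtime.
    public static long OffsetOf<TStruct, TField>(ref TStruct s, ref TField field)
        where TStruct : struct =>
        (long)Unsafe.ByteOffset(
            ref Unsafe.As<TStruct, byte>(ref s),
            ref Unsafe.As<TField, byte>(ref field));

    public static void Main()
    {
        var m = default(MyStruct);
        Console.WriteLine($"MyStruct.Y:  {OffsetOf(ref m, ref m.Y)}");

        // Nested: Inner is 4-byte aligned, so it starts at offset 4.
        var o = default(Outer);
        Console.WriteLine($"Outer.I.C:   {OffsetOf(ref o, ref o.I.C)}");

        // Flattened: B packs directly after A, which moves C.
        var f = default(Flattened);
        Console.WriteLine($"Flattened.C: {OffsetOf(ref f, ref f.C)}");
    }
}
```

With the default sequential layout, `Outer.I.C` should land at offset 8 and `Flattened.C` at offset 4, which is the kind of shift that makes the .NET representation unusable after flattening. The value printed for `MyStruct.Y` depends on the runtime; the comment above reports 16.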
Fixes #1145.
When launching a Cuda kernel, ILGPU will create a runtime struct that represents all the kernel parameters. These kernel parameters are then passed to `cuLaunchKernel` as a single buffer. It is up to the caller of `cuLaunchKernel` (i.e. ILGPU) to make sure the alignment/padding of this buffer is correct.

This runtime struct is created using the alignment/padding rules of the .NET runtime. Using the example of:
Kernel(Index1D index, ArrayView1D<Int128, Stride1D.Dense> output, Int128 constant)
.NET would create a struct as:
The additional padding at Offset 24 does not match the alignment rules of Cuda - it is expecting to find the Int128 at Offset 24.
Changing the order:
Kernel(Index1D index, Int128 constant, ArrayView1D<Int128, Stride1D.Dense> output)
.NET would create a struct as:
Again, the additional padding at Offset 4 does not match the alignment rules of Cuda - it is expecting to find the Int128 at Offset 8.
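The offsets quoted in both examples can be reproduced with a small layout model. This is only a sketch under stated assumptions, not ILGPU's argument mapper: it assumes `Index1D` is 4 bytes, the array view is a 16-byte pointer-plus-length pair with 8-byte alignment, and that the Cuda side treats `Int128` as 16 bytes with 8-byte alignment (inferred from the expected offsets 24 and 8 above). `KernelParamLayout` and `Field` are hypothetical names.

```csharp
using System;

public static class KernelParamLayout
{
    // Name, size in bytes, and required alignment of one kernel parameter.
    public readonly record struct Field(string Name, int Size, int Align);

    // Round offset up to the next multiple of a power-of-two alignment.
    public static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) & ~(alignment - 1);

    // Place fields sequentially, padding each one to its own alignment.
    public static int[] Offsets(params Field[] fields)
    {
        var offsets = new int[fields.Length];
        int next = 0;
        for (int i = 0; i < fields.Length; i++)
        {
            offsets[i] = AlignUp(next, fields[i].Align);
            next = offsets[i] + fields[i].Size;
        }
        return offsets;
    }

    public static void Main()
    {
        // Assumed sizes and alignments; see the note above the code.
        var index = new Field("index", 4, 4);
        var output = new Field("output", 16, 8);
        var constantNet = new Field("constant", 16, 16);  // .NET rule
        var constantCuda = new Field("constant", 16, 8);  // assumed Cuda rule

        // Kernel(index, output, constant)
        Console.WriteLine(Offsets(index, output, constantNet)[2]);   // 32
        Console.WriteLine(Offsets(index, output, constantCuda)[2]);  // 24

        // Kernel(index, constant, output)
        Console.WriteLine(Offsets(index, constantNet, output)[1]);   // 16
        Console.WriteLine(Offsets(index, constantCuda, output)[1]);  // 8
    }
}
```

In both orderings the only difference between the two layouts is the extra padding inserted in front of the `Int128`, which is the mismatch described above.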
Attempt 1
This PR modifies how ILGPU provides the kernel parameters to Cuda. Instead of a single buffer that needs to be manually aligned, we now use an array of pointers to the kernel parameters. ~~Cuda will handle the alignment/padding for us. ILGPU continues to use the single buffer as the placeholder/container of all the kernel parameters.~~
Attempt 2
Changed the PTX Argument Mapper to manually align the kernel parameter fields.
Attempt 3
Looks like the issue is specific to `Int128`. It is not treated as a regular struct. The .NET Runtime considers it an intrinsic type, and aligns it to 16 bytes. Modified ILGPU to pre-register `Int128`, and force a 16 byte alignment.