Closed MoFtZ closed 6 months ago
@MoFtZ thanks for reporting this. A quick investigation suggests that 64-bit additions with carry are not mapped properly to the data structure.
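For readers unfamiliar with the term, "64-bit additions with carry" refers to how a 128-bit add is usually built from two 64-bit halves. The following is a minimal C++ sketch of that idea only, not ILGPU's actual code generation; the `U128` struct and `add_u128` function are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical 128-bit unsigned value stored as two 64-bit halves.
struct U128 {
    std::uint64_t lo;  // low 64 bits
    std::uint64_t hi;  // high 64 bits
};

// Adds two 128-bit values using two 64-bit additions plus a carry.
inline U128 add_u128(U128 a, U128 b) {
    U128 r{};
    r.lo = a.lo + b.lo;  // unsigned addition wraps modulo 2^64
    // If the low sum wrapped around, it is smaller than either operand.
    std::uint64_t carry = (r.lo < a.lo) ? 1u : 0u;
    r.hi = a.hi + b.hi + carry;  // propagate the carry into the high half
    return r;
}
```

If the carry were dropped or written into the wrong half, results would only be wrong when the low half overflows, which is one way such a bug could hide in simple tests.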
@m4rs-mt It currently looks like an issue with the kernel launcher marshaling of the Int128 parameter. When performing Int128 operations within the kernel, it appears to work as expected. However, when using the supplied kernel parameter, the issue appears.
@m4rs-mt OK, so I have confirmed that this is definitely to do with the kernel parameter marshaling.
ILGPU is not taking into account any structure padding/alignment.
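To illustrate what ignoring padding/alignment means in general, here is a small C++ sketch. It assumes the 128-bit type requires 16-byte alignment on the device side; `Int128Like`, `KernelParams` and the two helper functions are hypothetical names, not ILGPU's marshaling code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit integer that must be 16-byte aligned.
struct alignas(16) Int128Like {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Hypothetical kernel parameter block: a 4-byte value followed by the 128-bit value.
struct KernelParams {
    std::int32_t index;
    Int128Like value;
};

// Offset a marshaller would use if it packed the fields back to back.
constexpr std::size_t packed_offset_of_value() {
    return sizeof(std::int32_t);
}

// Offset the compiler actually uses once alignment is respected.
constexpr std::size_t aligned_offset_of_value() {
    return offsetof(KernelParams, value);
}

// The compiler inserts 12 bytes of padding after `index`.
static_assert(packed_offset_of_value() == 4, "packed layout puts value at byte 4");
static_assert(aligned_offset_of_value() == 16, "aligned layout puts value at byte 16");
static_assert(sizeof(KernelParams) == 32, "struct size is rounded up to the alignment");
```

Under these assumptions, a launcher that writes the value at byte 4 while the kernel reads it from byte 16 would see garbage, which would be consistent with in-kernel arithmetic working while the supplied parameter does not.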
I expected the new `Int128` data type to just work, even if not necessarily performant. However, I have found an unexpected issue. The following kernel does not work on Cuda, and generates the wrong output:
Expected Output:
Actual Output on Cuda: