m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net
Other
1.38k stars 117 forks source link

ArrayView.Cast should support unaligned byte offsets #1018

Closed TriceHelix closed 1 year ago

TriceHelix commented 1 year ago

Dear ILGPU Devs,

The current implementation of ArrayView<T>.Cast<TOther> ignores the possibility of the original ArrayView being offset by a number of bytes which is not a multiple TOther's size. The implemented behaviour makes sense for the ArrayView length/extent, as any trailing space not sufficient for another element should be truncated, however the same cannot be said for the offset.

Consider this example:

private struct Struct_16bytes
{
    public long value1;
    public long value2;
}

// create a new context and accelerator
// <...>

const int BUFFER_SIZE = 128;
using var buffer = accelerator.Allocate1D<byte>(BUFFER_SIZE);

const int BYTE_OFFSET = 8; // can be any non-multiple of struct size
var offsetView = buffer.AsRawArrayView(BYTE_OFFSET, sizeof(Struct_16bytes));
var castView = offsetView.Cast<Struct_16bytes>();

// this prints 8, as expected
Console.WriteLine($"Original offset: {((IContiguousArrayView)offsetView).IndexInBytes}");

// this prints 0, should be 8
Console.WriteLine($"Cast offset: {((IContiguousArrayView)castView).IndexInBytes}");

I believe casting should preserve "odd" offsets because it would be trivial to achieve the same effect using pointer arithmetic in CUDA, for example. Additionally, there are cases where someone may want to split a single allocation into buffers with different data types (like me https://github.com/m4rs-mt/ILGPU/issues/1016). Aligning memory sections is simple enough using AlignTo(...) but the current Cast implementation can undo that aligning in certain situations, more often than not when the output type has a large size.

Thank you for your time!

TriceHelix commented 1 year ago

I believe the same issue occurs when calling AlignTo(...) and specifying a number of bytes which is not a multiple of the data type. Most of the time that behaviour makes sense, however I would love to be able to align large structs with "weird" sizes (e.g. 104 bytes) to a more "even" boundary, such as 8 or 16 bytes.

m4rs-mt commented 1 year ago

Thanks for your feedback and your feature request @TriceHelix. I think this raises further questions on what the actual behavior of the Cast function should be - given the fact that we do not want to break existing software. I was thinking that we could introduce another CastAligned function with custom semantics. What do you think?

TriceHelix commented 1 year ago

That sounds reasonable, however I realized that array views would have to be changed fundamentally to support that kind of functionality. I'm not too familiar with how ILGPU works under the hood, but it looks like all ArrayView structs/interfaces rely on the index and length properties being relative to the data type's size. If the view itself is cast to a different type, I would expect that view to still point to the same location/offset in memory (even if the address is no longer a multiple of the type's size or aligned differently). However, the ArrayView is incapable of "understanding" anything else than an offset of n elements, relative to the underlying buffer's address. And that is honestly a good thing in 99.9% of situations, both for simplicity and optimization.

After that re-evaluation I have determined that my idea is kind of stupid. It's a thing that is certainly possible on a low level but not at all essential in ILGPU. Plus it will, as you say, break a bunch of things. Sorry for the inconvenience!