Closed TriceHelix closed 1 year ago
I believe the same issue occurs when calling AlignTo(...) and specifying a number of bytes which is not a multiple of the data type. Most of the time that behaviour makes sense, however I would love to be able to align large structs with "weird" sizes (e.g. 104 bytes) to a more "even" boundary, such as 8 or 16 bytes.
Thanks for your feedback and your feature request @TriceHelix. I think this raises further questions on what the actual behavior of the Cast
function should be - given the fact that we do not want to break existing software. I was thinking that we could introduce another CastAligned
function with custom semantics. What do you think?
That sounds reasonable, however I realized that array views would have to be changed fundamentally to support that kind of functionality. I'm not too familiar with how ILGPU works under the hood, but it looks like all ArrayView structs/interfaces rely on the index and length properties being relative to the data type's size. If the view itself is cast to a different type, I would expect that view to still point to the same location/offset in memory (even if the address is no longer a multiple of the type's size or aligned differently). However, the ArrayView is incapable of "understanding" anything else than an offset of n elements, relative to the underlying buffer's address. And that is honestly a good thing in 99.9% of situations, both for simplicity and optimization.
After that re-evaluation I have determined that my idea is kind of stupid. It's a thing that is certainly possible on a low level but not at all essential in ILGPU. Plus it will, as you say, break a bunch of things. Sorry for the inconvenience!
Dear ILGPU Devs,
The current implementation of
ArrayView<T>.Cast<TOther>
ignores the possibility of the original ArrayView being offset by a number of bytes which is not a multipleTOther
's size. The implemented behaviour makes sense for the ArrayView length/extent, as any trailing space not sufficient for another element should be truncated, however the same cannot be said for the offset.Consider this example:
I believe casting should preserve "odd" offsets because it would be trivial to achieve the same effect using pointer arithmetic in CUDA, for example. Additionally, there are cases where someone may want to split a single allocation into buffers with different data types (like me https://github.com/m4rs-mt/ILGPU/issues/1016). Aligning memory sections is simple enough using AlignTo(...) but the current Cast implementation can undo that aligning in certain situations, more often than not when the output type has a large size.
Thank you for your time!