microsoft / hlsl-specs

HLSL Specifications
MIT License
118 stars 30 forks source link

[0006] Ability to form vk::BufferPointer to members of pointees #57

Open Dolkar opened 1 year ago

Dolkar commented 1 year ago

Which proposal does this relate to? 0010-vk-buffer-ref.md

Describe the issue or outstanding question. It's been voiced that buffer pointers need a way to be constructed from within the shading language, for example here: https://github.com/microsoft/hlsl-specs/pull/37#issuecomment-1503890903 The currently proposed solution is to construct them from raw addresses represented as uint64_t, but besides the issue of safety, it wouldn't offer much improvement over vk::RawBufferLoad. In every case where this functionality is required, the new address will just be the address of an existing pointer offset by some amount. The point of the initial vk::BufferPointer proposal is to remove having to calculate these offsets manually and in the same way improve reflection and tooling.

Therefore, I think that instead of constructing pointers out of uint64_t, there should be a mechanism that, given a vk::BufferPointer<T,A>, constructs a new pointer vk::BufferPointer<decltype(T::member),A_member> that will point to the member of T. This could be done in a familiar way by just applying the & operator on the result originating from Get():

struct InnerStruct
{
    float4 someValue;
};

struct Globals
{
    float4 otherValue;
    InnerStruct inner;
};

struct PushConstants
{
    vk::BufferPointer<Globals> globals;
};

[[vk::push_constant]] PushConstants pushConstants;

float4 MainPs(void) : SV_Target0
{
    vk::BufferPointer<InnerStruct> structPtr = &pushConstants.globals.Get().inner;
    float4 test = structPtr.Get().someValue;
    return test;
}

Now, the restriction that vk::BufferPointer may point only to structs becomes somewhat limiting. According to what I can gather from https://github.com/microsoft/hlsl-specs/pull/37#issuecomment-1500516139 , structs are needed to communicate the layout of the data, but then this restriction should not need to apply to pointers to struct members.

So far the above can already be expressed with current functionality. But consider this motivating example of building a linked list, nothing too unusual:

typedef vk::BufferPointer<struct Block> BlockPtr;

struct Block
{
    float4 x;
    BlockPtr next;
};

struct FreeBlocks
{
    uint allocatedSize;
    uint usedSize;
    Block blocks[];
};

struct RootPtrs
{
    BlockPtr ptrs[];
};

struct PushConstants
{
    vk::BufferPointer<RootPtrs> rootPtrs;
    vk::BufferPointer<FreeBlocks> freeBlocks;
};

[[vk::push_constant]] PushConstants pushConstants;

// Acquire a new block from the buffer of free blocks
BlockPtr AcquireBlock()
{
    uint id;
    InterlockedAdd(pushConstants.freeBlocks.Get().usedSize, 1, id);

    if (id < pushConstants.freeBlocks.Get().allocatedSize)
    {
        return &pushConstants.freeBlocks.Get().blocks[id];
    }
    else
    {
        // Constructing null pointers would be handy, too
        return {};
    }
}

void MainCs(uint3 tid : SV_DispatchThreadID)
{
    // Traverse the linked list to get to the tail
    vk::BufferPointer<BlockPtr, 8> tailPtr = &pushConstants.rootPtrs.Get().ptrs[tid.x];
    while (!tailPtr.Get().IsNull())
    {
        tailPtr = &tailPtr.Get().Get().next;
    }

    // Now tailPtr is a pointer that needs to be changed to add a new tail block.
    // Note that tailPtr started as a pointer to a member of RootPtrs, but may have later been
    // reassigned to be a pointer to a member of Block.

    // Acquire a new block and assign some payload value we wish to add
    BlockPtr newBlock = AcquireBlock();
    if (!newBlock.IsNull())
    {
        newBlock.Get().x = GetPayload(tid.x);
        newBlock.Get().next = {};

        // Append the block to the end of the list - this is why we needed the pointer-to-pointer
        tailPtr.Get() = newBlock;
    }
}

The above is an idealized implementation, including both pointers to non-struct struct members and also default-constructed null pointers. Those aren't necessary for this example, but it would be impossible to implement this example with the current state of the vk::BufferPointer proposal, because we have no way to form a pointer to one of the free blocks. You would have to pass an array of pointers to the free blocks as input, which in this case would be wasteful. Note that we could have multiple arrays of the free blocks, justifying the need for pointers over indices into the array.

There is a potential difficulty with propagating alignment to assign to the type of vk::BufferPointer. Given the alignment of the parent struct, the compiler would need to calculate the resulting alignment of its member, which may be smaller. In the example above, tailPtr would, I believe, have the resulting alignment of 8. There could be an argument that, for the sake of convenience, vk::BufferPointer<T,A> should be implicitly convertible to vk::BufferPointer<T,B> where B < A, but this issue is already proposing plenty as-is.

Aliasing will probably be an issue as well. If it is assumed that vk::BufferPointer never alias each other, then creating such pointers to members immediately breaks that assumption. However, I see some overlap here with 0006-reference-types.md, which will likely need to resolve a similar issue.

I realize this proposal complicates the implementation of buffer pointers by a fair amount, so I am not expecting this to make it into the initial version, but I think it could be a worthwhile addition afterwards.

To summarize, I see the following benefits to introducing the & operator to the l-value stemming from vk::BufferPointer<T,A>.Get() that returns a pointer to the pointee's member:

Other proposed additions:

llvm-beanz commented 1 year ago

There is some overlap here with 0006-reference-types.

devshgraphicsprogramming commented 1 year ago

Btw KHR_variable_pointers is core in Vulkan 1.1 and has nearly 100% coverage on Vulkan 1.2 devices

This is because we expect OpenCL to universally run on Vulkan Compute in the near future, and it has C style pointers with CUDA storage classes.

Basically you get real pointer types to any non private variable (SSBO,ubo, shared) but which are limited to be invocation private.

This should be sufficient to implement local references (as references are const pointers it's not possible to send them around between threads or pack them in/out of data storage).

EDIT: Disregard what I wrote, KHR_Variable_pointers are not castable to uint64_t so pretty much useless.

For global references you could use BDA/gregs BufferPointer as long as there's some way to recover the offset from base/root given a spv_variable_pointer.

devshgraphicsprogramming commented 1 year ago

Khr_variable_pointer aside.

It seems like hlsl202x should veer away from having a .Get() and just let us use a -> at some point

This means that you want &head->next->next->next to return a vk::BufferPointer

This would mean that the BufferPointer<T,A>::operator-> would need to return a BufferReference<U> which is has a BufferPointer<U,B>::operator&, where

B = min(1<<findLSB(offsetof(T,member)),A)
devshgraphicsprogramming commented 1 year ago

There is some overlap here with 0006-reference-types.

The overlap is tiny.

Because for a global/device HLSL reference the "source" SSBO would have to be baked into the reference type unless we're using some form of bindless.

A BDA/BufferPointer member access chain would need to "remember" it's source BDA value to form a correct address under an & operation.

P.s.For sanity I'd make references a templated type or annotate their address space somehow. Templating/embedding address annotation in the type would be best though as we would not need to write multiple versions of functions depending on which argument is what address space.

Corollary: While references to nearly all address spaces can be transparent (source/root variable known at compile time) meaning root variable and access chain can be substituted at compile time and a pure language feature, a reference to a BDA sourced variable needs to behave as a pointer all the way through so that we can do "address of" on it later on

llvm-beanz commented 1 year ago

It seems like hlsl202x should veer away from having a .Get() and just let us use a -> at some point

Yes, that was discussed. Once references are available in the language the -> operator will also be added. We're explicitly trying to scope this proposal so that it can be enabled without requiring HLSL 202x.

This means that you want &head->next->next->next to return a vk::BufferPointer

No, this is very much what we don't want, and why this proposal overlaps with the references proposal. The built-in address-of operator should return an address, making it return anything else would be extremely unfortunate and introduce complexities that we likely don't want to support long term.

While the references feature isn't fully specified yet, it will require adding reference types and address space annotations to the language and type system. That will allow us to have address-types which we could then use to support address-of.

This would mean that the BufferPointer<T,A>::operator-> would need to return a BufferReference<U> which is has a BufferPointer<U,B>::operator&, where

B = min(1<<findLSB(offsetof(T,member)),A)

I don't think this is correct. I don't understand why the -> operator wouldn't just return a T & with the appropriate address space type.

There is some overlap here with 0006-reference-types.

The overlap is tiny.

I disagree pretty strongly here. I think this issue is out-of-scope for proposal 10, and entirely depends on reference types being added to the language.

Because for a global/device HLSL reference the "source" SSBO would have to be baked into the reference type unless we're using some form of bindless.

We don't require that for passing resources as parameters or local variables of resources. We require the they be resolvable at compile time, so the same should apply here.

A BDA/BufferPointer member access chain would need to "remember" it's source BDA value to form a correct address under an & operation.

It doesn't need to "remember" it, we just need to be able to resolve it at compile time. This will put requirements on inlining when using references, but that's already expected.

P.s.For sanity I'd make references a templated type or annotate their address space somehow. Templating/embedding address annotation in the type would be best though as we would not need to write multiple versions of functions depending on which argument is what address space.

Address space annotations are a requirement for the references proposal.

Corollary: While references to nearly all address spaces can be transparent (source/root variable known at compile time) meaning root variable and access chain can be substituted at compile time and a pure language feature, a reference to a BDA sourced variable needs to behave as a pointer all the way through so that we can do "address of" on it later on

While I don't disagree that address space could be transparent, I don't think that is the correct design here. My personal preference is to have less implicit and more explicit features in the language. We may have some syntactic sugar for "let this reference be to any address space", but I have a strong preference that default references are thread-local memory and everything else has to be explicit.

llvm-beanz commented 1 year ago

@greg-lunarg & @s-perron,

My gut is that we should call this out of scope for proposal 10, and consider this under proposal 6.

I think it is likely since proposal 6 will be large by itself that we will want to handle this request outside proposal 6, but we can revisit as that proposal moves forward.

Any objections?

Dolkar commented 1 year ago

This means that you want &head->next->next->next to return a vk::BufferPointer

No, this is very much what we don't want, and why this proposal overlaps with the references proposal. The built-in address-of operator should return an address, making it return anything else would be extremely unfortunate and introduce complexities that we likely don't want to support long term.

I don't see any mention of the address-of operator being introduced in proposal 6 nor of any generic pointer type that the operator should return. Is there something I'm missing?

The main outcome of this issue should be to establish a way to construct buffer pointers out of references to global memory. You say the & operator shouldn't be used for this directly, which is fine, as long as there is some equivalent syntax that doesn't involve manually counting offsets and alignments.

Considering how large and long term proposal 6 is, I would like to see this added sooner, but I understand if that's not feasible due to the overlap.

s-perron commented 1 year ago

@llvm-beanz I agree. We are already trying to wind down proposal 10, and this idea adds some extra complications. From the parser's perpective, when get() is called it returns an l-value to the type pointed to. Adding an operator that returns a buffer pointer to a member is not easy because it has lost the context that the memory is the global memory.

There is an alternate implementation that could work with the current proposal 10. The free list could be a linked list of BlockPtrs:

vk::BufferPointer<BlockPtr> freeBlocks;

Then the actual blocks could exist somewhere else in memory. This would require more memory, but you could return blocks to the free list more easily. This is a quick though, so I could be wrong.

EDIT: Reconsidered.

llvm-beanz commented 1 year ago

I don't see any mention of the address-of operator being introduced in proposal 6 nor of any generic pointer type that the operator should return. Is there something I'm missing?

Proposal 6 is not fully specified. It is still being actively worked on. References are effectively non-null pointers, which makes them the first address-specifying data type we're proposing adding to HLSL. Much of the language to support this feature will need to be built off that.

The main outcome of this issue should be to establish a way to construct buffer pointers out of references to global memory. You say the & operator shouldn't be used for this directly, which is fine, as long as there is some equivalent syntax that doesn't involve manually counting offsets and alignments.

I don't think we have a solution to that problem that doesn't require new language features since we currently don't have a way to express an address. Our general goal with HLSL is to not add more hacky one-off ways to do things, but to instead look for long-term solutions. I think the correct long-term solution here is to have a way to express non-null addresses, and references seems like the right way to do that to me.

Considering how large and long term proposal 6 is, I would like to see this added sooner, but I understand if that's not feasible due to the overlap.

Realistically, since I think this is out of scope for proposal 10 you're unlikely to get an alternate solution until after proposal 6 anyways. There is a contributor lined up to implement proposal 10, and my team will work on proposal 6, but I don't think anyone else is going to step up to design and implement an alternate solution to this problem.

devshgraphicsprogramming commented 1 year ago

No, this is very much what we don't want, and why this proposal overlaps with the references proposal. The built-in address-of operator should return an address, making it return anything else would be extremely unfortunate and introduce complexities that we likely don't want to support long term.

While the references feature isn't fully specified yet, it will require adding reference types and address space annotations to the language and type system. That will allow us to have address-types which we could then use to support address-of.

As long as you can go "offset chasing" through an l-value expression like head->next->next->next after the fact and work out the offset by the time you apply an & address-of I don't care much how you do it.

Ok you might not want to return a vk::BufferPointer from a address-of & on a reference, however you want whatever your reference type is to return some pointer type that is trivially cast-able to vk::BufferPointer even if it needs to be a dynamic cast.

On top of that without bumping the Shader Model (or requiring SPV extensions) you cannot expect to be able to know the address of a global buffer reference and use pointers.

Basically for any reference the "root" variable (be it local, shared or device) would need to be baked into the type so you can't mix them if SM doesn't support that.

Furthermore I'm pretty sure that there exists no way to convert an SSBO to a BDA pointer and then do arithmetic on that, so in a HLSL without proposal 10 or its equivalent in future DirectX SM such pointers would need to be typed on the root variable,

Only if you have shared memory aliasing feature/extension enabled and declared a hidden aliased T[] arrays could you have shared memory pointers you could cast between each other.

Local/Private memory pointers would be gnarly because your compiler would need to do some analysis about "is there any dynamic pointer arithmetic & what is likely to be reached with it" and decide to "demote" certain variables to arrays in local memory (which in turn would move them out of registers and harm perf). The whole thing would be such a mess that in my opinion it would require the introduction of a new keyword denoting private variables that can't be pointed to, just so you don't trip on the performance front.

devshgraphicsprogramming commented 1 year ago

I don't think this is correct. I don't understand why the -> operator wouldn't just return a T & with the appropriate address space type.

Because you've lost the alignment information baked into the vk::BufferPointer<T,A>

I think I'm correct as if your original pointer to a structure type T is guaranteed to be aligned to A then a new pointer to a member m type U within T pointed by a pointer with alignment A can only be guaranteed to be aligned to the alignment of its offset within the struct which is offsetof(T,m).

The findMSB is how I find the smallest bit set in the offset number, ergo the largest power of 2 that evenly divides the offset.

I'm obviously skipping the requirement of the pointer being aligned to the alignment of the type U but thats because I assume the offset of the member U m within the struct must be aligned to that.

Now obviously the overall alignment of a pointer to U m w.r.t. the 0 address in the address space needs to be a minimum of the two values above.

devshgraphicsprogramming commented 1 year ago

There is some overlap here with 0006-reference-types.

The overlap is tiny.

I disagree pretty strongly here. I think this issue is out-of-scope for proposal 10, and entirely depends on reference types being added to the language.

Because for a global/device HLSL reference the "source" SSBO would have to be baked into the reference type unless we're using some form of bindless.

We don't require that for passing resources as parameters or local variables of resources. We require the they be resolvable at compile time, so the same should apply here.

A BDA/BufferPointer member access chain would need to "remember" it's source BDA value to form a correct address under an & operation.

It doesn't need to "remember" it, we just need to be able to resolve it at compile time. This will put requirements on inlining when using references, but that's already expected.

P.s.For sanity I'd make references a templated type or annotate their address space somehow. Templating/embedding address annotation in the type would be best though as we would not need to write multiple versions of functions depending on which argument is what address space.

Address space annotations are a requirement for the references proposal.

Corollary: While references to nearly all address spaces can be transparent (source/root variable known at compile time) meaning root variable and access chain can be substituted at compile time and a pure language feature, a reference to a BDA sourced variable needs to behave as a pointer all the way through so that we can do "address of" on it later on

While I don't disagree that address space could be transparent, I don't think that is the correct design here. My personal preference is to have less implicit and more explicit features in the language. We may have some syntactic sugar for "let this reference be to any address space", but I have a strong preference that default references are thread-local memory and everything else has to be explicit.

Ok, you're right, just my two cents.

Right now all functions in SPV remain functions, so certain IHV's IR->ISA compilers/codegen where function calls are a thing can keep them that way without inlining (if they for example decide that due to IPC invocations may have greater chance of reconverging if we don't inline).

If you don't make references/pointers of different address spaces different types, it will be on DXC to create multiple instantiations of functions that take generic address space references, otherwise if you just use a generic type all the way through every load and store your compiler can't deduce the type of under the hood becomes a nasty switch/select chain.

Second thing is that if you don't make generic pointers/references, then I'd appreciate if their syntax was something akin to

template<typename T/*, size_t alignment maybe?*/>
struct global_ref_t;
template<typename T/*, size_t alignment maybe?*/>
struct shared_ref_t;
template<typename T/*, size_t alignment maybe?*/>
struct local_ref_t; // or private_ref_t

rather than introducting new keywords in the style of OpenCL.

This is because it will make it more difficult to share code (structs and functions) between C++ and HLSL.

devshgraphicsprogramming commented 1 year ago

This means that you want &head->next->next->next to return a vk::BufferPointer

No, this is very much what we don't want, and why this proposal overlaps with the references proposal. The built-in address-of operator should return an address, making it return anything else would be extremely unfortunate and introduce complexities that we likely don't want to support long term.

I don't see any mention of the address-of operator being introduced in proposal 6 nor of any generic pointer type that the operator should return. Is there something I'm missing?

The main outcome of this issue should be to establish a way to construct buffer pointers out of references to global memory. You say the & operator shouldn't be used for this directly, which is fine, as long as there is some equivalent syntax that doesn't involve manually counting offsets and alignments.

Considering how large and long term proposal 6 is, I would like to see this added sooner, but I understand if that's not feasible due to the overlap.

A stop gap solution would be to do exactly what I proposed, which is for .Get() to return a new type, lets call it vk::BufferReference but then the . operator (which is not overloadable) would need to behave differently as well and return you more vk::BufferReferences

Either way it seems a really bad idea because it would still need a lot of work and maintenance, only to be wholly replaced by proposal 0006 in the end.

Also please note that even GLSL doesn't support what you're asking for.

llvm-beanz commented 1 year ago

A stop gap solution would be to do exactly what I proposed, which is for .Get() to return a new type, lets call it vk::BufferReference but then the . operator (which is not overloadable) would need to behave differently as well and return you more vk::BufferReferences

Speaking for myself (not the HLSL team as a whole) I'm not a fan of "stop gap" solutions if they introduce more technical debt that we need to clean up in the future. Once users start using features it makes it harder for us to change them.

Specifically relating to the . operator: we are never going to change the behavior of the . operator again.

HLSL has several awful hacks in cases where the . operator does magic. That magic has a huge technical cost in the compiler because it is all special cased. They all need to be removed, and many but not all can be replaced with -> once we have it available.

I've drawn a hard line on this as we've worked on new language features both publicly and privately, and I'll continue drawing that line. Drawing that hard line is why this proposal has a Get() method instead of yet another magic . operator. Expect to see more Get() methods in the future as we will continue to use that pattern to avoid custom . operators.

Either way it seems a really bad idea because it would still need a lot of work and maintenance, only to be wholly replaced by proposal 0006 in the end.

☝️ Agree. I think understanding the use case here is super valuable for us, but we're just not going to have a solution yet.

Right now all functions in SPV remain functions, so certain IHV's IR->ISA compilers/codegen where function calls are a thing can keep them that way without inlining (if they for example decide that due to IPC invocations may have greater chance of reconverging if we don't inline).

One of the things we've done poorly in DXC in the past, but need to do better with is drawing better distinctions about functions that must be inlined and functions that could be left as functions. DXC's DXIL path made an extremely hard assumption that everything needed to be inlined all the time, which is not strictly true and never really was (see: Raytracing).

If you don't make references/pointers of different address spaces different types, it will be on DXC to create multiple instantiations of functions that take generic address space references, otherwise if you just use a generic type all the way through every load and store your compiler can't deduce the type of under the hood becomes a nasty switch/select chain.

I don't think there is a reasonable path where address spaces aren't part of the type, and thankfully clang is already built to do that. I think we're going to need to provide some helper templates to make it easy for users to write functions that take references and have the template instantiation generate the permutations.

Second thing is that if you don't make generic pointers/references, then I'd appreciate if their syntax was something akin to

template<typename T/*, size_t alignment maybe?*/>
struct global_ref_t;
template<typename T/*, size_t alignment maybe?*/>
struct shared_ref_t;
template<typename T/*, size_t alignment maybe?*/>
struct local_ref_t; // or private_ref_t

rather than introducting new keywords in the style of OpenCL.

This is because it will make it more difficult to share code (structs and functions) between C++ and HLSL.

We are definitely going to be more closely aligned with C++ than OpenCL. Aligning with C++ is the general direction of HLSL.

devshgraphicsprogramming commented 1 year ago

This means that you want &head->next->next->next to return a vk::BufferPointer

No, this is very much what we don't want, and why this proposal overlaps with the references proposal. The built-in address-of operator should return an address, making it return anything else would be extremely unfortunate and introduce complexities that we likely don't want to support long term.

I don't see any mention of the address-of operator being introduced in proposal 6 nor of any generic pointer type that the operator should return. Is there something I'm missing?

The main outcome of this issue should be to establish a way to construct buffer pointers out of references to global memory. You say the & operator shouldn't be used for this directly, which is fine, as long as there is some equivalent syntax that doesn't involve manually counting offsets and alignments.

Considering how large and long term proposal 6 is, I would like to see this added sooner, but I understand if that's not feasible due to the overlap.

I will be quite close to being able to implement my own BDA pointer/reference types with inline-SPIRV if/when #59 gets merged.

https://github.com/microsoft/hlsl-specs/pull/59#issuecomment-1693661117

The only things I'm missing would be offsetof and either decltype or

Of course overloadable operator *, operator & and operator-> (imagine the nasty pointer-to-member wrapper I'd make) would make everything nicer syntax-wise but aren't strictly necessary.

greg-lunarg commented 8 months ago

@llvm-beanz As this is in the Backlog, is this issue holding up acceptance of Proposal 10?

llvm-beanz commented 8 months ago

@greg-lunarg you can ignore this one.

I'm moving this to track on proposal 6. Doing this correctly requires the address-of operator which is out of scope for proposal 10.