Closed shawnl closed 3 years ago
Thanks for bringing up this issue. This is something where you have a good understanding of the problem and myself (and probably others) will have a hard time following along until either we (1) go spend a significant time learning about SIMD and trying to use it or (2) you help us understand this use case more. One way you could do (2) would be to provide some Zig code using the current SIMD features, and demonstrate how it is unable to satisfy the use case. Then provide an altered example with your proposed feature and explain how it solves the problem.
Such an example would be necessary for a test case anyway, and would propel this issue forward.
I opened this largely as a note to myself, partially to not forget things. This is mostly blocked on #903 , and yes I do want to explain some more what is going on.
@shawnl will you consider this counter proposal to your all
, any
thing?
@fold(comptime op: FoldOperator, vector: var) @typeInfo(vector).Vector.child
const FoldOperator = enum {
Add,
Sub,
Mul,
And,
Or,
Xor,
// more?
};
it does, in pseudocode,
var result = operator(vector[0], vector[1]);
result = operator(result, vector[2]);
result = operator(result, vector[3]);
// ...
return result
Actually I don't see why this would even need to be a builtin. This would work just fine in userland.
So the example with an if statement:
if (fold(.And, v1 >= v2)) {
// do your logic
}
I'm sure there's a better word than fold
, maybe reduce
? The functional programming people have great vocabulary words for these things.
(edit) The accepted proposal is for this to be a builtin. Continue reading below for reasoning.
Actually I don't see why this would even need to be a builtin. This would work just fine in userland.
If we use a builtin we can use the llvm intrinsics for horizontal folding, unless it is possible to optimize to them. (note that fast math flags are considered, https://llvm.org/docs/LangRef.html#llvm-experimental-vector-reduce-v2-fadd-intrinsic )
I really like this proposal. And LLVM is using reduce
, which isn't bad and better than fold.
Oh, nice, looks like it should be a builtin then. Whether we choose to use the experimental APIs or not can be an implementation detail. It looks like LLVM API and me came up with the same ideas :-)
Sorry, I misunderstand sth. A question, I care about the thing is the pure block
. Is this be accepted?
Accepted Proposal
https://github.com/ziglang/zig/issues/903#issuecomment-492300314
Branching on SIMD is complicated.[1] If only some of the values match you have to both execute the
if
and theelse
block, and merge the results. I think this should be explicit, which@Vector(_, bool)
(which all vector comparisons return, although I do not fully understand the difference in zig betweenu1
andbool
) having built-in functions that returnbool
:.all()
and.any()
.You might then run some scalar code, and if that code is
pure
then the compiler can still safely do optimizations that result in it being processed possibly twice. So in addition to acomptime
block I think we also need apure
block that would give a compile error if the block is notpure
.[2][1] Here is HSAIL talking about wavefront scheduling http://www.hsafoundation.com/html_spec111/HSA_Library.htm#PRM/Topics/02_ProgModel/wavefronts_lanes_and_wavefront_sizes.htm%3FTocPath%3DHSA%25C2%25A0Programmer's%2520Reference%2520Manual%2520Version%25201.1.1%2520%7CChapter%25202.%2520HSAIL%2520Programming%2520Model%7C2.6%2520Wavefronts%252C%2520Lanes%252C%2520and%2520Wavefront%2520Sizes%7C_____0 [2] I don't want to add too many keywords. Perhaps we could have attribute that can be attached to blocks, functions, variables, and other things that are just
enum
literals? (This is also very similar to C where _ is reserved.)