Open bertmaher opened 4 weeks ago
At the moment, it is true that the operation needs to be both associative and commutative. Note that the commutativity requirement comes only from a less-than-ideal implementation (we have swapped the arguments in some operation). In general, we just need the op to be associative, but yeah, it'd be good to document the associativity requirement and fix the commutativity one.
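To make the associative-vs-commutative distinction concrete, here is a small pure-Python sketch (not Triton code). It uses string concatenation, which is associative but not commutative. The `tree_reduce` helper and its `swap` flag are hypothetical and only mimic the effect of an implementation that swaps the arguments of a combine step; they do not reflect where or how Triton actually does so.

```python
from functools import reduce

def concat(a, b):
    # Associative, but not commutative: "a" + "b" != "b" + "a".
    return a + b

def tree_reduce(op, xs, swap=False):
    # Pairwise (tree-shaped) reduction. With swap=True, every combine
    # step passes its two arguments in reversed order (hypothetical,
    # for illustration only).
    xs = list(xs)
    while len(xs) > 1:
        nxt = []
        for i in range(0, len(xs) - 1, 2):
            a, b = xs[i], xs[i + 1]
            nxt.append(op(b, a) if swap else op(a, b))
        if len(xs) % 2:
            nxt.append(xs[-1])  # carry an unpaired trailing element
        xs = nxt
    return xs[0]

items = ["a", "b", "c", "d"]
sequential = reduce(concat, items)               # plain left-to-right fold
in_order = tree_reduce(concat, items)            # regrouped, order preserved
swapped = tree_reduce(concat, items, swap=True)  # regrouped, arguments swapped

print(sequential, in_order, swapped)
```

Regrouping alone leaves the result unchanged for an associative op, whereas swapping arguments changes it unless the op is also commutative.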
At least, I'm pretty sure that this is true :-). On GPUs, `tl.reduce` generates code that reassociates the operation to reduce in-thread, then in-warp, then in-block, which means you get really unexpected results if you write a non-associative reduction. I think this is fine behavior, but it should be in the docs for `tl.reduce`.

cc @peterbell10 to check if my understanding of `tl.reduce` is correct
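For anyone wondering what "really unexpected results" looks like, here is a minimal pure-Python sketch of a staged reduction (not actual Triton code). The `staged_reduce` helper and its `group` parameter are hypothetical stand-ins for the in-thread / in-warp / in-block stages; the real grouping depends on the kernel's layout.

```python
from functools import reduce
import operator

def staged_reduce(op, xs, group=2):
    # Reduce in stages: fold each group of `group` neighbours, then
    # repeat on the partial results until one value is left. This
    # regroups (reassociates) the operation but keeps operand order.
    xs = list(xs)
    while len(xs) > 1:
        xs = [reduce(op, xs[i:i + group]) for i in range(0, len(xs), group)]
    return xs[0]

data = list(range(1, 9))  # 1, 2, ..., 8

# Addition is associative: both groupings agree.
add_seq = reduce(operator.add, data)
add_staged = staged_reduce(operator.add, data)

# Subtraction is not associative: the groupings disagree.
sub_seq = reduce(operator.sub, data)            # ((1 - 2) - 3) - ... - 8
sub_staged = staged_reduce(operator.sub, data)  # ((1-2) - (3-4)) - ((5-6) - (7-8))

print(add_seq, add_staged, sub_seq, sub_staged)
```

The sequential and staged results match for addition but not for subtraction, which is the kind of mismatch a non-associative combine function would hit.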