triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License

Document that tl.reduce assumes associativity/commutativity #4962

Open bertmaher opened 4 weeks ago

bertmaher commented 4 weeks ago

At least, I'm pretty sure that this is true :-). On GPUs tl.reduce generates code that reassociates the operation to reduce in-thread, then in-warp, then in-block, which means you get really unexpected results if you write a non-associative reduction.
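To illustrate the effect, here is a minimal pure-Python sketch. It does not use Triton and is not the code Triton actually generates; `sequential_reduce` and `tree_reduce` are hypothetical helpers that only mimic a left-to-right fold versus a pairwise (in-thread, then in-warp, then in-block style) regrouping. With an associative operation the two groupings agree; with a non-associative one such as subtraction they do not.

```python
from functools import reduce


def sequential_reduce(op, xs):
    """Left-to-right fold: ((x0 op x1) op x2) op ..."""
    return reduce(op, xs)


def tree_reduce(op, xs):
    """Pairwise reduction: combine neighbours level by level.

    This only imitates the regrouping a hierarchical GPU reduction
    performs; it is an assumption-laden model, not Triton's codegen.
    """
    xs = list(xs)
    while len(xs) > 1:
        paired = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:  # carry an unpaired trailing element to the next level
            paired.append(xs[-1])
        xs = paired
    return xs[0]


def add(a, b):
    return a + b


def sub(a, b):
    return a - b


data = [1, 2, 3, 4, 5, 6, 7, 8]

# Associative op: both groupings give the same answer.
print(sequential_reduce(add, data), tree_reduce(add, data))  # 36 36

# Non-associative op: the regrouped result differs from the sequential one.
print(sequential_reduce(sub, data), tree_reduce(sub, data))  # -34 0
```

The same kind of mismatch is what one would expect from a non-associative `combine_fn` passed to `tl.reduce`, although the exact grouping on a real GPU depends on the layout and is not modeled here.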

I think this is fine behavior but it should be in the docs for tl.reduce.

cc @peterbell10 to check if my understanding of tl.reduce is correct

lezcano commented 2 weeks ago

At the moment it is true that we need the operation to be associative and commutative. Note that the commutativity requirement only comes from a less-than-good implementation (we have swapped the arguments in some operations). In general, we just need the op to be associative, but yeah, it'd be good to document the associativity and fix the commutativity.
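The distinction between the two requirements can be sketched in plain Python. This is an illustrative model only: `tree_reduce` and its `swap_args` flag are hypothetical and are not part of Triton. String concatenation is associative but not commutative, so a regrouped reduction still gives the right answer as long as operand order is preserved, and only goes wrong once the arguments are swapped.

```python
def tree_reduce(op, xs, swap_args=False):
    """Pairwise reduction over xs.

    swap_args=True models an implementation that passes the operands
    to `op` in reversed order (a hypothetical stand-in for the swapped
    arguments mentioned above, not Triton's real code path).
    """
    xs = list(xs)
    while len(xs) > 1:
        if swap_args:
            paired = [op(xs[i + 1], xs[i]) for i in range(0, len(xs) - 1, 2)]
        else:
            paired = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:  # carry an unpaired trailing element to the next level
            paired.append(xs[-1])
        xs = paired
    return xs[0]


def concat(a, b):
    return a + b  # associative, but not commutative, on strings


letters = ["a", "b", "c", "d"]

# Order-preserving regrouping: associativity alone is enough.
print(tree_reduce(concat, letters))                  # abcd

# Swapped operands: the result now also depends on commutativity.
print(tree_reduce(concat, letters, swap_args=True))  # dcba

# A commutative and associative op is unaffected by the swap.
print(tree_reduce(concat, [1, 2, 3, 4], swap_args=True))  # 10
```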