Closed: WeiPhil closed this issue 1 year ago.
Hi @WeiPhil
I wasn't aware of these fine details of `scatter_reduce`. After digging a bit, here's what I found:
In CUDA:

- `ReduceOp::Add`: supported on integer and single-precision floating-point types
- `ReduceOp::Mul`: not supported at all
- `ReduceOp::Min`: only supported on integer types
- `ReduceOp::Max`: only supported on integer types
- `ReduceOp::And`: bitwise operation (supports any type)
- `ReduceOp::Or`: bitwise operation (supports any type)

These are limitations of the `red` instruction in PTX (source).
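Since CUDA only offers integer Min/Max here, a float Min/Max could in principle be routed through them. A common trick is an order-preserving map from float32 bit patterns to uint32 keys: unsigned comparison of the keys then agrees with float comparison. The sketch below assumes IEEE 754 float32 and no NaNs. `float_to_ordered_key` and `ordered_key_to_float` are hypothetical helpers, not something Dr.Jit is known to do. It uses plain Python `struct` only to show the bit manipulation; on a GPU, the keys would live in a uint32 array and be reduced with the integer Max/Min that `red` supports.

```python
import struct

def float_to_ordered_key(f: float) -> int:
    """Map a float32 to a uint32 key whose unsigned order matches float order (NaNs excluded)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", f))
    if bits & 0x80000000:
        # Negative: flip all bits so that larger magnitudes give smaller keys.
        return ~bits & 0xFFFFFFFF
    # Non-negative: set the sign bit so these keys sort above all negative ones.
    return bits | 0x80000000

def ordered_key_to_float(key: int) -> float:
    """Inverse of float_to_ordered_key, used to read the reduced result back."""
    bits = key & 0x7FFFFFFF if key & 0x80000000 else ~key & 0xFFFFFFFF
    (f,) = struct.unpack("<f", struct.pack("<I", bits))
    return f

values = [-3.5, 2.0, -0.25, 7.75, 0.0]
keys = [float_to_ordered_key(v) for v in values]
# On a GPU, max()/min() over the keys would be integer atomic Max/Min reductions.
print(ordered_key_to_float(max(keys)), ordered_key_to_float(min(keys)))  # prints 7.75 -3.5
```

The target slot would have to be initialized with the key of its starting value and decoded after the scatter, which is part of the extra work such a workaround implies.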
In LLVM (assuming LLVM 16):

- `ReduceOp::Add`: supported on integer and floating-point types
- `ReduceOp::Mul`: not supported at all
- `ReduceOp::Min`: supported on integer and floating-point types
- `ReduceOp::Max`: supported on integer and floating-point types
- `ReduceOp::And`: bitwise operation (supports any type)
- `ReduceOp::Or`: bitwise operation (supports any type)

These are mostly restricted by the `atomicrmw` LLVM IR instruction (source).
This has got me wondering why `ReduceOp.Mul` was added at all...
I'll keep this issue open until I figure out what we actually want to support. At the very least, what you are seeing now is "expected" behavior. I believe we could fully support this set of operations for both integer and floating-point types, but it would require more work (basically, manually adding some synchronization points). This was never done either because we have only ever needed `ReduceOp.Add`, or because there is some other limitation I'm currently unaware of.
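One standard way to emulate a reduction that has no native atomic instruction is a compare-and-swap (CAS) retry loop: read the current value, compute the update, and store it only if nobody else wrote in the meantime, otherwise retry. Whether this is the kind of synchronization meant above is an assumption. Below is a minimal sketch in which `AtomicCell` and `atomic_reduce` are hypothetical names; the lock only models the atomicity that a hardware CAS provides (CUDA's `atomicCAS`, LLVM's `cmpxchg`). Hardware compares bit patterns, while this toy version compares with `==`, so it is not suitable for NaN values.

```python
import operator
import threading

class AtomicCell:
    """Hypothetical stand-in for one memory word; the lock only models the
    atomicity of a hardware compare-and-swap instruction."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; return the value seen."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old

def atomic_reduce(cell, value, op):
    """Apply `op` atomically via a CAS retry loop; returns the previous value."""
    old = cell.load()
    while True:
        seen = cell.compare_and_swap(old, op(old, value))
        if seen == old:  # no concurrent write in between: our update landed
            return old
        old = seen  # lost the race: retry against the freshly observed value

# Ten threads each multiply the cell by 2, emulating an atomic ReduceOp.Mul.
cell = AtomicCell(1)
workers = [threading.Thread(target=atomic_reduce, args=(cell, 2, operator.mul))
           for _ in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(cell.load())  # prints 1024
```

The same loop works for float Min/Max by passing `min` or `max` as `op`; the cost is that heavily contended slots may retry many times.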
`ReduceOp.Mul` is there because the plan was also to use this enumeration internally for horizontal reductions (exposed as `drjit.prod` in the upcoming nanobind rewrite). I agree that it's pretty weird for atomics.
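For a horizontal reduction, each operation only needs a binary function and its identity element, and a product is as natural there as a sum; that is why a shared enumeration can include `Mul` even though atomics lack it. A minimal sketch, assuming nothing about Dr.Jit's internals: the `ReduceOp` class below is a hypothetical mirror of the enum's member names, and `horizontal_reduce` is not Dr.Jit's API.

```python
import math
import operator
from enum import Enum
from functools import reduce

class ReduceOp(Enum):
    """Hypothetical mirror of the enum's member names, not Dr.Jit's definition."""
    Add = "add"
    Mul = "mul"
    Min = "min"
    Max = "max"
    And = "and"
    Or = "or"

# Binary function and identity element for each reduction.
_OPS = {
    ReduceOp.Add: (operator.add, 0),
    ReduceOp.Mul: (operator.mul, 1),
    ReduceOp.Min: (min, math.inf),
    ReduceOp.Max: (max, -math.inf),
    ReduceOp.And: (operator.and_, -1),  # -1 has all bits set (integers only)
    ReduceOp.Or: (operator.or_, 0),
}

def horizontal_reduce(values, op: ReduceOp):
    """Combine all entries of `values` into one scalar, e.g. Mul yields a product."""
    fn, identity = _OPS[op]
    return reduce(fn, values, identity)

print(horizontal_reduce([1, 2, 3, 4], ReduceOp.Mul))  # prints 24
```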
I will explain these limitations in the documentation. Can the issue be closed?
Sounds good, thank you!
Hi, I've come across two potential issues when performing `scatter_reduce` with the `dr.ReduceOp.Max` or `dr.ReduceOp.Min` operators on the CUDA backend. Here is a minimal reproducer:

With the LLVM backend this prints the expected result, but with the CUDA backend I get the following error:
I'm running on Windows with an NVIDIA RTX A1000 card, and my CUDA compiler is the following:

Cuda compilation tools, release 11.8, V11.8.89, Build cuda_11.8.r11.8/compiler.31833905_0

It also seems like `dr.ReduceOp.Mul` is not supported on either backend. It fails on the CUDA backend with:

and on the LLVM backend with:
Are these known limitations or issues of the `scatter_reduce` operator?

Best,
Philippe
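When comparing results across backends, a plain sequential model of a scatter-reduction can serve as a reference. A minimal sketch, assuming the usual semantics `target[index[i]] = op(target[index[i]], value[i])` for every `i`; `scatter_reduce_reference` is a hypothetical helper and its argument order is not meant to document Dr.Jit's `scatter_reduce` signature. Because Add, Mul, Min, Max, And and Or are all associative and commutative, the order in which parallel updates land does not change the result, except for floating-point rounding with Add and Mul.

```python
import operator

def scatter_reduce_reference(op, target, value, index):
    """Sequential model of a scatter-reduction: for each i, combine value[i]
    into target[index[i]] with the binary function `op`. Returns a new list."""
    out = list(target)
    for v, i in zip(value, index):
        out[i] = op(out[i], v)
    return out

value = [1.5, -2.0, 4.0, 0.5]
index = [0, 1, 0, 1]
print(scatter_reduce_reference(max, [0.0, 0.0], value, index))  # prints [4.0, 0.5]
print(scatter_reduce_reference(min, [0.0, 0.0], value, index))  # prints [0.0, -2.0]
print(scatter_reduce_reference(operator.mul, [1.0, 1.0], value, index))  # prints [6.0, -1.0]
```

Note that the initial contents of the target take part in the reduction: starting Max from 0.0 clamps every slot to at least 0.0, so the identity of the operation (e.g. negative infinity for Max) is the usual choice when only the scattered values should count.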