Minimal reproduction:
```python
import taichi as ti

block_dim = 64
N = 256

ti.init(arch=ti.cuda, print_ir=True, print_kernel_llvm_ir=True)

@ti.kernel
def test(out: ti.types.ndarray()):
    ti.loop_config(block_dim=block_dim)
    for i in range(N):
        #gtid = ti.global_thread_idx()
        tid = i % block_dim
        val = i * 1.0
        sharr = ti.simt.block.SharedArray((block_dim,), ti.f32)
        sharr[tid] = val
        ti.simt.block.sync()
        #ti.atomic_add(sharr[0], val)
        sharr[0] += sharr[tid]
        ti.simt.block.sync()
        out[i] = sharr[tid]

arr = ti.ndarray(ti.f32, (N))
test(arr)
print(arr.to_numpy())
```
This gives unexpected numerical results.
The atomic add is somehow demoted.
This may be because some compiler pass treats the shared array as a local variable and then removes the atomic add. Will keep investigating.
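For reference, a reduced variant that keeps only the explicit `ti.atomic_add` path (the form commented out in the reproduction above) might help check whether the demotion happens regardless of how the accumulation is written. This is only a sketch, not verified; `reduce_test` is a hypothetical name, and it assumes the same `SharedArray`/`loop_config` setup as the original kernel:

```python
# Hypothetical reduced kernel (not from the original report): keeps only the
# explicit ti.atomic_add on the SharedArray to see whether the demotion also
# occurs without the `sharr[0] += ...` form.
import taichi as ti

block_dim = 64
N = 256

ti.init(arch=ti.cuda, print_ir=True)

@ti.kernel
def reduce_test(out: ti.types.ndarray()):
    ti.loop_config(block_dim=block_dim)
    for i in range(N):
        tid = i % block_dim
        val = i * 1.0
        sharr = ti.simt.block.SharedArray((block_dim,), ti.f32)
        sharr[tid] = val
        ti.simt.block.sync()
        # Explicit atomic form; if the same pass demotes this too, the printed
        # IR should show the atomic turned into a plain load/store.
        ti.atomic_add(sharr[0], val)
        ti.simt.block.sync()
        out[i] = sharr[0]

arr = ti.ndarray(ti.f32, N)
reduce_test(arr)
print(arr.to_numpy())
```

If the explicit form survives while the `+=` form is demoted, that would narrow the problem down to how the augmented assignment on a SharedArray element is lowered.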