taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

cache loop invariant global vars pass induces incorrect results #8576

Open erizmr opened 3 months ago

A small repro on current Taichi master branch (https://github.com/taichi-dev/taichi/commit/37a056384db90d38816e8483c2716aeea847c988):

import taichi as ti

ti.init(print_ir=True, print_ir_dbg_info = False)

x = ti.Vector.field(3, float, shape=5)

@ti.kernel
def repro():
    # ti.loop_config(serialize=True)
    for i in range(5):
        x[i][2] = x[i][2] + 1.0
        for j in range(1):
            x[i][2] = x[i][2] - 5.0
            print("x value ", x[i][2])
            for z in range(1):
                idx = 0
                if z == 0:
                    idx = 2
                x_print = x[i][idx]
                print("x value inside ", x_print)
                print("x value inside direct access", x[i][2])

repro()

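It can be observed that "x value" differs from "x value inside", even though both read x[i][2] (idx is always 2 here, because z == 0 in the only iteration of the innermost loop):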

x value  -4.000000
x value inside  1.000000
x value inside direct access -4.000000
x value  -4.000000
x value inside  1.000000
x value inside direct access -4.000000
x value  -4.000000
x value inside  1.000000
x value inside direct access -4.000000
x value  -4.000000
x value inside  1.000000
x value inside direct access -4.000000
x value  -4.000000
x value inside  1.000000
x value inside direct access -4.000000
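
For reference, the same arithmetic can be replayed in plain Python, where no compiler pass is involved. This is only a sketch for checking the expected numbers: `expected_values` is a name introduced here for illustration, and the nested list stands in for the zero-initialized Taichi field.

```python
def expected_values(n=5):
    """Replay the kernel body in plain Python on a zero-initialized 5x3 array."""
    x = [[0.0, 0.0, 0.0] for _ in range(n)]
    results = []
    for i in range(n):
        x[i][2] = x[i][2] + 1.0
        for j in range(1):
            x[i][2] = x[i][2] - 5.0
            outer = x[i][2]          # corresponds to "x value"
            for z in range(1):
                idx = 0
                if z == 0:
                    idx = 2
                inside = x[i][idx]   # corresponds to "x value inside"
                direct = x[i][2]     # corresponds to "x value inside direct access"
                results.append((outer, inside, direct))
    return results


for outer, inside, direct in expected_values():
    print(outer, inside, direct)
```

All three values come out as -4.0 for every i. The 1.000000 printed by the Taichi kernel equals the value of x[i][2] after the first store and before the subtraction, which is consistent with the dynamically indexed read returning a stale value.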

The results are correct when cache_loop_invariant_global_vars=False is set; however, the option is True by default.

After some investigation, I found that the bug seems to be caused by an inaccurate check of whether a pointer is uniquely accessed within an offload task. The check is in the function is_offload_unique in cache_loop_invariant_global_vars.cpp. It seems that x[i][2] should not be identified as uniquely accessed, because x[i][idx] can refer to the same element when idx is 2.
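
To illustrate the kind of conservative check that would reject caching here, the following is a toy model in plain Python. It is not Taichi's `is_offload_unique`; `Access`, `may_alias`, and `is_unique` are hypothetical names, and the model only looks at the vector component index, assuming both accesses use the same loop index i.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Access:
    """One distinct pointer expression such as x[i][2] or x[i][idx] (hypothetical model)."""
    field: str
    component: Optional[int]  # constant component index, or None if only known at run time


def may_alias(a: Access, b: Access) -> bool:
    """Conservatively decide whether two accesses could touch the same address."""
    if a.field != b.field:
        return False
    if a.component is None or b.component is None:
        return True  # a run-time index could select any component
    return a.component == b.component


def is_unique(target: Access, accesses) -> bool:
    """True only if no other distinct pointer expression may alias `target`."""
    return not any(may_alias(target, other) for other in set(accesses) if other != target)


static_access = Access("x", 2)      # models x[i][2]
dynamic_access = Access("x", None)  # models x[i][idx]

print(is_unique(static_access, [static_access, dynamic_access]))  # False
print(is_unique(static_access, [static_access, Access("x", 0)]))  # True
```

Under this model, x[i][2] stops being unique as soon as any access to the same field uses an index that is not a compile-time constant, so its value would not be cached across the inner loops of the repro.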