Autodiff is broken for opengl and vulkan

Describe the bug

Using the opengl or vulkan backend results in nonsensical values even in the most basic examples. The cpu and cuda backends work fine.

To Reproduce

import taichi as ti

ti.init(arch=ti.vulkan)

x = ti.ndarray(ti.float32, shape=(), needs_grad=True)
y = ti.ndarray(ti.float32, shape=(), needs_grad=True)
x[None] = 2.0

@ti.kernel
def forward(x: ti.types.ndarray(ti.float32, 0, needs_grad=True),
            y: ti.types.ndarray(ti.float32, 0, needs_grad=True)):
    y[None] = x[None]

with ti.ad.Tape(loss=y):
    forward(x, y)

print(x.to_numpy())       # expect 2
print(y.to_numpy())       # expect 2
print(x.grad.to_numpy())  # expect 1

Log/Screenshots

$ python my_sample_code.py
[Taichi] version 1.8.0, llvm 15.0.4, commit 52b24f3e, linux, python 3.8.8
[Taichi] Starting on arch=vulkan
4.0
0.0
0.0
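
For reference, the values I expect (2, 2 and 1) follow from the kernel being the identity y = x, so dy/dx = 1 at any x. Here is a minimal sketch that checks this without Taichi at all, using a central finite difference. The helper names `forward_reference` and `central_difference` are hypothetical and only for illustration; they are not part of the Taichi API.

```python
def forward_reference(x: float) -> float:
    """Plain-Python stand-in for the kernel body above: y = x."""
    return x


def central_difference(f, x: float, eps: float = 1e-3) -> float:
    """Approximate df/dx at x with a symmetric finite difference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


x0 = 2.0                                            # same input as x[None] = 2.0
y0 = forward_reference(x0)                          # value y should hold after forward()
grad0 = central_difference(forward_reference, x0)   # value x.grad should hold

print(x0, y0, round(grad0, 6))
```

None of the three values printed by the vulkan run match this reference, and x itself appears to have been modified, even though the kernel only reads it.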

Additional comments

I'm not sure if this is a known limitation, but if it is, it would be helpful if it were mentioned somewhere in the docs.

taichi-dev/taichi, issue #8524