taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Autodiff is broken for opengl and vulkan #8524

Open bmanga opened 2 months ago

bmanga commented 2 months ago

Describe the bug

Using the opengl or vulkan backends results in nonsensical values, even in the most basic examples. The CPU and cuda backends work fine.

To Reproduce

import taichi as ti

ti.init(arch=ti.vulkan)

x = ti.ndarray(ti.float32, shape=(), needs_grad=True)
y = ti.ndarray(ti.float32, shape=(), needs_grad=True)
x[None] = 2.0

@ti.kernel
def forward(x: ti.types.ndarray(ti.float32, 0, needs_grad=True),
            y: ti.types.ndarray(ti.float32, 0, needs_grad=True)):
    y[None] = x[None]

with ti.ad.Tape(loss=y):
    forward(x, y)

print(x.to_numpy())       # expect 2
print(y.to_numpy())       # expect 2
print(x.grad.to_numpy())  # expect 1

Log/Screenshots

$ python my_sample_code.py
[Taichi] version 1.8.0, llvm 15.0.4, commit 52b24f3e, linux, python 3.8.8
[Taichi] Starting on arch=vulkan
4.0
0.0
0.0

Additional comments

I'm not sure whether this is a known limitation, but if it is, it would be helpful to mention it somewhere in the docs.

bobcao3 commented 6 days ago

It's surprising that ad.Tape works at all on ndarrays, so we are first looking into why it worked on the LLVM backends.

ad.Tape is designed for use with ti.field. For ndarray autodiff, the intended use case is to use it as a custom function and rely on Torch's graph.