taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

diff-Taichi error in complicated loop situations #7771

Open liamxu1 opened 1 year ago

liamxu1 commented 1 year ago

I am trying to differentiate a complicated calculation with diff-Taichi. It contains nested range-for loops in which the inner loop's range depends on the outer loop's index, and it may also contain many conditional statements.

I reproduced the error with a simplified example:

import taichi as ti
ti.init(arch=ti.cpu, debug=True)

NUM = 5
x = ti.field(dtype=float)
y = ti.field(dtype=float)
ti.root.dense(ti.i, NUM).place(x, y)
loss = ti.field(dtype=float, shape=())
ti.root.lazy_grad()

@ti.kernel
def initialize():
    for i in x:
        x[i] = i
        y[i] = 0
    loss[None] = 0

@ti.kernel
def compute_loss():
    for i in range(NUM):
        l = 0.0
        for j in range(i):
            for k in range(j):
                l += x[j] * x[k]
        y[i] = l

    for i in range(NUM):
        loss[None] += y[i]

initialize()
with ti.ad.Tape(loss=loss, validation=True):
    compute_loss()

On both the CUDA and x64 backends, the code fails with the following output:

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.9.16
[Taichi] Starting on arch=x64
Instruction does not dominate all uses!
  %35 = load i32, ptr %9, align 4
  %125 = sub i32 %35, 1
[E 04/10/23 13:52:33.367 29352] [codegen_llvm.cpp:taichi::lang::FunctionCreationGuard::~FunctionCreationGuard@76] Assertion failure: !llvm::verifyFunction(*body, &llvm::errs())

***********************************
* Taichi Compiler Stack Traceback *
***********************************
0x7ffd7f4cc7f9: taichi::Time::get_cycles in taichi_python.cp39-win_amd64.pyd
0x7ffd7fe22969: taichi::Logger::error in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5fd00a: taichi::lang::directx12::make_aot_module in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5a143f: taichi::Time::get_cycles in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5a24a3: taichi::Time::get_cycles in taichi_python.cp39-win_amd64.pyd
0x7ffd7f61a2be: taichi::lang::directx12::make_aot_module in taichi_python.cp39-win_amd64.pyd
0x7ffd7f61a3c5: taichi::lang::directx12::make_aot_module in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5a35d0: taichi::Time::get_cycles in taichi_python.cp39-win_amd64.pyd
0x7ffd7f2a130b: taichi::lang::aot::Module::get_snode_tree in taichi_python.cp39-win_amd64.pyd
0x7ffd7f3a3a49: taichi::lang::Ndarray::write_float in taichi_python.cp39-win_amd64.pyd
0x7ffd7f3a53e1: taichi::lang::Ndarray::write_float in taichi_python.cp39-win_amd64.pyd
0x7ffe1b869363: recalloc in ucrtbase.dll
0x7ffe1c0e26bd: BaseThreadInitThunk in KERNEL32.DLL
0x7ffe1dc2a9f8: RtlUserThreadStart in ntdll.dll

Internal error occurred. Check out this page for possible solutions:
https://docs.taichi-lang.org/docs/install
[E 04/10/23 13:52:33.435 29352] Received signal 22 (SIGABRT)

***********************************
* Taichi Compiler Stack Traceback *
***********************************
0x7ffd7f4cc7f9: ?get_cycles@Time@taichi@@SA_KXZ in taichi_python.cp39-win_amd64.pyd
0x7ffd7fe22969: ?error@Logger@taichi@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@_N@Z in taichi_python.cp39-win_amd64.pyd
0x7ffd7f4acea1: ?readback_data@Device@lang@taichi@@UEAA?AW4RhiResult@23@PEAUDevicePtr@23@PEAPEAXPEA_KHAEBV?$vector@V?$shared_ptr@VStreamSemaphoreObject@lang@taichi@@@std@@V?$allocator@V?$shared_ptr@VStreamSemaphoreObject@lang@taichi@@@std@@@2@@std@@@Z in taichi_python.cp39-win_amd64.pyd
0x7ffe1b8be5f5: raise in ucrtbase.dll
0x7ffe1b8bf601: abort in ucrtbase.dll
0x7ffe1b8bed39: terminate in ucrtbase.dll
0x7ffe0e511abf: __NLG_Return2 in VCRUNTIME140_1.dll
0x7ffe0e51232b: __NLG_Return2 in VCRUNTIME140_1.dll
0x7ffe0e5140e9: __CxxFrameHandler4 in VCRUNTIME140_1.dll
0x7ffd7ff0de10: ?from_fs_dir@VirtualDir@io@taichi@@SA?AV?$unique_ptr@UVirtualDir@io@taichi@@U?$default_delete@UVirtualDir@io@taichi@@@std@@@std@@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@5@@Z in taichi_python.cp39-win_amd64.pyd
0x7ffe1dc73cff: __chkstk in ntdll.dll
0x7ffe1dbee456: RtlFindCharInUnicodeString in ntdll.dll
0x7ffe1dc24455: RtlRaiseException in ntdll.dll
0x7ffe1b52fe7c: RaiseException in KERNELBASE.dll
0x7ffdf40a6220: _CxxThrowException in VCRUNTIME140.dll
0x7ffd7fe229ab: ?error@Logger@taichi@@QEAAXAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@_N@Z in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5fd00a: ?make_aot_module@directx12@lang@taichi@@YA?AV?$unique_ptr@VModule@aot@lang@taichi@@U?$default_delete@VModule@aot@lang@taichi@@@std@@@std@@Vany@5@W4Arch@3@@Z in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5a143f: ?get_cycles@Time@taichi@@SA_KXZ in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5a24a3: ?get_cycles@Time@taichi@@SA_KXZ in taichi_python.cp39-win_amd64.pyd
0x7ffd7f61a2be: ?make_aot_module@directx12@lang@taichi@@YA?AV?$unique_ptr@VModule@aot@lang@taichi@@U?$default_delete@VModule@aot@lang@taichi@@@std@@@std@@Vany@5@W4Arch@3@@Z in taichi_python.cp39-win_amd64.pyd
0x7ffd7f61a3c5: ?make_aot_module@directx12@lang@taichi@@YA?AV?$unique_ptr@VModule@aot@lang@taichi@@U?$default_delete@VModule@aot@lang@taichi@@@std@@@std@@Vany@5@W4Arch@3@@Z in taichi_python.cp39-win_amd64.pyd
0x7ffd7f5a35d0: ?get_cycles@Time@taichi@@SA_KXZ in taichi_python.cp39-win_amd64.pyd
0x7ffd7f2a130b: ?get_snode_tree@Module@aot@lang@taichi@@QEAAPEAVField@234@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z in taichi_python.cp39-win_amd64.pyd
0x7ffd7f3a3a49: ?write_float@Ndarray@lang@taichi@@QEAAXAEBV?$vector@HV?$allocator@H@std@@@std@@N@Z in taichi_python.cp39-win_amd64.pyd
0x7ffd7f3a53e1: ?write_float@Ndarray@lang@taichi@@QEAAXAEBV?$vector@HV?$allocator@H@std@@@std@@N@Z in taichi_python.cp39-win_amd64.pyd
0x7ffe1b869363: _recalloc in ucrtbase.dll
0x7ffe1c0e26bd: BaseThreadInitThunk in KERNEL32.DLL
0x7ffe1dc2a9f8: RtlUserThreadStart in ntdll.dll

Internal error occurred. Check out this page for possible solutions:
https://docs.taichi-lang.org/docs/install

However, if I change the code to

@ti.kernel
def compute_loss():
    for i in range(NUM):
        for j in range(i):
            for k in range(j):
                y[i] += x[j] * x[k]

    for i in range(NUM):
        loss[None] += y[i]

the code just works fine.
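
To check the gradients of whichever version runs, the expected values can be worked out independently of Taichi. Below is a minimal pure-Python sketch, assuming x[i] = i and NUM = 5 as in the reproduction above. The helper names loss_fn and reference_grad are hypothetical, and central finite differences stand in for autodiff here; this is not how diff-Taichi computes gradients.

```python
NUM = 5

def loss_fn(x):
    """Same arithmetic as compute_loss: sum over i, j < i, k < j of x[j] * x[k]."""
    total = 0.0
    for i in range(NUM):
        for j in range(i):
            for k in range(j):
                total += x[j] * x[k]
    return total

def reference_grad(x, eps=1e-3):
    """Central finite differences (an assumption: a stand-in for autodiff).

    The loss is linear in each individual x[n], so the estimate is
    accurate up to floating-point rounding.
    """
    grad = []
    for n in range(NUM):
        upper = list(x)
        lower = list(x)
        upper[n] += eps
        lower[n] -= eps
        grad.append((loss_fn(upper) - loss_fn(lower)) / (2 * eps))
    return grad

x0 = [float(i) for i in range(NUM)]  # mirrors initialize(): x[i] = i
print("loss:", loss_fn(x0))
print("grad:", [round(g, 6) for g in reference_grad(x0)])
```

For this input the loss is 13 and the gradient with respect to x is [10, 7, 5, 3, 0], so x.grad from a working ti.ad.Tape run should match those values.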

What is the recommended way to write such loops? Am I violating the rule "Kernel body must only consist of either multiple for-loops or non-for statements", or is it something else? I also noticed that the code may fail if I use "for ... continue ...". Are there any other rules I should know about when using for statements and if statements with Taichi autodiff?

For example, the code below fails:

import taichi as ti
ti.init(arch=ti.cpu, debug=True)

NUM = 5
x = ti.field(dtype=float)
ti.root.dense(ti.i, NUM).place(x)
loss = ti.field(dtype=float, shape=())
ti.root.lazy_grad()

@ti.kernel
def initialize():
    for i in x:
        x[i] = i
    loss[None] = 0

@ti.kernel
def compute_loss():
    for i in range(NUM):
        if i > NUM / 2:
            continue
        loss[None] += x[i]

initialize()
with ti.ad.Tape(loss=loss, validation=True):
    compute_loss()

Looking forward to your reply. Many thanks.

erizmr commented 1 year ago

Hi @liamxu1, thanks for reporting this. The main issue you describe is confirmed as a bug, and a fix has been submitted in PR #7778.

The for...continue and while statements are currently not supported in Taichi autodiff.

Although control flow can be handled, purely nested loops like those in your modified version are friendlier to the autodiff system.
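
Since continue is not supported, one possible restructuring of the second example is to invert the condition and guard the accumulation instead. The sketch below is plain Python and only shows that the two loop shapes compute the same value. The function names are hypothetical, and whether the guarded body differentiates correctly inside a @ti.kernel under ti.ad.Tape is an assumption that has not been verified here.

```python
NUM = 5

def loss_with_continue(x):
    """Loop shape from the failing example: skip iterations via `continue`."""
    total = 0.0
    for i in range(NUM):
        if i > NUM / 2:
            continue
        total += x[i]
    return total

def loss_with_guard(x):
    """Same result without `continue`: the inverted condition guards the update."""
    total = 0.0
    for i in range(NUM):
        if i <= NUM / 2:
            total += x[i]
    return total

x0 = [float(i) for i in range(NUM)]  # mirrors initialize(): x[i] = i
assert loss_with_continue(x0) == loss_with_guard(x0)
print(loss_with_guard(x0))
```

With NUM = 5 the threshold NUM / 2 is 2.5, so only indices 0, 1 and 2 contribute to the loss in either form.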

liamxu1 commented 1 year ago

Thanks for the fast reply. I will give it a try.

liamxu1 commented 1 year ago

Excuse me @erizmr, is there a way I can try the fixed version? I cloned your branch and followed https://docs.taichi-lang.org/docs/dev_install to build it, but it failed with a DLL loading error.

erizmr commented 1 year ago

Hi @liamxu1, could you please share more information about the error, e.g. screenshots?