taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Reverse NDRange support #4379

ayanlv opened this issue 2 years ago (status: Open)

ayanlv commented 2 years ago

I ran into a special case that requires a reverse loop, so I used “ti.ndrange((8,0))”. However, the loop silently did nothing, with no warnings or errors. (At the time, I had no idea whether ti.ndrange could loop in reverse order.) Perhaps a warning could be emitted when n1 > n2 in ti.ndrange((n1,n2)).

import taichi as ti
ti.init(arch=ti.gpu, debug=True, default_fp=ti.f64)

x = ti.field(ti.f64, shape=(10))

@ti.kernel
def main():
    x[9] = 1.0
    for i in ti.ndrange((8,0)):    # intended as a reverse loop starting at 8
        x[i] = 2.0 * x[i+1]
        print(x[i])    # prints nothing: the loop body never runs

main()
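The observed behaviour is consistent with Python's `range`: a half-open interval whose start is not below its end is empty, so `ti.ndrange((8,0))` produces zero iterations and nothing is reported. Below is a minimal pure-Python sketch of the warning proposed above. `checked_ndrange` is a hypothetical helper, not part of Taichi's API, and it models only the bounds check, not Taichi's kernel execution.

```python
import itertools
import warnings


def checked_ndrange(*args):
    """Hypothetical stand-in for ti.ndrange that warns on reversed bounds.

    Each argument is either an int n (meaning 0..n-1) or a (begin, end) pair.
    Iteration follows Python range semantics, so a pair with begin >= end
    contributes no indices.
    """
    ranges = []
    for dim, arg in enumerate(args):
        begin, end = (0, arg) if isinstance(arg, int) else arg
        if begin > end:
            # The check suggested in the issue: n1 > n2 silently yields nothing.
            warnings.warn(
                f"ndrange dimension {dim}: begin={begin} > end={end}; "
                "this dimension is empty and the loop body will not run",
                stacklevel=2,
            )
        ranges.append(range(begin, end))
    return itertools.product(*ranges)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    indices = list(checked_ndrange((8, 0)))

print(indices)       # []
print(len(caught))   # 1
```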
bobcao3 commented 2 years ago

Since it's ndrange(), I would assume ndrange((8,0)) is understood as a 2D loop with one index of size 0? I'm not sure, though.

bobcao3 commented 2 years ago

Sorry, I missed the tuple.
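The two readings discussed above can be compared in plain Python, assuming ti.ndrange behaves like a Cartesian product of Python ranges (an illustration, not Taichi code): `ndrange(8, 0)` would be a 2D loop whose second extent is 0, while `ndrange((8, 0))` is a 1D loop over the empty interval [8, 0). Either way the loop body runs zero times.

```python
import itertools

# Reading 1: ndrange(8, 0) -> two dimensions with extents 8 and 0.
two_d = list(itertools.product(range(8), range(0)))

# Reading 2: ndrange((8, 0)) -> one dimension over the half-open interval [8, 0).
one_d = list(itertools.product(range(8, 0)))

print(len(two_d), len(one_d))  # 0 0
```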

weiyunfei commented 2 years ago

Hello! I'd like to take a look. I plan to add if-else logic in _Ndrange.__init__() to handle reverse order. Or do you have a better idea?
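A rough sketch of the if-else idea, written as a standalone class rather than a patch to Taichi's `_Ndrange`. `ReversibleNdrange` is hypothetical, and the downward semantics (a `(begin, end)` pair with `begin > end` behaves like Python's `range(begin, end, -1)`, i.e. 8, 7, ..., 1) is an assumption, not something Taichi defines.

```python
import itertools


class ReversibleNdrange:
    """Hypothetical ndrange-like object with reverse-order support.

    Assumption: a (begin, end) pair with begin > end iterates downward,
    like Python's range(begin, end, -1).
    """

    def __init__(self, *args):
        self.ranges = []
        for arg in args:
            begin, end = (0, arg) if isinstance(arg, int) else arg
            if begin <= end:
                # Ordinary ascending dimension: begin, begin+1, ..., end-1.
                self.ranges.append(range(begin, end))
            else:
                # Reversed dimension: begin, begin-1, ..., end+1.
                self.ranges.append(range(begin, end, -1))
        # Number of indices per dimension; len() of a range is never negative.
        self.dimensions = [len(r) for r in self.ranges]

    def __iter__(self):
        # As with _Ndrange in Python scope, each iteration yields a tuple.
        return itertools.product(*self.ranges)


print([i[0] for i in ReversibleNdrange((8, 0))])  # [8, 7, 6, 5, 4, 3, 2, 1]
```

Deriving each extent from `len(range(...))` rather than `end - begin` keeps the per-dimension sizes non-negative even for reversed pairs.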

strongoier commented 2 years ago


Hi @weiyunfei. Thanks for showing interest in this issue! It is yours now. The code around _Ndrange is a good place to start, though I suspect more code will be involved. Feel free to post here if you have more findings or run into any problems :-) We will be glad to help.

weiyunfei commented 2 years ago

Hi, @strongoier. I really need some help now. After modifying the code of _Ndrange, the following test works well.

import taichi as ti
ti.init(arch=ti.gpu, debug=True, default_fp=ti.f32)

x = ti.field(ti.f32, shape=(10))

def main():    # no @ti.kernel: runs in Python scope, where i is a tuple
    x[9] = 1.0
    for i in ti.ndrange((8,0)):
        x[i] = 2.0 * x[i[0]+1]    # i[0] because each index is a 1-tuple here
        print(x[i])

main()

However, when main() is decorated with @ti.kernel, it fails with the following error:

RuntimeError: [kernel_manager.cpp:launch@188] Assertion failure: kernel_attribs_.advisory_total_num_threads >= 0

I have been reading through the AST building process, but I have not figured out how the for loop is translated into its multi-threaded version. I have read the ast_transformer code and some of taichi/ir/frontend_ir.cpp.

Since I'm not familiar enough with Taichi yet, could you tell me which code is involved in the multi-threaded version?

I appreciate your help, and I will keep trying to figure it out.

P.S. I noticed that each iteration of _Ndrange yields a tuple, while in the multi-threaded version the loop index is an integer.
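A plausible explanation for both observations, offered as an assumption rather than a reading of Taichi's source: if the kernel version lowers the whole ndrange into a single flat integer loop (hence the integer index) and sizes it from `end - begin`, a reversed pair gives 0 - 8 = -8, a negative thread count that would trip an assertion like `advisory_total_num_threads >= 0`. The sketch below shows one way a flat loop could decode its counter into per-dimension indices while keeping every extent non-negative; `extents_and_steps` and `flat_to_indices` are hypothetical names.

```python
def extents_and_steps(bounds):
    """For each (begin, end) pair, return (begin, extent, step).

    Hypothetical: a reversed pair (begin > end) walks downward with step -1.
    The extent is always >= 0, so the total thread count is never negative.
    """
    result = []
    for begin, end in bounds:
        step = 1 if begin <= end else -1
        result.append((begin, abs(end - begin), step))
    return result


def flat_to_indices(flat, bounds):
    """Decode one flat loop counter into a tuple of per-dimension indices."""
    indices = []
    # Row-major decoding: the last dimension varies fastest.
    for begin, extent, step in reversed(extents_and_steps(bounds)):
        flat, offset = divmod(flat, extent)
        indices.append(begin + step * offset)
    return tuple(reversed(indices))


bounds = [(8, 0)]                          # the reversed pair from the report
naive_total = bounds[0][1] - bounds[0][0]  # end - begin: negative
total = 1
for _begin, extent, _step in extents_and_steps(bounds):
    total *= extent                        # abs(end - begin): never negative

print(naive_total, total)  # -8 8
print([flat_to_indices(k, bounds) for k in range(total)])
# [(8,), (7,), (6,), (5,), (4,), (3,), (2,), (1,)]
```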

weiyunfei commented 2 years ago

Should I write the entire IR generation process for a reversed ndrange, or just modify the IR generation for range?

strongoier commented 2 years ago

Hi @weiyunfei. Sorry for the late reply. I suggest starting simple: what about supporting for i in range(8, 0) in Taichi kernels first?

Here is the entrance: https://github.com/taichi-dev/taichi/blob/300d12a5f73dd4a5fae3927889de734b516ed164/python/taichi/lang/ast/ast_transformer.py#L1016-L1018

You can follow the code path and find out everything you need. In fact, there is already some support for reversed range fors: https://github.com/taichi-dev/taichi/blob/300d12a5f73dd4a5fae3927889de734b516ed164/taichi/ir/statements.h#L770

However, it has never been exposed to users. It would be a great step if you could link things together and get reversed range-fors working!
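As a mental model for connecting user syntax to the existing reversed flag, here is a pure-Python sketch. It assumes, without checking statements.h, that a reversed range-for visits the same half-open interval [begin, end) from end - 1 down to begin. Under that assumption, a user-facing `range(8, 0)` meaning 8, 7, ..., 1 (as Python's `range(8, 0, -1)` does) maps to a reversed loop over [1, 9). `RangeFor` and `lower_user_range` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RangeFor:
    """Hypothetical range-for node: iterates [begin, end), optionally backwards."""
    begin: int
    end: int
    is_reversed: bool = False

    def indices(self):
        if self.is_reversed:
            # Same interval as the forward loop, visited from end-1 down to begin.
            return list(range(self.end - 1, self.begin - 1, -1))
        return list(range(self.begin, self.end))


def lower_user_range(start, stop):
    """Map a user-facing range(start, stop) to a RangeFor node.

    Assumption: start > stop means "count down", like range(start, stop, -1).
    """
    if start <= stop:
        return RangeFor(start, stop)
    # Counting down start, start-1, ..., stop+1 covers [stop+1, start+1).
    return RangeFor(stop + 1, start + 1, is_reversed=True)


print(lower_user_range(8, 0).indices())  # [8, 7, 6, 5, 4, 3, 2, 1]
```

One caveat: Taichi parallelizes the outermost loop of a kernel, so a reverse visiting order only matters for loops that execute serially (for example, loops nested inside another loop). The dependency x[i] = 2.0 * x[i+1] in the original report needs serial execution either way.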

weiyunfei commented 2 years ago

@strongoier Thanks for your suggestions. I added support for the reversed range-for today, and next I'm going to implement the reversed ndrange-for.

Also, I'm confused about why Taichi needs two levels of IR. Is it for flexibility across different users?

strongoier commented 2 years ago


There are some historical reasons. In the future, the frontend IR will become a general AST targeting different frontends (Python, TypeScript, ...).
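To make the idea of a frontend-neutral layer concrete, here is a toy two-level pipeline in plain Python. It is not Taichi's IR; `ForLoop`, `python_frontend`, and `lower` are hypothetical. The point is the split: each frontend only has to emit language-neutral nodes like `ForLoop`, and a single lowering pass, shared by every frontend, turns them into low-level statements.

```python
import ast
from dataclasses import dataclass


@dataclass
class ForLoop:
    """Hypothetical frontend-neutral loop node: for var in [begin, end)."""
    var: str
    begin: int
    end: int


def python_frontend(source):
    """Toy Python frontend: turn each `for v in range(...)` into a ForLoop."""
    loops = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.For)
                and isinstance(node.target, ast.Name)
                and isinstance(node.iter, ast.Call)
                and isinstance(node.iter.func, ast.Name)
                and node.iter.func.id == "range"):
            args = [ast.literal_eval(arg) for arg in node.iter.args]
            begin, end = (0, args[0]) if len(args) == 1 else (args[0], args[1])
            loops.append(ForLoop(node.target.id, begin, end))
    return loops


def lower(loop):
    """Toy lowering pass: frontend-neutral node -> flat low-level statements."""
    return [
        f"{loop.var} = {loop.begin}",
        f"head: if {loop.var} >= {loop.end} goto exit",
        "  <body>",
        f"  {loop.var} = {loop.var} + 1; goto head",
        "exit:",
    ]


for stmt in lower(python_frontend("for i in range(0, 8):\n    pass")[0]):
    print(stmt)
```

A TypeScript frontend, in this toy model, would only need its own equivalent of `python_frontend`; `lower` stays unchanged.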