Call for a new feature similar to the __clz() function in CUDA

taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.

https://taichi-lang.org

Apache License 2.0


Call for a new feature similar to the __clz() function in CUDA #8212

Status: Open. Yihao-Shi opened this issue 1 year ago.

Yihao-Shi commented 1 year ago

When building a linear BVH on the GPU, a useful observation is that splitting the objects by the highest differing bit of their Morton codes corresponds to classifying them on either side of an axis-aligned plane in 3D. CUDA provides the intrinsic function __clz(), which counts the number of leading zero bits in a 32-bit integer and makes finding that bit cheap. However, Taichi does not currently offer this feature, even though it is important for building linear BVHs. I am requesting a feature similar to __clz().
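To illustrate why a leading-zero count is the key operation, here is a minimal plain-Python sketch. It assumes 10-bit integer grid coordinates per axis and the common 30-bit Morton layout in which x occupies the highest bit of each 3-bit group; the helper names clz32, expand_bits, morton3d and split_axis are hypothetical and not part of Taichi. XOR-ing two codes keeps only the bits where they differ, so clz32(a ^ b) is the length of their common prefix and 31 - clz32(a ^ b) is the highest differing bit, which identifies the separating axis.

```python
def clz32(x: int) -> int:
    """Count leading zero bits of x viewed as an unsigned 32-bit integer (32 for 0)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


def expand_bits(v: int) -> int:
    """Insert two zero bits after each of the low 10 bits of v (assumes 0 <= v < 1024)."""
    # Masking after each multiply keeps only the low 32 bits, like uint32 arithmetic.
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v


def morton3d(x: int, y: int, z: int) -> int:
    """Interleave three 10-bit coordinates into a 30-bit Morton code (x highest)."""
    return (expand_bits(x) << 2) | (expand_bits(y) << 1) | expand_bits(z)


def split_axis(a: int, b: int):
    """Return (bit, axis, level) for the highest bit where Morton codes a and b differ."""
    assert a != b, "identical codes have no differing bit"
    bit = 31 - clz32(a ^ b)      # index of the highest differing bit
    axis = "zyx"[bit % 3]        # bit 3i+2 -> x, 3i+1 -> y, 3i -> z
    return bit, axis, bit // 3   # level = which coordinate bit differs


a = morton3d(0, 0, 0)
b = morton3d(512, 0, 0)
print(split_axis(a, b))  # (29, 'x', 9): the points lie on opposite sides of the plane x = 512
```

In a real LBVH builder this test is applied to adjacent codes in a sorted array, which is why a hardware clz matters: it runs once per internal node on the GPU.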

lin-hitonami commented 1 year ago

We welcome contributions for this feature and are happy to help anyone who is interested. This PR, which adds popcnt to Taichi, shows how to add intrinsics to Taichi and may be a helpful reference.

JettChenT commented 1 year ago

Hi! I'm willing to work on this issue.

lin-hitonami commented 1 year ago

Thank you! Please let us know if you need any assistance.

JettChenT commented 1 year ago

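Thanks! To be clear, should I write a manual implementation of __clz() in Python, similar to the PR you mentioned? For example:

def clz(n):
    # Count leading zeros of n viewed as an unsigned 32-bit integer.
    for i in range(32):
        if 2**i > n:
            return 32 - i
    return 0  # the top bit (2**31) is set

Is that right?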
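For comparison, a linear loop like this performs up to 32 comparisons per call. A classic bit trick (a swapped-in technique, not something proposed in this thread) finds the same count by binary search in five halving steps. A minimal plain-Python sketch, using the hypothetical name clz32_bsearch and assuming the input is treated as an unsigned 32-bit value:

```python
def clz32_bsearch(x: int) -> int:
    """Leading-zero count of a uint32 by halving the search range five times."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    n = 0
    if x <= 0x0000FFFF:  # top 16 bits are all zero
        n += 16
        x <<= 16
    if x <= 0x00FFFFFF:  # top 8 bits are all zero
        n += 8
        x <<= 8
    if x <= 0x0FFFFFFF:  # top 4 bits are all zero
        n += 4
        x <<= 4
    if x <= 0x3FFFFFFF:  # top 2 bits are all zero
        n += 2
        x <<= 2
    if x <= 0x7FFFFFFF:  # top bit is zero
        n += 1
    return n


print(clz32_bsearch(1))  # 31
```

Each shift keeps x below 2**32, so the same steps map onto 32-bit integer code; still, as the reply below points out, a native intrinsic is the preferred route.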

lin-hitonami commented 1 year ago

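Not exactly. We should add an intrinsic to the IR and lower it to the built-in intrinsics of the LLVM- and SPIR-V-based backends: llvm::Intrinsic::ctlz in the LLVM backend, and 31 - findMSB(x) in the SPIR-V backend (findMSB returns the index of the most significant set bit, or -1 when x is 0, so this also yields 32 for zero).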
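The two lowerings can be sanity-checked against each other with a small plain-Python model. Here ctlz32 models llvm.ctlz on i32 (which returns 32 for zero when zero is not treated as poison), and find_umsb models GLSL findMSB / SPIR-V FindUMsb on an unsigned value (index of the highest set bit, -1 for zero). These functions are illustrative models only, not Taichi, LLVM or SPIR-V APIs.

```python
def ctlz32(x: int) -> int:
    """Model of llvm.ctlz.i32: number of leading zeros, 32 for x == 0."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


def find_umsb(x: int) -> int:
    """Model of findMSB on a uint: index of the highest set bit, -1 if x == 0."""
    return (x & 0xFFFFFFFF).bit_length() - 1


def clz_via_find_msb(x: int) -> int:
    """clz expressed through findMSB, as a SPIR-V lowering could compute it."""
    return 31 - find_umsb(x)


samples = [0, 1, 2, 3, 0xFF, 0x100, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
for x in samples:
    assert ctlz32(x) == clz_via_find_msb(x), hex(x)
print("31 - findMSB(x) matches ctlz for all samples")
```

The constant is 31 rather than 32: for x = 1, findMSB returns 0 while the leading-zero count is 31.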

JettChenT commented 1 year ago

Noted, thanks!