mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
BSD 3-Clause "New" or "Revised" License
563 stars 40 forks source link

Loop title prevents kernel caching #213

Closed merlinND closed 7 months ago

merlinND commented 7 months ago

Hi,

The following code:

dr.set_log_level(dr.LogLevel.Info)

def fun(title):
    i = dr.cuda.UInt32([1, 2, 3])
    l = dr.cuda.Loop(title, (lambda: i))

    while l(i < 5):
        i += 1

    print(i)

fun("title1")
fun("title1")
fun("title1")
fun("title2")
fun("title3")

leads to a new kernel compilation for each new loop title:

jit_eval(): launching 1 kernel.
  -> launching 809d2332b518e3e1 (n=3, in=1, out=1, ops=15, jit=8.691 us):
     cache miss, build: 3.79115 ms, 3.289 KiB.
jit_eval(): done.
[5, 5, 5]
jit_eval(): launching 1 kernel.
  -> launching 809d2332b518e3e1 (n=3, in=1, out=1, ops=15, jit=8.4 us):
jit_eval(): done.
[5, 5, 5]
jit_eval(): launching 1 kernel.
  -> launching 809d2332b518e3e1 (n=3, in=1, out=1, ops=15, jit=6.61 us):
jit_eval(): done.
[5, 5, 5]
jit_eval(): launching 1 kernel.
  -> launching d083e43abc706a13 (n=3, in=1, out=1, ops=15, jit=5.29 us):
     cache miss, build: 3.86396 ms, 3.289 KiB.
jit_eval(): done.
[5, 5, 5]
jit_eval(): launching 1 kernel.
  -> launching ff86fbf889595361 (n=3, in=1, out=1, ops=15, jit=5.97 us):
     cache miss, build: 3.9326 ms, 3.289 KiB.
jit_eval(): done.
[5, 5, 5]

My understanding is that this is due to the loop title appearing all the way down in the JITted code. To avoid this, we could:

  1. Change the emitted code to not include the loop title
  2. Or, at least, change the internal usage of loops to have a constant loop title, for example in dr::binary_search:

https://github.com/mitsuba-renderer/drjit/blob/8f0976008f3662756bb078f713e383a98f944e1d/include/drjit/util.h#L187-L193

What do you think?

njroussel commented 7 months ago

Hi @merlinND

For your first suggestion, I thought that this was the case unless we used the new/future JitFlag.Debug. But it doesn't seem to be the case. @wjakob should I change this? or do you think it's particularly helpful to have the loop and condition titles in the IR without the debug flag? (This obviously will only fix the issue in the nanobind-based branches we're working on.)

I also think that we should implement your second suggestion. In principle, whether the debug flag is used or not, should not have any (or as little as possible) consequences on the kernel launches and caching. If we can limit this in our own code, we should. I'll have a look at your PR.

wjakob commented 7 months ago

In the next version of Dr.Jit, the title is optional -- it is simply "untitled" by default. The issue reported here is a bit strange to me because including this label in the generated IR is in some sense the whole point of why such a label even exists.

wjakob commented 7 months ago

I agree that the specific use case in binary_search is potentially problematic, that function needs a major refactor in any case.

merlinND commented 7 months ago

including this label in the generated IR is in some sense the whole point of why such a label even exists.

That's fair, in that case solution (2) is still relevant I think: we should avoid having varying loop titles that users don't have control over. I just checked again and dr::binary_search is actually the only example I could find, so if you agree with #214 this issue can be closed as well :)