Kernel runtime/op-count increases gradually during optimization loop

mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering

BSD 3-Clause "New" or "Revised" License


Kernel runtime/op-count increases gradually during optimization loop #60

Closed daseyb closed 2 years ago

daseyb commented 2 years ago

Hi everyone!

I've run into an issue where the runtime of my kernel goes up steadily as I run my optimization. It also seems to recompile the kernel on every iteration (I get thousands of files in the cache directory after an hour or two of working). When I enable Info output, the number of ops also seems to creep up steadily. I don't think I'm doing anything too weird, but I haven't broken down my program to a minimal example that exhibits the issue (the behavior seems fairly random: removing or adding a few lines of code can drastically shift the point at which the slowdown occurs). Is that something you have encountered before?

Thanks!

merlinND commented 2 years ago

Hi @daseyb,

One sneaky way this can happen is when parts of the computation or outputs are left unevaluated and the computation graph accidentally includes part of the previous iteration. We added Sampler::schedule_state() because of this exact issue: the new state of the sampler would not get evaluated, and so each iteration contained the update step from all of the previous ones.

Maybe something similar is happening in your application?

daseyb commented 2 years ago

That sounds very plausible! Is there a way to check what might be left unevaluated? Also, is there some documentation on what dr.schedule does and when I should call it? For context, I'm writing a little 2D differentiable "renderer" with drjit only, so I'm probably missing some of the "structure" that Mitsuba's render() provides. Thanks!

daseyb commented 2 years ago

Turns out I was using a PCG32 object across iterations of the loop! Just calling dr.schedule(pcg.state) at the end of each iteration fixed things! Thanks for the quick help :)
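For anyone who finds this later, here is a rough sketch of why the op count grows and why evaluating the state fixes it. It is a toy model in plain Python, not Dr.Jit code: the `Lazy` class, `advance`, `sample`, `run`, and the `evaluate_state` flag are all made up for illustration, and the flag only stands in for what `dr.schedule(pcg.state)` followed by the next evaluation achieves. The generator is a bare LCG step with the usual 64-bit constants, not a full PCG32.

```python
# Toy model of lazy tracing (illustrative only, NOT Dr.Jit's implementation).
MULT = 6364136223846793005
INC = 1442695040888963407
MASK = (1 << 64) - 1


class Lazy:
    """A value that is either concrete or a recorded, not-yet-run operation."""

    def __init__(self, value=None, op=None, parents=()):
        self.value = value      # concrete value once evaluated, else None
        self.op = op            # function of the parents' values
        self.parents = parents  # upstream nodes this node depends on

    def pending_ops(self):
        """Number of unevaluated operations needed to compute this node."""
        if self.value is not None:
            return 0
        return 1 + sum(p.pending_ops() for p in self.parents)

    def _compute(self):
        # Unevaluated parents are recomputed on the fly and NOT stored,
        # which is why they stay in the trace of later iterations.
        if self.value is not None:
            return self.value
        return self.op(*(p._compute() for p in self.parents))

    def evaluate(self):
        """Store this node's value and drop its dependency on the past."""
        self.value = self._compute()
        self.parents = ()
        return self.value


def advance(state):
    """Record one LCG step on the generator state (nothing runs yet)."""
    return Lazy(op=lambda s: (s * MULT + INC) & MASK, parents=(state,))


def sample(state):
    """Record a float in [0, 1) derived from the current state."""
    return Lazy(op=lambda s: (s >> 32) / 2**32, parents=(state,))


def run(iterations, evaluate_state):
    """Return (ops per iteration, samples) for a loop that reuses one RNG."""
    state = Lazy(value=42)
    op_counts, samples = [], []
    for _ in range(iterations):
        out = sample(state)
        state = advance(state)          # new state is only recorded
        op_counts.append(out.pending_ops())
        samples.append(out.evaluate())  # only the output gets evaluated
        if evaluate_state:
            state.evaluate()            # stand-in for scheduling the state
    return op_counts, samples


print(run(5, evaluate_state=False)[0])  # [1, 2, 3, 4, 5]
print(run(5, evaluate_state=True)[0])   # [1, 1, 1, 1, 1]
```

Both runs produce the same samples. The only difference is whether each iteration has to replay every earlier state update to get there.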

wjakob commented 2 years ago

Relevant quote from the Dr.Jit paper, which sounds like this exact case.

wjakob commented 2 years ago

(and using dr.schedule() works just as well.)