Open DoeringChristian opened 1 year ago
There are too few details: what is the setup, what does your optimization loop look like? Which integrator are you using?
In general, backpropagation through a rendering algorithm involves an entirely separate simulation that is more costly than the original primal simulation (it needs to compute derivatives of various quantities and write them to memory in addition to sampling light paths).
Thanks for the quick response. I'm trying to write an implementation of this paper. So I implemented an integrator that is only differentiable with respect to the parameters of the first bounce: $L_1 + f_1 L_2$, where $L_1$ and $f_1$ are differentiable but $L_2$, the radiance from further bounces, is not.

My scene has 1956 meshes, all of which are emitters. I use the principled BSDF but only keep the emission, roughness, specular and base_color parameters to be optimized. Each object's base_color is a 10x10 texture; the other values are simple scalar/color values. I have implemented a texture-space integrator as described in the paper. Camera rays are generated with gradients disabled, since I have to generate multiple batches of rays and concatenate them. The loss function at the moment is a simple MSE loss over all sample points.

I have tried it with the Mitsuba render function, which integrates twice as you mentioned. I also tried bypassing the render function, which should be possible in this case since the discontinuities don't depend on the parameters if the BSDF is smooth. The result, however, was the same. The optimization loop is similar to that of the example.
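The "differentiable only through the first bounce" idea can be illustrated with a toy forward-mode sketch (plain Python dual numbers, not the Mitsuba/Dr.Jit API; all values below are made up): detaching $L_2$ keeps its primal value but drops its derivative, so gradients flow only through $L_1$ and $f_1$.

```python
class Dual:
    """Tiny forward-mode AD value: (value, derivative w.r.t. one parameter)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: (a*b)' = a'*b + a*b'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def detach(x):
    """Analogue of dr.detach: keep the value, zero out the derivative."""
    return Dual(x.val, 0.0)

# Hypothetical quantities, each carrying a derivative w.r.t. some parameter:
L1 = Dual(0.3, 1.0)   # first-bounce radiance (differentiable)
f1 = Dual(0.5, 2.0)   # first-bounce BSDF value (differentiable)
L2 = Dual(0.8, 4.0)   # radiance from further bounces

full       = L1 + f1 * L2           # gradient through everything
first_only = L1 + f1 * detach(L2)   # gradient only through the first bounce

print(full.val, first_only.val)   # same primal value, ~0.7 in both cases
print(full.dot)                   # ~4.6 = 1 + 2*0.8 + 0.5*4
print(first_only.dot)             # ~2.6 = 1 + 2*0.8  (L2's derivative dropped)
```

The primal radiance is identical in both cases; only the derivative changes, which is exactly why bypassing the full differentiable pipeline for the detached tail can be valid.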
```python
params = mi.traverse(scene)
params.keep(r".*?((\.bsdf\.base_color\.value)|(\.bsdf\.specular)|(\.bsdf\.roughness\.value)|(\.emitter\.radiance\.value))")
opt = mi.ad.Adam(lr=0.1, params=params)

for it in range(n):
    for key, _ in params:
        opt[key] = dr.clamp(opt[key], 0., 1.)
    for key, _ in params:
        params[key] = opt[key]

    img, projected = render(scene, sensor, integrator, params, it)
    ref = mi.Color3f(refimg.eval(mi.Point3f(projected.x, projected.y))[0:3])
    loss = lossfn(img, ref)
    dr.backward(loss)
    print(loss)
    opt.step()  # this step takes very long
    print("This does not get printed for a long time.")
```
I can backpropagate and even read out the gradient values of the parameters before the optimization step. When I set Dr.Jit's log level to Debug, I get a long output, but I don't quite understand what Dr.Jit is doing; in theory it should just update the values. Thanks for your help.
You need to keep in mind that Dr.Jit will postpone all kernel compilation & evaluation until it can't do so any longer, e.g. when the user calls `dr.eval()`. In your case, the `dr.eval()` in `Optimizer.step()` likely performs the compilation and evaluation of the backward rendering kernel as well, which can be expensive. My point is that this line of code likely doesn't only compute the optimizer's update rule, but much more than that.
To verify this, you can explicitly evaluate the result of the backward rendering routine before calling `opt.step()`, e.g. by calling `dr.eval(params)`. By doing so, the `dr.eval()` in `Optimizer.step()` will now only perform the update rule, which should be pretty cheap.
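The lazy-evaluation behavior described above can be illustrated with a toy sketch (plain Python, not the actual Dr.Jit API): operations are only recorded until an explicit `eval()`, so whichever call finally forces evaluation pays for all of the queued work, even if that call looks cheap on its own.

```python
class Lazy:
    """Toy stand-in for a traced array: operations are queued, not executed."""
    def __init__(self, value):
        self.value = value
        self.pending = []  # queued operations -- the "kernel" waiting to be compiled

    def add(self, x):
        self.pending.append(('add', x))  # recorded only, nothing computed yet
        return self

    def mul(self, x):
        self.pending.append(('mul', x))
        return self

    def eval(self):
        """Execute all queued work at once -- analogous to dr.eval()."""
        n_ops = len(self.pending)
        for op, x in self.pending:
            self.value = self.value + x if op == 'add' else self.value * x
        self.pending.clear()
        return n_ops  # how much deferred work this eval() had to pay for

a = Lazy(2.0)
a.add(3.0).mul(4.0)   # cheap: just bookkeeping
print(a.eval())       # 2 -- both queued ops run here, all at once
print(a.value)        # 20.0
print(a.eval())       # 0 -- a second eval() has nothing left to do
```

In the same way, an `opt.step()` that happens to contain the first `dr.eval()` after `dr.backward()` ends up compiling and running the entire backward rendering kernel, not just the update rule.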
I think I understand now. But strangely the compilation still seems to happen at the optimizer step. I tried the following:
```python
dr.eval(params)
dr.eval(opt)

for k, p in params:
    dr.eval(p)

for k, p in params:
    dr.eval(opt[k])

for k, p in params:
    dr.schedule(p)
dr.eval()

for k, p in params:
    dr.schedule(opt[k])
dr.eval()
```
Did you try the following?
```python
for k, v in opt.items():
    dr.schedule(v)
dr.eval()
```
It still hangs in the optimization step.
Actually, it is important that you evaluate those values as well as their gradient values. Could you try the following:

```python
for k, v in opt.items():
    dr.schedule(v, dr.grad(v))
dr.eval()
```
Unfortunately it still happens when evaluating the gradients too. I have also tested this:

```python
values = {}
for k, v in opt.items():
    print(f"Evaluating {k}...")
    g = dr.grad(v)
    values[k] = dr.detach(v) - 0.01 * g
    dr.schedule(values[k])
dr.eval()
opt.step()
```
In that case the compilation seems to happen at the first `dr.eval()`, so maybe in your example the variables just went out of scope before being evaluated. I also tried implementing an optimizer myself and calling `dr.eval()` for every parameter, which seems to have fixed the issue.
```python
for k, v in opt.items():
    g = dr.grad(v)
    values = dr.detach(v) - 0.01 * g
    dr.schedule(values)
    dr.eval()
    opt[k] = values
    dr.enable_grad(opt[k])
```
And in the Adam optimizer, by moving this line into the for loop. Do you know why this could be? Is it perhaps more efficient to compile multiple smaller kernels?
Does this behavior still hold when using the `mi.render()` function? I would like to try this on my end if possible.
Yes, interestingly it is the same with and without `mi.render()`. I also tested whether the optimization result is the same for both methods, and it seems that way.
I tested it using the basic Cornell box optimization example, but with the red and green walls.

Optimizing using the original Mitsuba Adam optimizer:

mi.webm

Optimizing using the modified Adam optimizer:

own.webm

In this case the performance difference is of course negligible.
On my side, most of the time is spent in `mi.render` and `dr.backward` using the following script:
```python
import drjit as dr
import mitsuba as mi

mi.set_variant('llvm_ad_rgb')

scene = mi.load_dict(mi.cornell_box())
ref = mi.render(scene)

params = mi.traverse(scene)
params.keep(r".*?((\.bsdf\.base_color\.value)|(\.bsdf\.specular)|(\.bsdf\.roughness\.value)|(\.emitter\.radiance\.value))")
opt = mi.ad.Adam(lr=0.1, params=params)

for it in range(4):
    print(f'iteration {it} ----')
    for key, _ in params:
        opt[key] = dr.clamp(opt[key], 0., 1.)
    for key, _ in params:
        params[key] = opt[key]
    print(f' render ...')
    img = mi.render(scene, params, spp=1024)
    print(f' loss ...')
    loss = dr.sum(dr.abs(img - ref))
    print(f' backward ...')
    dr.backward(loss)
    print(f' step ...')
    opt.step()
    print(" done.")
```
Could you check that you can reproduce this "normal" behavior on your side as well?
It seems that 1024 samples per pixel is a bit much for my PC (32 GB RAM), and the program terminates. I tried it with spp=512, and yes, most of the time is spent in the backward step. I think the issue only occurs when there are many parameters to optimize.
Hi, I have a larger scene with 5868 parameters I want to optimize. I can render it and compute the gradient of each parameter using `dr.backward` without any problem. When the optimizer calls `dr.eval` in `step`, however, Dr.Jit takes a long time to complete a single step. Even when using SGD with no momentum, or implementing it myself, the issue remains. Shouldn't the optimization step be the least computationally expensive part if the gradients have already been evaluated, or did I get something wrong?