mitsuba-renderer / mitsuba3

Mitsuba 3: A Retargetable Forward and Inverse Renderer
https://www.mitsuba-renderer.org/

Optimizing an envmap starts diverging after a while #420

Closed dorverbin closed 1 year ago

dorverbin commented 1 year ago

Summary

When optimizing an envmap, the objective starts consistently increasing after a while.

System configuration

System information:

OS: CentOS Linux release 7.9.2009 (Core)
CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
GPU: Tesla V100-SXM2-16GB
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
NVidia driver: 515.65.01
CUDA: 9.0.176
LLVM: 0.0.0

Dr.Jit: 0.2.2
Mitsuba: 3.0.2
Is custom build? False
Compiled with: GNU 10.2.1
Variants: scalar_rgb scalar_spectral cuda_ad_rgb llvm_ad_rgb

Description

Implementing a Mitsuba 3 version of the Mitsuba 2 envmap optimization tutorial yields results very similar to the tutorial's for the first 100 iterations (see Video 1 below), but the optimization starts diverging soon after that (see Video 2): the objective begins to increase consistently after roughly 200 iterations. This also happens with simpler geometry (e.g. a sphere) and with other optimizers and learning rates; using far more samples per pixel delays the divergence but does not prevent it. Any idea why this is happening, and how to prevent it?


https://user-images.githubusercontent.com/15837806/205184065-670a92a9-dfba-49fb-b4e2-81e745886bdc.mp4

Video 1: a comparison of my reimplementation in Mitsuba 3 (left) with the video from the Mitsuba 2 tutorial (right)


https://user-images.githubusercontent.com/15837806/205184087-334374fe-5396-43c8-8e7f-f72d54a9d67a.mp4

Video 2: the results of optimizing for 700 iterations

Steps to reproduce

Mitsuba 3 reimplementation:

Step 1: download and unzip bunny.zip provided in the Mitsuba 2 tutorial:

!wget http://mitsuba-renderer.org/scenes/bunny.zip
!unzip bunny.zip

Step 2: run the Mitsuba 3 version of the code from the tutorial:

import mitsuba as mi
import drjit as dr
mi.set_variant('cuda_ad_rgb')

scene = mi.load_file('bunny.xml')

# Find differentiable scene parameters
params = mi.traverse(scene)

# Make a backup copy
param_res = params['my_envmap.data'].shape
param_ref = mi.TensorXf(params['my_envmap.data'])

# Discard all parameters except for one we want to differentiate
params.keep(['my_envmap.data'])

# Render a reference image (no derivatives used yet)
image_ref = mi.render(scene, spp=16)

# Change to a uniform white lighting environment
params['my_envmap.data'] = dr.full(mi.TensorXf, 1.0, shape=param_res)

# Construct an Adam optimizer that will adjust the parameters 'params'
opt = mi.ad.Adam(lr=.02)

# Add envmap parameters to optimizer
opt['my_envmap.data'] = params['my_envmap.data']
params.update(opt)

for it in range(700):
    # Perform a differentiable rendering of the scene
    image = mi.render(scene, params, spp=1)

    # Objective: MSE between 'image' and 'image_ref'
    ob_val = dr.mean(dr.sqr(image - image_ref))

    # Back-propagate errors to input parameters
    dr.backward(ob_val)

    # Optimizer: take a gradient step
    opt.step()
    params.update(opt)

    # Compare iterate against ground-truth value
    err_ref = dr.mean(dr.sqr(param_ref - params['my_envmap.data']))
    print('Iteration %03i: error=%g' % (it, err_ref[0]))
Speido commented 1 year ago

Weird, I'm getting compiler failures with cuda_ad_rgb at random iterations:

Critical Dr.Jit compiler failure: jit_optix_compile(): optixModuleGetCompilationState() indicates that the compilation did not complete succesfully. The module's compilation state is: 0x2363

Setting a different seed every iteration certainly seems to help convergence but the error creeps up eventually: image = mi.render(scene, params, spp=1, seed=it)

I suspected "unreachable pixels" but enabling mask_updates in the optimizer did little.
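
For reference, here is a minimal sketch of those two tweaks, assuming the same setup as the reproduction script above (scene, params and image_ref already created, mitsuba imported as mi). mask_updates is an option of mi.ad.Adam; the learning rate is simply the value used above:

# Assumes 'params' etc. from the reproduction script above.
# mask_updates=True makes Adam skip parameter entries that received no
# gradient in the current iteration (e.g. unreachable envmap texels).
opt = mi.ad.Adam(lr=0.02, mask_updates=True)
opt['my_envmap.data'] = params['my_envmap.data']
params.update(opt)

# ... and inside the loop, decorrelate the Monte Carlo noise across iterations:
image = mi.render(scene, params, spp=1, seed=it)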

DoeringChristian commented 1 year ago

Hi @Speido, an issue concerning this bug has already been opened. If you are able to build Mitsuba yourself, you could test it with this commit. The 3.0.2 release should also work without this error.

njroussel commented 1 year ago

This issue stumped me for weeks; everything seemed correct. I had simply missed that your scene uses the path integrator, which is the issue here.

As you can see in the plugin reference, the data parameter can introduce discontinuities: https://mitsuba.readthedocs.io/en/stable/src/generated/plugins_emitters.html#environment-emitter-envmap. This is because it does some fancy sampling, which was not the case in Mitsuba 2. Discontinuities need to be handled correctly; in Mitsuba 3 we usually have a *_reparam variant of the AD integrators that is capable of dealing with discontinuities. You can learn a bit more about AD integrators and discontinuities here and here.
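
As an illustration (not part of the original comment), one way to try a reparameterized integrator without editing bunny.xml is to construct it separately with mi.load_dict and pass it to mi.render; the prb_reparam settings below are assumptions, not values from this thread:

import mitsuba as mi
mi.set_variant('cuda_ad_rgb')

# Reparameterized path tracer that can handle visibility discontinuities
# during automatic differentiation.
reparam = mi.load_dict({
    'type': 'prb_reparam',
    'max_depth': 6,  # illustrative value, not taken from the thread
})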

You should also change the rendering seed at every iteration, by doing something like img = mi.render(scene, params, seed=it, spp=spp).
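
Put together, a minimal sketch of the optimization loop with both changes applied, assuming scene, params, image_ref and opt are set up as in the reproduction script above and reparam is the integrator constructed in the previous sketch:

for it in range(700):
    # Differentiable rendering with the reparameterized integrator and a
    # fresh seed per iteration.
    image = mi.render(scene, params, integrator=reparam, spp=1, seed=it)

    # Objective: MSE between 'image' and 'image_ref'
    loss = dr.mean(dr.sqr(image - image_ref))
    dr.backward(loss)

    opt.step()
    params.update(opt)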

With these two changes the image loss no longer seems to diverge. Please keep this thread updated if this still doesn't fix the issue on your end.