Hi,
For differentiable volume rendering, you will want to use the prbvolpath integrator (src/python/python/ad/integrators/prbvolpath.py) instead of the regular volpath. The prbvolpath integrator implements the path replay backpropagation algorithm described here: https://rgl.epfl.ch/publications/Vicini2021PathReplay
This should be quite a bit more efficient than the simple volpath, especially when the number of bounces is large.
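For reference, here is a minimal sketch of how the integrator could be swapped in, assuming the llvm_ad_rgb variant; the max_depth value is only a placeholder and should match whatever was used with volpath:

import mitsuba as mi
mi.set_variant('llvm_ad_rgb')

# Differentiable volumetric path tracer using path replay backpropagation
integrator = mi.load_dict({
    'type': 'prbvolpath',
    'max_depth': 64,   # placeholder; match your volpath settings
    'rr_depth': 5,
})

# It can then be passed explicitly at render time:
# img = mi.render(scene, params, integrator=integrator)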
Hi,
Yes, indeed, that is the integrator I should use for training. Unfortunately, it is "only" around 3 times faster. With my current implementation/settings, I would barely be able to train on 7000 samples in the train set at a resolution of 128, which is quite limiting. To get an idea: what rendering time can I expect at best for differentiable volumetric rendering? Am I reaching the limit, or are there other ways to speed up the computation (or the training)? For example, NeRF uses ray casting through purely emissive voxels to learn to synthesize novel views. Should I also write a custom integrator to learn the BSSRDF parameters more efficiently?
Thank you
I don't think there is much else you can do to speed up the computation with your current setup. Maybe 4k spp is more than necessary? Running a coarse-to-fine scheme should help too. Noisy gradients still carry a lot of information; look at how few samples per pixel we use in this tutorial.
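As a rough, hypothetical illustration of the low-spp / coarse-to-fine idea (the scene, schedule, and spp values below are arbitrary stand-ins, not recommendations):

import mitsuba as mi
mi.set_variant('llvm_ad_rgb')

# Stand-in scene; in the setup discussed here it would be the volumetric scene.
scene = mi.load_dict(mi.cornell_box())
params = mi.traverse(scene)

# Hypothetical coarse-to-fine schedule: few samples per pixel early on,
# more samples only in later epochs (epoch -> spp).
spp_schedule = {0: 4, 20: 16, 40: 64}

spp = 4
for epoch in range(50):
    spp = spp_schedule.get(epoch, spp)
    # The spp argument overrides the sampler's sample_count for this call;
    # a different seed per render decorrelates the Monte Carlo noise.
    img = mi.render(scene, params, spp=spp, seed=epoch)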
I'll close this issue as there doesn't seem to be an immediate issue with Mitsuba itself.
Summary
I need to reduce the rendering time with volpath and an envmap so that I can render 7200 images at a resolution of at least 128x128 over 50-150 epochs. Right now it takes me close to 7 s per image on average, which is too slow for affordable training.
System configuration
OS: Windows-10
CPU: Intel64 Family 6 Model 165 Stepping 5, GenuineIntel
GPU: NVIDIA RTX A4000
Python: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
NVidia driver: 517.40
CUDA: 10.0.130
LLVM: 15.-1.-1
Dr.Jit: 0.4.0
Mitsuba: 3.2.0
Is custom build? False
Compiled with: MSVC 19.34.31937.0
Variants: scalar_rgb scalar_spectral cuda_ad_rgb llvm_ad_rgb
Description
I am trying to reproduce the results of Che2020 to learn subsurface scattering parameters. In total, I have 7200 images in my dataset for training and as many for testing. Rendering with volpath and an envmap (scene description below), using the llvm variant, a resolution of 128x128, a sample_count of 4096, and the other parameters shown in the scene description, I get an average of close to 7 s per image in the train set. That amounts to around 14 h for the whole training dataset, and therefore 25 days for 50 epochs, which I can't afford. Ideally I would like to render at 512x512, but 128x128 should be fine for training.
I pre-create one scene per mesh model and light rotation, and during dataset generation I update the material parameters and render with mi.render(scene, params).
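A hypothetical sketch of that update-and-render loop; the parameter keys below are made up, and the real names must be read off params.keys() for the actual scene:

import mitsuba as mi
mi.set_variant('llvm_ad_rgb')

scene = create_scene()                 # the function shown under "Steps to reproduce"
params = mi.traverse(scene)
print(list(params.keys()))             # inspect the real parameter names first

# Made-up keys for illustration; the exact strings depend on how the
# medium is declared in the scene dictionary.
key_sigma_t = 'object.interior_medium.sigma_t.value.value'
key_albedo = 'object.interior_medium.albedo.value.value'

for sigma_t, albedo in [(100.0, 0.9), (276.0, 0.95)]:
    params[key_sigma_t] = sigma_t      # may need wrapping, e.g. mi.Color3f(...), depending on the type
    params[key_albedo] = albedo
    params.update()                    # propagate the changes into the scene
    img = mi.render(scene, params)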
The number of vertices varies in the range [8887, 50000] in the train set; the test set has one model with 158708 vertices.
Here are some timings:
Default parameters: meshmodel='buddha', sigmaT=276., albedo=0.95, g=0.8, nsamples=4096, rr_depth=5, sampler=ldsampler, resolution=128, variant=llvm
Reference rendering time: 5.93 s
Is it possible to render this scene much faster? If so, how? If not, how can this training be done in an affordable amount of time?
Thank you
Steps to reproduce
Below is the scene I used (model and material parameters can vary):
def create_scene(meshmodel='buddha', sigmaT=276., albedo=0.95, g=0.8, nsamples=4096, render_resolution=[128,128]):
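(The function body was cut off above. Purely as an illustration of what such a scene could look like, here is a minimal sketch assuming a PLY mesh with a homogeneous interior medium, an envmap emitter, and an ldsampler; the file names and the camera transform are placeholders.)

import mitsuba as mi
mi.set_variant('llvm_ad_rgb')

def create_scene(meshmodel='buddha', sigmaT=276., albedo=0.95, g=0.8,
                 nsamples=4096, render_resolution=(128, 128)):
    return mi.load_dict({
        'type': 'scene',
        'integrator': {'type': 'volpath', 'rr_depth': 5},
        'sensor': {
            'type': 'perspective',
            'to_world': mi.ScalarTransform4f.look_at(origin=[0, 0, 4],   # placeholder camera
                                                     target=[0, 0, 0],
                                                     up=[0, 1, 0]),
            'film': {'type': 'hdrfilm',
                     'width': render_resolution[0],
                     'height': render_resolution[1]},
            'sampler': {'type': 'ldsampler', 'sample_count': nsamples},
        },
        'emitter': {'type': 'envmap', 'filename': 'envmap.exr'},          # placeholder file
        'object': {
            'type': 'ply',
            'filename': f'{meshmodel}.ply',                               # placeholder path
            'bsdf': {'type': 'null'},                                     # pure interface to the medium
            'interior': {
                'type': 'homogeneous',
                'sigma_t': sigmaT,
                'albedo': albedo,
                'phase': {'type': 'hg', 'g': g},
            },
        },
    })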