mitsuba-renderer / mitsuba3

Mitsuba 3: A Retargetable Forward and Inverse Renderer
https://www.mitsuba-renderer.org/

Use shape index for segmentation mask but got weird result #1059

Closed: kuonangzhe closed this issue 8 months ago

kuonangzhe commented 9 months ago

Summary

When using the AOV plugin, the shape index returned by the ray intersection is not the actual shape index (0 to total shape count - 1).

System configuration

System information:

OS: 13.4.1
CPU: Apple M1 Pro
Python: 3.8.13 (default, Mar 28 2022, 06:13:39) [Clang 12.0.0]
LLVM: -1.-1.-1
Dr.Jit: 0.4.4
Mitsuba: 3.5.0
Is custom build? False
Compiled with: AppleClang 14.0.0.14000029
Variants: scalar_rgb scalar_spectral llvm_ad_rgb

Description

I want to use Mitsuba as the core of a synthetic data generation pipeline, and I am currently trying to get ground-truth labels for segmentation masks. I checked the AOV plugin and shape_index seems to be what I need, but the result looks weird. I tried both a plain ply object and a shapegroup, and the results are weird in both cases.

I checked the similar issues at https://github.com/mitsuba-renderer/mitsuba2/issues/510, https://github.com/mitsuba-renderer/mitsuba3/issues/955, and https://github.com/mitsuba-renderer/mitsuba3/discussions/314.

Steps to reproduce

import numpy as np
import mitsuba as mi
mi.set_variant("scalar_rgb")
from mitsuba import ScalarTransform4f as T

scene_base = {'type': 'scene'}
integrator = {'integrator': {'type': 'aov',
        'aovs': 'dd.y:depth,nn:sh_normal,index:shape_index,albedo:albedo',
        'my_image': {
            'type': 'path',
        }
        }}
light = {'light': {'type': 'constant'}}
teapot = {
    'teapot': {
        'type': 'ply',
        'filename': './teapot.ply',
        'to_world': T.translate([0, 0, -1.5]),
        'bsdf': {
            'type': 'diffuse',
            'reflectance': {'type': 'rgb', 'value': [0.1, 0.2, 0.3]},
        },
    }}

# Declare a named shape group containing a single object
shape_group = {
    'my_shape_group': {
        'type': 'shapegroup',
        'first_object': {
            'type': 'ply',
            'filename': './teapot.ply',
            'to_world': T.translate([0, 0, -1.5]),
            'bsdf': {
                'type': 'diffuse',
                'reflectance': {'type': 'rgb', 'value': [0.1, 0.2, 0.3]},
            },
        },
    },
    # Instantiate the shape group without any kind of transformation
    'first_instance': {
        'type': 'instance',
        'shapegroup': {
            'type': 'ref',
            'id': 'my_shape_group'
        }
    },
}
scene_desc = [scene_base, integrator, light, shape_group]
scene_dict = {}
for desc in scene_desc:
    scene_dict.update(desc)

scene = mi.load_dict(scene_dict)

def load_sensor(r, phi, theta):
    # Apply two rotations to convert from spherical coordinates to world 3D coordinates.
    origin = T.rotate([0, 0, 1], phi).rotate([0, 1, 0], theta) @ mi.ScalarPoint3f([0, 0, r])
    return mi.load_dict({
        'type': 'perspective',
        'fov': 39.3077,
        'to_world': T.look_at(
            origin=origin,
            target=[0, 0, 0],
            up=[0, 0, 1]
        ),
        'sampler': {
            'type': 'independent',
            'sample_count': 16
        },
        'film': {
            'type': 'hdrfilm',
            'width': 256,
            'height': 256,
            'rfilter': {
                'type': 'tent',
            },
            'pixel_format': 'rgb',
        },
    })

sensor_count = 6

radius = 12
phis = [20.0 * i for i in range(sensor_count)]
theta = 60.0

sensors = [load_sensor(radius, phi, theta) for phi in phis]

images = [mi.render(scene, spp=16, sensor=sensor) for sensor in sensors]

# shape index
x = np.asarray(images[0][:, :, 7:8])
print(x[np.where(x > 0)])
mi.util.write_bitmap("my_first_render.png", images[0][:, :, 7:8])

The printed result is

[3.5267453e+08 6.6194835e+08 9.8087200e+08 ... 2.5923027e+08 9.8750008e+07
 1.6017323e+07]

instead of 0 or 1.

njroussel commented 9 months ago

Hi @kuonangzhe

You're using a tent reconstruction filter and an spp value larger than 1; this is most likely not what you want. Each pixel can cover more than one shape, hence the "averaging" you're seeing in your current output.

Try using a box filter with spp=1: that way you only trace a single ray per pixel, and its value is not convolved with any reconstruction filter.
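
For example, a sketch of the film block with those two changes (the rest of your sensor dictionary stays as it is, and spp=1 is simply passed to mi.render):

    'film': {
        'type': 'hdrfilm',
        'width': 256,
        'height': 256,
        'rfilter': {'type': 'box'},   # box filter: no blending across samples/pixels
        'pixel_format': 'rgb',
    },

    # ... later, one primary ray per pixel:
    image = mi.render(scene, spp=1, sensor=sensor)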

kuonangzhe commented 9 months ago

@njroussel Thanks a lot for the reply!

I updated the film's filter type to box and changed spp to 1, and now the returned shape_index values are all the same integer. However, their value is not straightforward to interpret:

[4.2949673e+09 4.2949673e+09 4.2949673e+09 ... 4.2949673e+09 4.2949673e+09
 4.2949673e+09]

I also printed the shape ids from the scene (I also tried shape.ptr, but it still does not match the shape index):

sh_ids = [x.id() for x in scene.shapes()]
print(sh_ids)

The output is:

['first_instance']

Is this correct? Is there any reference I can check for the mapping between shape index and shape id?

njroussel commented 8 months ago

Hi @kuonangzhe

Sorry for the delay. This sounds like a bug, I'll have a look now.

njroussel commented 8 months ago

Ok, this is limited to the scalar_* variants: we re-interpret the shape's pointer as a uint32_t, which is why you are seeing such large values.

For JIT variants this would require a bit more work to handle correctly. But in scalar variants, it is quite easy to build a correct indexing w.r.t. the ordering in scene.shapes(). I've pushed a commit which fixes this.
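
With that in place, mapping the AOV channel back to shape ids is direct. An untested sketch against your snippet (channel 7 is where your aovs string puts shape_index; rays that hit no geometry are handled with a simple out-of-range check):

    shapes = scene.shapes()
    index_channel = np.asarray(images[0][:, :, 7]).astype(np.int64)
    for idx in np.unique(index_channel):
        if 0 <= idx < len(shapes):
            print(idx, '->', shapes[idx].id())   # index into scene.shapes()
        else:
            print(idx, '-> no shape hit (background)')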

Thank you for the bug report.

kuonangzhe commented 8 months ago

Thanks a lot for fixing this!

BTW, since we are working on a data-generation scenario, data quality matters and spp will typically be 128 or so. How should we handle this image-to-ground-truth mapping: render twice, once with spp=1 for the ground truth and once with spp=128 for the high-quality image?

Are there any suggestions for an elegant & correct implementation? Thanks a lot!

njroussel commented 8 months ago

I think there is no way around it: you need to render twice. The first time you can use an aov integrator with spp=1 that doesn't wrap any other integrator (I believe this should work; if it doesn't, you can wrap the depth integrator, or create your own dummy integrator that just returns zero). The second time you use your "main" path integrator with a higher spp.

You can keep the box filter for both renderings. If you really wanted to, you could create a second, almost-identical sensor with a different filter and then specify it when rendering: mi.render(scene, integrator=path_integrator, sensor=new_sensor).

Ideally, you want to avoid re-loading the scene entirely. For what we've talked about so far, you're in luck: you can pass an integrator and/or a sensor to mi.render() to achieve your goal.
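
Roughly something like this (untested sketch; aov_integrator and path_integrator are just placeholder names, and scene/sensors are the ones loaded in your snippet above):

    aov_integrator = mi.load_dict({
        'type': 'aov',
        'aovs': 'index:shape_index',
        # If a bare aov integrator refuses to load, wrap a cheap nested one,
        # e.g. 'img': {'type': 'depth'}.
    })
    path_integrator = mi.load_dict({'type': 'path'})

    for sensor in sensors:
        labels = mi.render(scene, integrator=aov_integrator, sensor=sensor, spp=1)     # ground truth
        beauty = mi.render(scene, integrator=path_integrator, sensor=sensor, spp=128)  # final image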