maximeraafat / BlenderNeRF

Easy NeRF synthetic dataset creation within Blender

Camera Poses are incorrect in Instant-NGP #12

Closed advaiths12 closed 1 year ago

advaiths12 commented 1 year ago

Hello,

Thanks for writing this tool! It is very helpful. I generated a synthetic dataset using COS, but when importing it into instant-ngp, I find that the camera poses are translated far outside the unit cube. When using the same images and running COLMAP, the poses are correct. Is there any condition on the scene (e.g. centered at the origin, scaled to fit within a 1x1x1 cube in Blender) that I am missing?

Thanks!

[Screenshot: exported camera poses placed far outside the unit cube in instant-ngp]

advaiths12 commented 1 year ago

I think I figured out the problem. The camera poses must be recentered so that their "center of attention" lies within the [-1,1] cube before being written to the JSON.

maximeraafat commented 1 year ago

Hi @advaiths12, thanks for your comment!

The scene in Blender that you are trying to capture should ideally always be bounded by the aabb volume from Instant NGP. Since this cube is centred around the origin, centring your scene might be necessary in order to align the cameras from Blender with your scene in the NGP framework.

On the other hand, increasing aabb should allow you to capture a larger scene, thus circumventing the need to rescale your scene to fit within the default bounds.
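For reference, Instant NGP reads an optional aabb_scale field from the transforms JSON (a power of two, up to 128 in recent versions). A minimal sketch, assuming you patch the dictionary produced by the add-on before it is saved:

# hedged sketch : enlarge the NGP bounding volume beyond the unit cube
# (aabb_scale must be a power of two : 1, 2, 4, 8, ...)
output_data['aabb_scale'] = 4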

Glad you managed to figure this issue out by yourself and feel free to ask more questions if in doubt :)

Quyans commented 1 year ago

I think I figured out the problem. The camera poses must be recentered so that their "center of attention" lies within the [-1,1] cube before being written to the JSON.

Hi, I ran into the same problem. Do I need to change the BlenderNeRF code, or do something in Blender?

advaiths12 commented 1 year ago

Hi,

I took some code from Instant NGP's colmap2nerf.py script to center the camera poses within the [-1, 1] cube.

I've attached an example of it in cos_operator.py below:

import os
import shutil
import bpy
from . import helper, blender_nerf_operator
import numpy as np

# global addon script variables
EMPTY_NAME = 'BlenderNeRF Sphere'
CAMERA_NAME = 'BlenderNeRF Camera'

# taken from Instant NGP's colmap2nerf.py script
def closest_point_2_lines(oa, da, ob, db):
    # returns the point closest to both rays of the form o + t*d,
    # and a weight that goes to 0 when the rays are parallel
    da = da / np.linalg.norm(da)
    db = db / np.linalg.norm(db)
    c = np.cross(da, db)
    denom = np.linalg.norm(c)**2
    t = ob - oa
    ta = np.linalg.det([t, db, c]) / (denom + 1e-10)
    tb = np.linalg.det([t, da, c]) / (denom + 1e-10)
    if ta > 0:
        ta = 0
    if tb > 0:
        tb = 0
    return (oa+ta*da+ob+tb*db) * 0.5, denom

# camera on sphere operator class
class CameraOnSphere(blender_nerf_operator.BlenderNeRF_Operator):
    '''Camera on Sphere Operator'''
    bl_idname = 'object.camera_on_sphere'
    bl_label = 'Camera on Sphere COS'

    def execute(self, context):
        scene = context.scene
        camera = scene.camera
        # check if camera is selected : next errors depend on an existing camera
        if camera is None:
            self.report({'ERROR'}, 'Be sure to have a selected camera!')
            return {'FINISHED'}

        # if there is an error, print first error message
        error_messages = self.asserts(scene, method='COS')
        if len(error_messages) > 0:
            self.report({'ERROR'}, error_messages[0])
            return {'FINISHED'}

        output_data = self.get_camera_intrinsics(scene, camera)

        # clean directory name (unsupported characters replaced) and output path
        output_dir = bpy.path.clean_name(scene.cos_dataset_name)
        output_path = os.path.join(scene.save_path, output_dir)
        os.makedirs(output_path, exist_ok=True)

        if scene.logs: self.save_log_file(scene, output_path, method='COS')

        # initial property might have changed since set_init_props update
        scene.init_output_path = scene.render.filepath

        # other initial properties
        scene.init_sphere_exists = scene.show_sphere
        scene.init_camera_exists = scene.show_camera
        scene.init_frame_end = scene.frame_end
        scene.init_active_camera = camera

        if scene.test_data:
            # testing transforms
            output_data['frames'] = self.get_camera_extrinsics(scene, camera, mode='TEST', method='COS')

            # the transform matrices are nested lists : convert them to numpy
            # arrays so they can be sliced and translated below
            for f in output_data['frames']:
                f["transform_matrix"] = np.array(f["transform_matrix"])

            print("computing center of attention...")
            totw = 0.0
            totp = np.array([0.0, 0.0, 0.0])
            for f in output_data["frames"]:
                mf = f["transform_matrix"][0:3,:]
                for g in output_data["frames"]:
                    mg = g["transform_matrix"][0:3,:]
                    p, w = closest_point_2_lines(mf[:,3], mf[:,2], mg[:,3], mg[:,2])
                    if w > 0.00001:
                        totp += p*w
                        totw += w
            if totw > 0.0:
                totp /= totw
            print(totp) # the cameras are looking at totp
            nframes = 0
            for f in output_data['frames']:
                f["transform_matrix"][0:3,3] -= totp
                nframes += 1
            avglen = 0.
            for f in output_data['frames']:
                avglen += np.linalg.norm(f["transform_matrix"][0:3,3])
            avglen /= nframes
            print("avg camera distance from origin", avglen)
            for f in output_data['frames']:
                f["transform_matrix"][0:3,3] *= 4.0 / avglen # scale to "nerf sized"            

            for f in output_data['frames']:
                f["transform_matrix"] = f["transform_matrix"].tolist()

            self.save_json(output_path, 'transforms_test.json', output_data)

        if scene.train_data:
            if not scene.show_camera: scene.show_camera = True

            # train camera on sphere
            sphere_camera = scene.objects[CAMERA_NAME]
            sphere_output_data = self.get_camera_intrinsics(scene, sphere_camera)
            scene.camera = sphere_camera

            # training transforms
            sphere_output_data['frames'] = self.get_camera_extrinsics(scene, sphere_camera, mode='TRAIN', method='COS')

            # same list to numpy array conversion as for the test frames
            for f in sphere_output_data['frames']:
                f["transform_matrix"] = np.array(f["transform_matrix"])

            print("computing center of attention...")
            totw = 0.0
            totp = np.array([0.0, 0.0, 0.0])
            for f in sphere_output_data['frames']:
                mf = f["transform_matrix"][0:3,:]
                for g in sphere_output_data['frames']:
                    mg = g["transform_matrix"][0:3,:]
                    p, w = closest_point_2_lines(mf[:,3], mf[:,2], mg[:,3], mg[:,2])
                    if w > 0.00001:
                        totp += p*w
                        totw += w
            if totw > 0.0:
                totp /= totw
            print(totp) # the cameras are looking at totp
            for f in sphere_output_data['frames']:
                f["transform_matrix"][0:3,3] -= totp

            avglen = 0.
            nframes = 0
            for f in sphere_output_data['frames']:
                avglen += np.linalg.norm(f["transform_matrix"][0:3,3])
                nframes += 1
            avglen /= nframes
            print("avg camera distance from origin", avglen)
            for f in sphere_output_data['frames']:
                f["transform_matrix"][0:3,3] *= 4.0 / avglen # scale to "nerf sized"  

            for f in sphere_output_data['frames']:
                f["transform_matrix"] = f["transform_matrix"].tolist()
            self.save_json(output_path, 'transforms_train.json', sphere_output_data)

            # rendering
            if scene.render_frames:
                output_train = os.path.join(output_path, 'train')
                os.makedirs(output_train, exist_ok=True)
                scene.rendering = (False, False, True)
                scene.frame_end = scene.frame_start + scene.nb_frames - 1 # update end frame
                scene.render.filepath = os.path.join(output_train, '') # training frames path
                bpy.ops.render.render('INVOKE_DEFAULT', animation=True, write_still=True) # render scene

            scene.frame_set(scene.frame_start)

        # if frames are rendered, the below code is executed by the handler function
        if not any(scene.rendering):
            # reset camera settings
            if not scene.init_camera_exists: helper.delete_camera(scene, CAMERA_NAME)
            if not scene.init_sphere_exists:
                objects = bpy.data.objects
                objects.remove(objects[EMPTY_NAME], do_unlink=True)
                scene.show_sphere = False
                scene.sphere_exists = False

            scene.camera = scene.init_active_camera

            # compress dataset and remove folder (only keep zip)
            shutil.make_archive(output_path, 'zip', output_path) # output filename = output_path
            shutil.rmtree(output_path)

        return {'FINISHED'}
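As a quick sanity check (not part of the snippet above), you can load the exported JSON and verify that the camera centers now average a distance of roughly 4 from the origin:

import json
import numpy as np

with open('transforms_train.json') as file:
    data = json.load(file)

# average camera distance from the origin : should be close to 4
norms = [np.linalg.norm(np.array(f['transform_matrix'])[0:3, 3]) for f in data['frames']]
print('avg camera distance from origin:', np.mean(norms))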

Quyans commented 1 year ago

Email received.

maximeraafat commented 1 year ago

Hi @Quyans,

As far as I know, all you need is to have your camera path and scene of interest within the AABB bounding volume from Instant NGP. For that, you can either increase AABB so that your whole scene fits within that bound, or you can center your whole scene (technically just the exported camera locations) and hopefully get most of what you need within that bound.

You can do this either via code, by normalising the camera locations (see the code snippet provided by @advaiths12), or manually in Blender, by approximately centering your scene and camera around the origin.
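For the manual route, a minimal Blender Python sketch (assuming unparented objects, so that editing location directly is safe) that recenters the scene around the world origin:

import bpy
from mathutils import Vector

# average world-space location of all mesh and camera objects
objs = [o for o in bpy.context.scene.objects if o.type in {'MESH', 'CAMERA'}]
center = sum((o.matrix_world.translation for o in objs), Vector()) / len(objs)

# shift every object so the scene is centered at the origin
for o in objs:
    o.location -= center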