sarafridov / K-Planes


Toy scene of kplanes #24

Closed: wangyuyy closed this 1 year ago

wangyuyy commented 1 year ago

Could you please share the toy scene dataset shown on your project page? I tried to use K-Planes to reconstruct a simple toy scene created with Kubric, but K-Planes is prone to overfitting on it. I tried providing more data, making the scene static, and simplifying the model parameters, but none of this worked. So I wonder whether there is something wrong with my custom dataset or whether K-Planes fails to reconstruct low-texture scenes. The toy scene from the project page:

[image]

The train (left) and test (right) results on the custom data: [images]

sarafridov commented 1 year ago

The scene you’re referring to on our project page is the bouncing balls scene from D-NeRF (https://github.com/albertpumarola/D-NeRF). In general if you see issues like this, the things to try are (1) double check your camera poses, (2) use more views for training, and/or (3) increase the weight on the TV regularization.
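
For reference, a minimal sketch of option (3), assuming the Python config files shipped with this repo; the key names below follow the D-NeRF configs and may be named differently in a custom config, so match them against your own file.

# Hypothetical excerpt from a K-Planes config; only the TV-related keys are shown.
config = {
    # ... other settings unchanged ...
    # Total-variation regularization on the spatial feature planes; raising it
    # encourages smoother planes and can suppress floaters and overfitting.
    "plane_tv_weight": 5e-4,
    "plane_tv_weight_proposal_net": 5e-4,
}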

wangyuyy commented 1 year ago

Thanks for your reply!

  1. About the camera poses: I wrote a c2w function and verified it against the transform_matrix entries in the D-NeRF dataset. Specifically, for an input transform_matrix M, I first take the camera position M[:3, 3], then feed that position and the look-at point (0, 0, 0) into c2w; the output matches the input transform_matrix.
  2. About the training views: to analyze the cause of the bad reconstruction, I made the scene static and randomly sampled 120 camera poses (100 for training, 20 for testing) on an upper hemisphere, as in D-NeRF (see the sampling sketch after this list).
  3. About the TV regularization: I increased tv_loss_weight from 0.0001 to 0.0005, but it still failed.
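
For concreteness, a minimal sketch of the hemisphere sampling in point 2; the radius and seed are placeholders, and each position is afterwards turned into a pose with the c2w function shown further below.

import numpy as np

rng = np.random.default_rng(0)
num_poses = 120

# Sample camera positions uniformly on the upper hemisphere of a sphere of
# fixed radius (the radius value here is a placeholder).
phi = rng.uniform(0.0, 2.0 * np.pi, size=num_poses)   # azimuth
cos_theta = rng.uniform(0.0, 1.0, size=num_poses)     # uniform in z gives uniform area density
sin_theta = np.sqrt(1.0 - cos_theta ** 2)
radius = 4.0
positions = radius * np.stack(
    [sin_theta * np.cos(phi), sin_theta * np.sin(phi), cos_theta], axis=1)

train_positions, test_positions = positions[:100], positions[100:]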

So I wonder whether the reason is that the custom data is so simple that K-Planes tends to ignore the colored ball and instead mostly learns the white plane.

More detail:

import numpy as np

def c2w(camera_position, look_at_point):
    # Build a camera-to-world matrix in the NeRF/OpenGL convention:
    # the camera looks along its local -z axis.
    camera_position = np.array(camera_position, dtype=float)
    look_at_point = np.array(look_at_point, dtype=float)

    # The camera's +z axis points from the target back toward the camera.
    camera_forward = camera_position - look_at_point
    camera_forward = camera_forward / np.linalg.norm(camera_forward)

    # Degenerate if the view direction is parallel to world_up (camera at the pole).
    world_up = np.array([0.0, 0.0, 1.0])
    camera_right = np.cross(world_up, camera_forward)
    camera_right = camera_right / np.linalg.norm(camera_right)

    camera_up = np.cross(camera_forward, camera_right)

    # Columns hold the camera axes in world coordinates; the last column is
    # the camera position (translation).
    transform_matrix = np.eye(4)
    transform_matrix[:3, 0] = camera_right
    transform_matrix[:3, 1] = camera_up
    transform_matrix[:3, 2] = camera_forward
    transform_matrix[:3, 3] = camera_position

    return transform_matrix.tolist()
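
A minimal version of the check from point 1 above, assuming a D-NeRF-style transforms file in which every camera looks at the origin (the path is a placeholder), using the c2w function just shown:

import json
import numpy as np

with open("data/bouncingballs/transforms_train.json") as f:
    meta = json.load(f)

# Rebuild each pose from its camera position alone and compare it against the
# stored transform_matrix; all frames should match up to numerical tolerance.
for frame in meta["frames"]:
    M = np.array(frame["transform_matrix"])
    rebuilt = np.array(c2w(M[:3, 3], (0.0, 0.0, 0.0)))
    assert np.allclose(rebuilt, M, atol=1e-5), "pose mismatch"
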
wangyuyy commented 1 year ago

I did some more experiments on this. To minimize the differences between the D-NeRF dataset and my custom dataset, I used the same resolution, the same transforms.json file, and the same config file as bouncing balls, and kept the scene static (so the time values in the json file don't need to be distinguished). The camera intrinsics and the config (including the bounding box) are the same as in the D-NeRF dataset. To generate the data, I extracted the camera positions from the transform matrices and rebuilt the poses with the look-at point (0, 0, 0), as sketched below. However, the test scene was reconstructed well at some views but suffered from severe artifacts at others.
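
To make the data-generation step concrete, a sketch under the same assumptions (paths are placeholders): take the camera positions from the D-NeRF transforms file, rebuild each pose with the c2w above and a (0, 0, 0) look-at point, and leave everything else unchanged.

import json
import numpy as np

with open("data/bouncingballs/transforms_train.json") as f:
    meta = json.load(f)

# Replace each stored pose with one rebuilt from its camera position; the
# time values are left untouched, since the scene is static anyway.
for frame in meta["frames"]:
    position = np.array(frame["transform_matrix"])[:3, 3]
    frame["transform_matrix"] = c2w(position, (0.0, 0.0, 0.0))

with open("transforms_train_custom.json", "w") as f:
    json.dump(meta, f, indent=2)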

I can't figure out why there are such severe artifacts at some test views when there is so little difference between the D-NeRF data and my custom data.

Reconstruction results (test views): [images]