sunset1995 / DirectVoxGO

Direct voxel grid optimization for fast radiance field reconstruction.
https://sunset1995.github.io/dvgo

Test views have low PSNR for Custom Inward Facing Dataset #29


colloroneluca commented 2 years ago

Hi, I would like to congratulate you on this wonderful work! I would also like to point out a discrepancy I found between train and test results. I'm training on an inward-facing real-world scene composed of 61 images that redundantly cover the depicted object. Images rendered from the training camera poses reach PSNR ≈ 26, while test images only reach PSNR ≈ 16.
The rendered test images are affected by large occluding clouds and distortions. I'm using a slightly modified version of the ./configs/custom/default_ubd_inward_facing.py configuration file, which I paste below. Images are 1616x1080. COLMAP's estimated camera poses and 3D point cloud appear to be accurate.
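(For reference, the PSNR figures above follow directly from the mean squared error between rendered and ground-truth images. A minimal sketch, assuming images are normalized to [0, 1]; the function name is illustrative, not from the DirectVoxGO codebase:)

```python
import numpy as np

def psnr(img_pred, img_gt):
    """PSNR in dB between two images with values in [0, 1]."""
    mse = np.mean((img_pred.astype(np.float64) - img_gt.astype(np.float64)) ** 2)
    return -10.0 * np.log10(mse)

# A uniform per-pixel error of 0.05 gives -10*log10(0.0025) ≈ 26 dB,
# roughly the training PSNR reported above; PSNR 16 corresponds to a
# much larger average error (≈ 0.16 per pixel).
```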

Can you suggest some parameters that I should tune differently in order to get better Test results?

```python
_base_ = '../default.py'

expname = 'scene7'
basedir = './data/scene7'

data = dict(
    datadir='./data/scene7',
    dataset_type='llff',
    spherify=True,
    factor=2,
    llffhold=0,
    bd_factor=None,
    white_bkgd=True,
    rand_bkgd=True,
    unbounded_inward=True,
    load2gpu_on_the_fly=True,
)

coarse_train = dict(N_iters=0)

fine_train = dict(
    N_iters=80000,
    N_rand=4096,
    lrate_decay=80,
    ray_sampler='flatten',
    weight_nearclip=0.0,
    weight_distortion=0.01,
    pg_scale=[2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000],
    tv_before=20000,
    tv_dense_before=20000,
    weight_tv_density=1e-6,
    weight_tv_k0=1e-7,
)

alpha_init = 1e-4
stepsize = 0.5

fine_model_and_render = dict(
    num_voxels=(320*3)**2,
    num_voxels_base=(320*3)**2,
    alpha_init=alpha_init,
    stepsize=stepsize,
    fast_color_thres={
        '_delete_': True,
        0   : alpha_init*stepsize/10,
        1500: min(alpha_init, 1e-4)*stepsize/5,
        2500: min(alpha_init, 1e-4)*stepsize/2,
        3500: min(alpha_init, 1e-4)*stepsize/1.5,
        4500: min(alpha_init, 1e-4)*stepsize,
        5500: min(alpha_init, 1e-4),
        6500: 1e-4,
    },
    world_bound_scale=1,
)

coarse_model_and_render = dict(
    num_voxels=1024000*2,        # expected number of voxels
    num_voxels_base=1024000*2,   # to rescale delta distance
    density_type='DenseGrid',    # DenseGrid, TensoRFGrid
    k0_type='DenseGrid',         # DenseGrid, TensoRFGrid
    density_config=dict(),
    k0_config=dict(),
    mpi_depth=128,               # the number of planes in Multiplane Image (works when ndc=True)
    nearest=False,               # nearest interpolation
    pre_act_density=False,       # pre-activated trilinear interpolation
    in_act_density=False,        # in-activated trilinear interpolation
    bbox_thres=1e-3,             # threshold to determine known free-space in the fine stage
    mask_cache_thres=1e-3,       # threshold to determine a tightened BBox in the fine stage
    rgbnet_dim=0,                # feature voxel grid dim
    rgbnet_full_implicit=False,  # let the colors MLP ignore the feature voxel grid
    rgbnet_direct=True,          # set to False to treat the first 3 dims of the feature voxel grid as diffuse rgb
    rgbnet_depth=6,              # depth of the colors MLP (there are rgbnet_depth-1 intermediate features)
    rgbnet_width=128,            # width of the colors MLP
    alpha_init=1e-6,             # set the alpha values everywhere at the beginning of training
    fast_color_thres=1e-7,       # threshold of alpha value to skip the fine-stage sampled point
    maskout_near_cam_vox=True,   # mask out grid points between cameras and their near planes
    world_bound_scale=1,         # rescale the BBox enclosing the scene
    stepsize=0.3,                # sampling stepsize in volume rendering
)
```

sunset1995 commented 2 years ago

I have tested configs/custom/default_ubd_inward_facing.py on 300 frames subsampled from my own casually captured video, so it should work.

Some general questions:

  1. I assume you used LLFF's imgs2poses.py to generate the pose.
  2. It seems that llffhold=0. Can you make sure the testing views are aligned with the training views solved by COLMAP?
  3. Some visual results can help me diagnose your scene XD.
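(For context on point 2, here is a sketch of how `llffhold` typically determines the train/test split in LLFF-style data loaders. This mirrors the common convention from the original LLFF/NeRF loaders; the exact DirectVoxGO loader code may differ.)

```python
import numpy as np

def split_views(n_images, llffhold):
    """Sketch of the usual llffhold convention for held-out test views."""
    if llffhold > 0:
        i_test = np.arange(n_images)[::llffhold]  # every llffhold-th view is a test view
    else:
        i_test = np.array([], dtype=int)          # llffhold=0: no views held out
    i_train = np.array([i for i in range(n_images) if i not in set(i_test)])
    return i_train, i_test

# With llffhold=0 (as in the config above), every COLMAP-solved view is used
# for training, so any "test" views must come from elsewhere and be expressed
# in the same world frame as the COLMAP solution.
```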
colloroneluca commented 2 years ago

Hi @sunset1995, thanks for your reply.

  1. I used DirectVoxGO/tools/imgs2poses.py to generate poses.
  2. I inspected COLMAP's solved camera positions and they seem to be accurate. I also tried to query the network with custom-made camera positions (I placed the camera in between adjacent training views), but heavy floaters and distortions are still present. What exactly do you mean by "aligned with the training views"?
  3. Here you go! I attach a rendered version and the ground truth. (Note that the train set contains other views from above, similar to this test one.) [image: train-test]
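(As an aside, one minimal way to place a camera "in between" two adjacent training views, as described in point 2, is to blend the two camera-to-world matrices: lerp the translation, average the rotations, and project back to the nearest rotation via SVD. The function below is an illustrative sketch, not from the DirectVoxGO codebase.)

```python
import numpy as np

def interpolate_pose(c2w_a, c2w_b, t=0.5):
    """Blend two 3x4 camera-to-world matrices at parameter t in [0, 1]."""
    pos = (1 - t) * c2w_a[:, 3] + t * c2w_b[:, 3]   # lerp the camera center
    R = (1 - t) * c2w_a[:, :3] + t * c2w_b[:, :3]   # blend the rotations...
    u, _, vt = np.linalg.svd(R)                     # ...then project back to
    R = u @ vt                                      # the nearest rotation matrix
    return np.concatenate([R, pos[:, None]], axis=1)
```

Floaters in such intermediate views usually indicate geometry that is only consistent with the training rays, not a problem with the interpolation itself.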

Again, thank you very much.

sunset1995 commented 2 years ago

Is it a forward-facing scene? If so, you should use spherify=False (see this guide for more detail).
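(A minimal sketch of what the forward-facing variant of the data config might look like, assuming the llff loader accepts these fields as in the repo's default config; exact field defaults may differ from the shipped configs:)

```python
# Hypothetical forward-facing variant of the data config above.
data = dict(
    datadir='./data/scene7',
    dataset_type='llff',
    spherify=False,  # forward-facing capture: do not spherify the poses
    ndc=True,        # normalized device coordinates, used for forward-facing scenes
    factor=2,
)
```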

If it is not a forward-facing scene, I guess the problem is too few viewing angles. There are some techniques for reconstructing from fewer views, but unfortunately they are not supported by the current codebase. I suggest capturing 100+ images covering all aspects of the upper hemisphere of the object of interest.

colloroneluca commented 2 years ago

Actually, it is not a forward-facing scene.

> If it is not a forward-facing scene, I guess the problem is too few viewing angles.

Yes, I do think that too. Anyway, thanks a lot for the help!