google / nerfactor

Neural Factorization of Shape and Reflectance Under an Unknown Illumination
https://xiuming.info/projects/nerfactor/
Apache License 2.0

Question about incompatible shapes (0, 3) and (100, 3) at II. Joint Optimization in Training, Validation, and Testing #24

Closed. Osavalon closed this issue 1 year ago.

Osavalon commented 2 years ago

Hi, thank you for the inspiring work and your open-source code! When I run the following script (covering I. Shape Pre-Training and II. Joint Optimization), I get a ValueError at step II. Joint Optimization in Training, Validation, and Testing:


'''

scene='hotdog_2163'
gpus='2'
model='nerfactor'
overwrite='True'
proj_root='/lyy/nerfactor'
repo_dir="$proj_root/nerfactor"
viewer_prefix='' # or just use ''

# I. Shape Pre-Training
data_root="$proj_root/data/selected/$scene"
if [[ "$scene" == scan* ]]; then
    # DTU scenes
    imh='256'
else
    imh='512'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    near='0.1'; far='2'
else
    near='2'; far='6'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    use_nerf_alpha='True'
else
    use_nerf_alpha='False'
fi
surf_root="$proj_root/output/surf/$scene"
shape_outdir="$proj_root/output/train/${scene}_shape"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config='shape.ini' --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,outroot=$shape_outdir,viewer_prefix=$viewer_prefix,overwrite=$overwrite"

# II. Joint Optimization (training and validation)
shape_ckpt="$shape_outdir/lr1e-2/checkpoints/ckpt-2"
brdf_ckpt="$proj_root/output/train/merl/lr1e-2/checkpoints/ckpt-50"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    xyz_jitter_std=0.001
else
    xyz_jitter_std=0.01
fi
test_envmap_dir="$proj_root/data/envmaps/for-render_h16/test"
shape_mode='finetune'
outroot="$proj_root/output/train/${scene}_$model"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config="$model.ini" --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,shape_model_ckpt=$shape_ckpt,brdf_model_ckpt=$brdf_ckpt,xyz_jitter_std=$xyz_jitter_std,test_envmap_dir=$test_envmap_dir,shape_mode=$shape_mode,outroot=$outroot,viewer_prefix=$viewer_prefix,overwrite=$overwrite"

# III. Simultaneous Relighting and View Synthesis (testing)
ckpt="$outroot/lr5e-3/checkpoints/ckpt-10"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    color_correct_albedo='false'
else
    color_correct_albedo='true'
fi
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/test_run.sh" "$gpus" --ckpt="$ckpt" --color_correct_albedo="$color_correct_albedo"

'''


[trainvali] For results, see: /lyy/nerfactor/output/train/hotdog_2163_nerfactor/lr5e-3
[datasets/nerf_shape] Number of 'train' views: 100
[datasets/nerf_shape] Number of 'vali' views: 8
[models/base] Trainable layers registered: ['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[models/base] Trainable layers registered: ['net_brdf_mlp_layer0', 'net_brdf_mlp_layer1', 'net_brdf_mlp_layer2', 'net_brdf_mlp_layer3', 'net_brdf_out_layer0']
Traceback (most recent call last):
  File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 341, in <module>
    app.run(main)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 106, in main
    model = Model(config, debug=FLAGS.debug)
  File "/lyy/nerfactor/nerfactor/nerfactor/models/nerfactor.py", line 68, in __init__
    ioutil.restore_model(self.brdf_model, brdf_ckpt)
  File "/lyy/nerfactor/nerfactor/nerfactor/util/io.py", line 48, in restore_model
    ckpt.restore(ckpt_path).expect_partial()
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 2009, in restore
    status = self._saver.restore(save_path=save_path)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 1304, in restore
    checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 209, in restore
    restore_ops = trackable._restore_from_checkpoint_position(self)  # pylint: disable=protected-access
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 907, in _restore_from_checkpoint_position
    tensor_saveables, python_saveables))
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 289, in restore_saveables
    validated_saveables).restore(self.save_path_tensor)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 281, in restore
    restore_ops.update(saver.restore(file_prefix))
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 103, in restore
    restored_tensors, restored_shapes=None)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in restore
    for v in self._mirrored_variable.values))
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in <genexpr>
    for v in self._mirrored_variable.values))
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 392, in _assign_on_device
    return variable.assign(tensor)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 846, in assign
    self._shape.assert_is_compatible_with(value_tensor.shape)
  File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 1117, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (0, 3) and (100, 3) are incompatible


The shape checkpoint was generated by step I. Shape Pre-Training, and the BRDF checkpoint was downloaded from your page. Does this mean I need to pre-train the BRDF model myself?

Very much looking forward to your help!

Haian-Jin commented 2 years ago

I met the same problem. Is there any solution now?

wangmingyang4 commented 2 years ago

hi! I met the same problem. Is there any solution now?

Haian-Jin commented 2 years ago

> hi! I met the same problem. Is there any solution now?

I just trained the neural BRDF myself, and that solved the problem.

Haian-Jin commented 2 years ago

> Can I add you on WeChat (VX)? Thanks! Is the fix to download the BRDF dataset and train the MERL_512 model?

Just follow the instructions provided by the author. It is pretty simple.

wangmingyang4 commented 2 years ago

Follow step 1 of the Preparation section at https://github.com/google/nerfactor/tree/main/nerfactor. My email: wangmingyang4@myhexin.com. Can we communicate privately? Thanks!

Haian-Jin commented 2 years ago

> Follow step 1 of the Preparation section at https://github.com/google/nerfactor/tree/main/nerfactor. My email: wangmingyang4@myhexin.com. Can we communicate privately? Thanks!

Send me an email at haian@zju.edu.cn if you run into any problems, but I prefer to discuss this here since it may help others, too.

xiumingzhang commented 2 years ago

@Haian-Jin Thanks for reporting a solution! I wonder if you found the pretrained BRDF MLP problematic? Any info useful for me in debugging this is appreciated.
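
(One quick way to gather that info: dump the variable names and shapes stored in the downloaded BRDF checkpoint and compare them against a freshly trained one. This uses the standard tf.train.list_variables API; the checkpoint path below is a placeholder.)

    import tensorflow as tf

    # Placeholder path -- point this at the downloaded (or freshly trained) BRDF checkpoint prefix.
    ckpt_prefix = '/path/to/output/train/merl/lr1e-2/checkpoints/ckpt-50'

    # Prints every variable stored in the checkpoint with its shape; the BRDF
    # latent-code table should appear with shape [100, 3] if it was trained on
    # the 100 MERL BRDFs.
    for name, shape in tf.train.list_variables(ckpt_prefix):
        print(shape, name)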

Haian-Jin commented 2 years ago

> @Haian-Jin Thanks for reporting a solution! I wonder if you found the pretrained BRDF MLP problematic? Any info useful for me in debugging this is appreciated.

I don't know why this happens. I sent the checkpoint I trained myself to @wangmingyang4, and he said he still ran into the same problem. This is pretty strange.

Woolseyyy commented 2 years ago

Hi, I met the same problem too.

Woolseyyy commented 2 years ago

@Haian-Jin @wangmingyang4 @xiumingzhang I think I found the problem. It is caused by https://github.com/google/nerfactor/blob/main/nerfactor/models/brdf.py#L44. If you do not have the MERL dataset locally and do not modify 'data_root' in merl_512/lr1e-2.ini, this error happens... One more thing: I couldn't find where to get 'brdf_merl_npz'. It seems to be different from the official MERL dataset... What am I missing?
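
(For illustration, here is a minimal sketch of that failure mode, with made-up file listing and variable names rather than the repo's actual brdf.py code: if the latent-code table is sized from whatever BRDF files are found under data_root, an empty or missing directory yields a zero-row variable that cannot be restored from a checkpoint trained on 100 BRDFs.)

    import glob
    import tensorflow as tf

    # Illustrative only -- not the actual brdf.py. Suppose the model sizes its
    # latent-code table from the BRDF files it finds under data_root:
    data_root = '/path/to/brdf_merl_npz/ims512_envmaph16_spp1'  # placeholder
    brdf_files = sorted(glob.glob(data_root + '/*.npz'))  # empty if data_root is wrong or missing

    z_dim = 3
    z = tf.Variable(tf.zeros((len(brdf_files), z_dim)))  # shape (0, 3) when nothing is found

    # Restoring a checkpoint whose corresponding variable was trained on 100 MERL
    # BRDFs, i.e. shape (100, 3), then fails with:
    #   ValueError: Shapes (0, 3) and (100, 3) are incompatible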

Woolseyyy commented 2 years ago

> @Haian-Jin @wangmingyang4 @xiumingzhang I think I found the problem. It is caused by https://github.com/google/nerfactor/blob/main/nerfactor/models/brdf.py#L44. If you do not have the MERL dataset locally and do not modify 'data_root' in merl_512/lr1e-2.ini, this error happens... One more thing: I couldn't find where to get 'brdf_merl_npz'. It seems to be different from the official MERL dataset... What am I missing?

Haha, this is what I missed: https://github.com/google/nerfactor/tree/main/data_gen#converting-the-merl-binary-brdfs-into-a-tensorflow-dataset

wangmingyang4 commented 2 years ago

I'm sorry for replying so late. @Woolseyyy @Haian-Jin @xiumingzhang

Thanks everyone!

Here, I will summarize how to use the merl_512 trained model provided by the author.

  1. You can use trainvali_run.sh to train the BRDF model. Once the lr1e-2.ini file has been generated, the training can be canceled.
  2. You should copy the generated lr1e-2.ini file into the downloaded merl_512 folder, replacing the original file (a sanity-check sketch for the ini's data_root follows below).

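(A quick way to sanity-check which data_root the downloaded model's lr1e-2.ini actually points at, assuming the file is standard INI syntax; the path and section handling below are guesses:)

    import os
    from configparser import ConfigParser

    # Placeholder path -- point this at the lr1e-2.ini used by the downloaded merl_512 model.
    ini_path = '/path/to/output/train/merl_512/lr1e-2.ini'

    cfg = ConfigParser()
    cfg.read(ini_path)

    # Collect every data_root value (including one in DEFAULT, if any) and report
    # whether that directory exists and how many files it contains.
    candidates = {}
    if 'data_root' in cfg.defaults():
        candidates['DEFAULT'] = cfg.defaults()['data_root']
    for section in cfg.sections():
        if cfg.has_option(section, 'data_root'):
            candidates[section] = cfg.get(section, 'data_root')

    for section, data_root in candidates.items():
        exists = os.path.isdir(data_root)
        n_files = len(os.listdir(data_root)) if exists else 0
        print(section, data_root, 'exists:', exists, 'files:', n_files)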

xiumingzhang commented 1 year ago

@wangmingyang4 Thanks for reporting a potential solution. But hmmm, why does this hack work? I'm trying to understand why this works, and how I can make changes to eliminate the need for such a hack.

Woolseyyy commented 1 year ago

> @wangmingyang4 Thanks for reporting a potential solution. But hmmm, why does this hack work? I'm trying to understand why this works, and how I can make changes to eliminate the need for such a hack.

The reason is presented here: https://github.com/google/nerfactor/issues/24#issuecomment-1307525096. I think changing the corresponding code and adding a MERL name-list file would help.
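
(To make that suggestion concrete, a sketch with a hypothetical helper and file name, not the repo's actual code: size the latent-code table from a checked-in name list instead of from whatever exists under data_root, so the variable keeps its (100, 3) shape even when the MERL data is absent.)

    # Hypothetical sketch of the suggested change -- not the repo's actual API.
    def load_brdf_names(name_list_path):
        # One MERL BRDF name per line, e.g. 'alum-bronze', 'white-paint', ...
        with open(name_list_path) as f:
            return [line.strip() for line in f if line.strip()]

    brdf_names = load_brdf_names('merl_names.txt')  # hypothetical file listing the 100 names
    # len(brdf_names) == 100 regardless of whether the MERL data exists locally,
    # so the latent-code table is created with shape (100, 3) and restores cleanly.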

wangmingyang4 commented 1 year ago

@xiumingzhang Can I set the imh to the original image size for training?

Osavalon commented 1 year ago

I solved this problem after setting the envmaps path.

Osavalon commented 1 year ago

@xiumingzhang I have run into a new problem when I run "##2. Compute geometry buffers for all views by querying the trained NeRF":

Views (train):   0%|                                          | 0/97 [00:00<?, ?it/s]2022-11-21 23:17:13.473628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
/lyy/nerfactor/nerfactor_b/nerfactor/util/geom.py:58: RuntimeWarning: invalid value encountered in true_divide
  arr_norm = (arr - arr.min()) / (arr.max() - arr.min())
I1121 23:22:47.950219 140528241899328 animation.py:1118] Animation.save using <class 'matplotlib.animation.FFMpegWriter'>
I1121 23:22:47.951245 140528241899328 animation.py:326] figure size in inches has been adjusted from 7.104166666666667 x 5.333333333333333 to 7.1 x 5.32
I1121 23:22:47.951492 140528241899328 animation.py:346] MovieWriter._run: running command: ffmpeg -f rawvideo -vcodec rawvideo -s 710x532 -pix_fmt rgba -r 12 -loglevel error -i pipe: -vcodec h264 -pix_fmt yuv420p -y /lyy/nerfactor/nerfactor_b/output/surf/pinecone512rays1024/train_000/lvis.mp4
Views (train):   1%|▎                             | 1/97 [06:24<10:15:51, 384.91s/it]/lyy/nerfactor/nerfactor_b/nerfactor/util/geom.py:58: RuntimeWarning: invalid value encountered in true_divide
  arr_norm = (arr - arr.min()) / (arr.max() - arr.min())
I1121 23:28:57.483244 140528241899328 animation.py:1118] Animation.save using <class 'matplotlib.animation.FFMpegWriter'>
I1121 23:28:57.484086 140528241899328 animation.py:326] figure size in inches has been adjusted from 7.104166666666667 x 5.333333333333333 to 7.1 x 5.32
I1121 23:28:57.484405 140528241899328 animation.py:346] MovieWriter._run: running command: ffmpeg -f rawvideo -vcodec rawvideo -s 710x532 -pix_fmt rgba -r 12 -loglevel error -i pipe: -vcodec h264 -pix_fmt yuv420p -y /lyy/nerfactor/nerfactor_b/output/surf/pinecone512rays1024/train_001/lvis.mp4
Views (train):   2%|▋                              | 2/97 [12:25<9:46:54, 370.68s/it]/lyy/nerfactor/nerfactor_b/nerfactor/util/geom.py:58: RuntimeWarning: invalid value encountered in true_divide
  arr_norm = (arr - arr.min()) / (arr.max() - arr.min())

I'd appreciate some help. My script is:

##1. Train a vanilla NeRF, optionally using multiple GPUs:

    scene='pinecone'
    gpus='3'
    proj_root='/lyy/nerfactor/nerfactor_b'
    repo_dir="$proj_root/nerfactor"
    viewer_prefix=''
    data_root="/lyy/nerfactor/data/nerf_real_360_proc/$scene"
    near='0.1'
    far='2'
    lr='5e-4'
    imh='512'
    n_rays_per_step='1024'
    outroot="$proj_root/output/train/${scene}_nerf${imh}rays${n_rays_per_step}n${near}f${far}"
    REPO_DIR="$proj_root" "$proj_root/nerfactor/trainvali_run.sh" "$gpus" --config='nerf.ini' --config_override="n_rays_per_step=$n_rays_per_step,data_root=$data_root,imh=$imh,near=$near,far=$far,lr=$lr,outroot=$outroot,viewer_prefix=$viewer_prefix"

    # Optionally, render the test trajectory with the trained NeRF (this can only use 1 GPU)
    gpus='3'
    scene='pinecone'
    imh='512'
    near='0.1'
    far='2'
    proj_root='/lyy/nerfactor/nerfactor_b'
    n_rays_per_step='1024'
    outroot="$proj_root/output/train/${scene}_nerf${imh}rays${n_rays_per_step}n${near}f${far}"
    lr='5e-4'
    ckpt="$outroot/lr$lr/checkpoints/ckpt-2"
    REPO_DIR="$proj_root" "$proj_root/nerfactor/nerf_test_run.sh" "$gpus" --ckpt="$ckpt"

  ## Check the quality of this NeRF geometry by inspecting the visualization HTML for the alpha and normal maps. You might
  ## need to re-run this with another learning rate if the estimated NeRF geometry is too off.

##2. Compute geometry buffers for all views by querying the trained NeRF: (single GPU)

    scene='pinecone'
    gpus='3'
    proj_root='/lyy/nerfactor/nerfactor_b'
    repo_dir="$proj_root/nerfactor"
    viewer_prefix=''
    data_root="/lyy/nerfactor/data/nerf_real_360_proc/$scene"
    imh='512'
    lr='5e-4'
    near='0.1'
    far='2'
    n_rays_per_step='1024'
    trained_nerf="$proj_root/output/train/${scene}_nerf${imh}rays${n_rays_per_step}n${near}f${far}/lr${lr}"
    occu_thres='0.5'
    if [[ "$scene" == pinecone* || "$scene" == scan* ]]; then
        # pinecone and DTU scenes
        scene_bbox='-0.3,0.3,-0.3,0.3,-0.3,0.3'
    elif [[ "$scene" == vasedeck* ]]; then
        scene_bbox='-0.2,0.2,-0.4,0.4,-0.5,0.5'
    else
        # We don't need to bound the synthetic scenes
        scene_bbox=''
    fi
    out_root="$proj_root/output/surf/$scene${imh}rays${n_rays_per_step}"
    ## Bump this up until the GPU runs out of memory, for faster computation
    mlp_chunk='375000'
    REPO_DIR="$proj_root" "$proj_root/nerfactor/geometry_from_nerf_run.sh" "$gpus" --data_root="$data_root" --trained_nerf="$trained_nerf" --out_root="$out_root" --imh="$imh" --scene_bbox="$scene_bbox" --occu_thres="$occu_thres" --mlp_chunk="$mlp_chunk"

xiumingzhang commented 1 year ago

@wangmingyang4 Yes, I think so.

xiumingzhang commented 1 year ago

@Osavalon Looks like your arr is all zeros. Can you try looking into how that happened? It sounds like this is another issue, so let me close this one. Please feel free to reopen it (or create a new one if this is a separate issue).
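
(For reference, a guarded version of that min-max normalization, as a sketch rather than a drop-in replacement for geom.py; it avoids the NaNs when arr is constant, though an all-zero buffer is still worth investigating, as noted above.)

    import numpy as np

    def normalize_minmax(arr, eps=1e-8):
        # Min-max normalize to [0, 1]; return zeros instead of NaNs when the
        # array is constant (e.g., an all-zero light-visibility map).
        arr = np.asarray(arr, dtype=float)
        rng = arr.max() - arr.min()
        if rng < eps:
            return np.zeros_like(arr)
        return (arr - arr.min()) / rng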