zju3dv / NeuralRecon-W

Code for "Neural 3D Reconstruction in the Wild", SIGGRAPH 2022 (Conference Proceedings)
Apache License 2.0

Octree_update Error When Training With My Own Data! #40

Open chufall opened 1 year ago

chufall commented 1 year ago

Hi, thank you for your excellent work!

I trained on my own dataset, the "buckingham_palace" scene downloaded from Image Matching 2020 (https://www.cs.ubc.ca/~kmyi/imw2020/data.html).

I pre-processed the data with the script and then ran train.py with 1 GPU on 1 node. The first octree update succeeded, but the second one, triggered once iteration 10000 finished, failed. Here are the successful log and the failing log:

Epoch 0: 2%|▏ | 4999/210689 [27:57<19:10:01, 2.98it/s, loss=1.21, train/color_loss=0.953, train/normal_loss=0.00272, train/mask_error=0.245, train/psnr=7.180]
Updating sdf to octree....
train dim: 256, upsampled 8 times, original dim 32, sparse num 8102
0%| | 0/64 [00:00<?, ?it/s]
23%|██▎ | 15/64 [00:00<00:00, 142.06it/s]
47%|████▋ | 30/64 [00:00<00:00, 138.66it/s]
70%|███████ | 45/64 [00:00<00:00, 141.12it/s]
94%|█████████▍| 60/64 [00:00<00:00, 137.97it/s]
100%|██████████| 64/64 [00:00<00:00, 140.45it/s]
2023-02-21 18:18:00.083 | DEBUG | tools.prepare_data.generate_voxel:gen_octree:124 - number of points for voxel generation: 288/288
2023-02-21 18:18:00.083 | DEBUG | tools.prepare_data.generate_voxel:gen_octree:147 - level: 8 for expected voxel size: 0.025204115904286808
sdf filtered points 288, max sdf: -0.0035191774368286133, min sdf: 0.43912580609321594
Update successful!!

.......

Epoch 0: 5%|▍ | 9999/210689 [57:00<19:03:58, 2.92it/s, loss=1.2, train/color_loss=0.931, train/normal_loss=0.00244, train/mask_error=0.247, train/psnr=7.360]
Updating sdf to octree....
train dim: 256, upsampled 8 times, original dim 32, sparse num 8102
0%| | 0/64 [00:00<?, ?it/s]
23%|██▎ | 15/64 [00:00<00:00, 142.03it/s]
47%|████▋ | 30/64 [00:00<00:00, 142.90it/s]
70%|███████ | 45/64 [00:00<00:00, 139.74it/s]
94%|█████████▍| 60/64 [00:00<00:00, 141.30it/s]
100%|██████████| 64/64 [00:00<00:00, 142.75it/s]
2023-02-21 18:47:02.862 | DEBUG | tools.prepare_data.generate_voxel:gen_octree:124 - number of points for voxel generation: 0/0
2023-02-21 18:47:02.863 | DEBUG | tools.prepare_data.generate_voxel:gen_octree:147 - level: 8 for expected voxel size: 0.025204115904286808
sdf filtered points 0, max sdf: 0.016315370798110962, min sdf: 0.3999830186367035
Traceback (most recent call last):
  File "/root/dev/NeuralRecon-W/train.py", line 72, in <module>
    main(hparams, config)
  File "/root/dev/NeuralRecon-W/train.py", line 64, in main
    trainer.fit(system, datamodule=data_module)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 917, in _run
    self._dispatch()
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 985, in _dispatch
    self.accelerator.start_training(self)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 995, in run_stage
    return self._run_train()
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1044, in _run_train
    self.fit_loop.run()
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 130, in advance
    batch_output = self.batch_loop.run(batch, self.iteration_count, self._dataloader_idx)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 100, in run
    super().run(batch, batch_idx, dataloader_idx)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 147, in advance
    result = self._run_optimization(batch_idx, split_batch, opt_idx, optimizer)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 201, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 395, in _optimizer_step
    model_ref.optimizer_step(
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1616, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 206, in step
    self.optimizer_step(closure=closure, profiler_name=profiler_name, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 128, in optimizer_step
    trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 296, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 303, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 226, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/torch/optim/adam.py", line 100, in step
    loss = closure()
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 235, in _training_step_and_backward_closure
    result = self.training_step_and_backward(split_batch, batch_idx, opt_idx, optimizer, hiddens)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 536, in training_step_and_backward
    result = self._training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 306, in _training_step
    training_step_output = self.trainer.accelerator.training_step(step_kwargs)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 193, in training_step
    return self.training_type_plugin.training_step(step_kwargs.values())
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 172, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/root/dev/NeuralRecon-W/lightning_modules/neuconw_system.py", line 372, in training_step
    self.octree_update(
  File "/root/dev/NeuralRecon-W/lightning_modules/neuconw_system.py", line 281, in octree_update
    octree_new, scene_origin, scale, level = gen_octree(
  File "/root/dev/NeuralRecon-W/tools/prepare_data/generate_voxel.py", line 150, in gen_octree
    octree = spc.unbatched_points_to_octree(quantized_pc, level)
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/kaolin/ops/spc/points.py", line 75, in unbatched_points_to_octree
    morton = torch.sort(points_to_morton(unique).contiguous())[0]
  File "/root/anaconda3/envs/nerf/lib/python3.9/site-packages/kaolin/ops/spc/points.py", line 105, in points_to_morton
    return _C.ops.spc.points_to_morton_cuda(points.contiguous()).reshape(*shape)
TypeError: reshape() missing 1 required positional arguments: "shape"

So, how can I fix it?
I used the default train.yaml, and depth_pecent is 0.0.

Thanks a lot!

qc

Burningdust21 commented 1 year ago

Hi, there is no valid voxel in octree_update, which indicates that the reconstruction has failed. I suggest checking that your scene configuration is correct, especially whether your scene origin and radius cover the region of interest.
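
One way to act on this suggestion is to measure how many of the sparse points fall inside the configured scene sphere, and to stop with a readable message when the SDF filter leaves nothing to voxelize. The following is only a sketch under assumptions: `coverage_fraction` and `filter_surface_points` are hypothetical helpers that are not part of NeuralRecon-W, and the `abs(sdf) < threshold` rule is an assumed stand-in for whatever filter `gen_octree` actually applies.

```python
import numpy as np


def coverage_fraction(points, scene_origin, scene_radius):
    """Return the fraction of points within scene_radius of scene_origin.

    Hypothetical helper: a value near 0 suggests the configured origin
    and radius do not cover the reconstructed geometry.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    if points.shape[0] == 0:
        return 0.0
    origin = np.asarray(scene_origin, dtype=np.float64)
    dist = np.linalg.norm(points - origin, axis=1)
    return float(np.mean(dist <= scene_radius))


def filter_surface_points(points, sdf, threshold):
    """Keep points whose |sdf| is below threshold (assumed filter rule).

    Raises ValueError when nothing survives, so the failure is reported
    before an empty point cloud reaches the octree builder.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    sdf = np.asarray(sdf, dtype=np.float64).reshape(-1)
    keep = np.abs(sdf) < threshold
    if not keep.any():
        raise ValueError(
            "no points left after SDF filtering; "
            "check the scene origin and radius"
        )
    return points[keep]


if __name__ == "__main__":
    pts = np.array([[0, 0, 0], [1, 0, 0], [3, 0, 0], [0, 4, 0]])
    print("coverage:", coverage_fraction(pts, [0, 0, 0], 2.0))
    print("kept:", filter_surface_points(pts, [0.01, 0.5, -0.02, 0.3], 0.05))
```

Running a check like this on the sparse point cloud before training, or catching the `ValueError` around the octree update, would turn the opaque `reshape()` TypeError into a direct hint about the scene bounds.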