aiolosgs closed this issue 1 year ago
I got an error when training the model with:
python launch.py --config configs/neus-colmap.yaml --gpu 0 --train

nvcc fatal : Value 'c++17' is not defined for option 'std'
ninja: build stopped: subcommand failed.

I solved this by upgrading my nvidia-cuda-toolkit, but now I run into another error during training:
Traceback (most recent call last):
  File "/home/paul/projects/nerf/torch-bakedsdf/launch.py", line 131, in <module>
    main()
  File "/home/paul/projects/nerf/torch-bakedsdf/launch.py", line 120, in main
    trainer.fit(system, datamodule=dm)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 88, in launch
    return function(*args, **kwargs)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.on_advance_end()
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 250, in on_advance_end
    self._run_validation()
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 308, in _run_validation
    self.val_loop.run()
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 152, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 132, in advance
    self._on_evaluation_batch_start(**kwargs)
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 262, in _on_evaluation_batch_start
    self.trainer._call_lightning_module_hook(hook_name, *kwargs.values())
  File "/home/paul/anaconda3/envs/bakedsdf/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1356, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/paul/projects/nerf/torch-bakedsdf/systems/base.py", line 61, in on_validation_batch_start
    self.preprocess_data(batch, 'validation')
  File "/home/paul/projects/nerf/torch-bakedsdf/systems/neus.py", line 64, in preprocess_data
    rgb = self.dataset.all_images[index].view(-1, self.dataset.all_images.shape[-1]).to(self.rank)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
Sorry for the trouble; this bug was fixed in #1. You can fetch the latest code to resolve it.
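For anyone hitting this on an older checkout: the traceback shows a CPU tensor (self.dataset.all_images) being indexed with a CUDA index tensor, which PyTorch rejects. Below is a minimal sketch of the failure mode and a typical remedy; the actual change in #1 may differ, and the variable names only mirror the traceback for illustration. It assumes a CUDA-capable machine.

```python
import torch

# all_images stays on the CPU (as in the dataset code above), while the
# index tensor was created on the GPU.
all_images = torch.rand(4, 8, 8, 3)        # CPU tensor of per-image RGB values
index = torch.tensor([1], device="cuda")   # CUDA index tensor

# all_images[index]  # RuntimeError: indices should be either on cpu or
#                    # on the same device as the indexed tensor (cpu)

# Typical remedy: index on the CPU first, then move only the gathered
# pixels to the GPU.
rgb = all_images[index.cpu()].view(-1, all_images.shape[-1]).to("cuda")
print(rgb.shape)  # torch.Size([64, 3])
```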