Closed 1 year ago
I am sorry, but 6 GB is not enough to run the original configuration. Please try ‘ns-train tetra-nerf’ instead and reduce the number of rays per batch.
Thanks for your response. With tetra-nerf and the number of rays per batch reduced to 32, ns-train still OOMs (always at 0.66% of training progress):
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 ns-train tetra-nerf --pipeline.datamanager.train-num-rays-per-batch 32 --pipeline.datamanager.eval-num-rays-per-batch 32 --pipeline.model.tetrahedra-path data/blender/chair/pointnerf-0.5.th blender-data --data data/blender/chair
RuntimeError: CUDA call (cudaMalloc( reinterpret_cast<void *>(&triangle_hit_distances), sizeof(float) * max_ray_triangles * num_rays ) ) failed with error: 'out of memory' (/home/ubuntu/tetra-nerf/src/tetrahedra_tracer.cpp:404)
wandb: Waiting for W&B process to finish... (failed 1).
wandb: Run history:
wandb: ETA (time) █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: Eval Loss ▁
wandb: Eval Loss Dict/rgb_loss ▁
wandb: GPU Memory (MB) ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: Train Iter (time) █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: Train Loss █▃▂▂▅▂▇▂▃▂▃▃▁▂▁▂▃▂▄▃▄▂▁▂▁▃▃▂▁▂▂▂▂▁▁▂▂▂▂▁
wandb: Train Loss Dict/rgb_loss █▃▂▂▅▂▇▂▃▂▃▃▁▂▁▂▃▂▄▃▄▂▁▂▁▃▃▂▁▂▂▂▂▁▁▂▂▂▂▁
wandb: Train Rays / Sec ▅▃▅▄▃▃▄▃▄█▅▇▅▃▆▃▁▁▃▆▅▃▅▄▅▃▆▃▂▃▄▃▄▃▅▄▆█▄▃
wandb: learning_rate/fields ███▇▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁
wandb: Run summary:
wandb: ETA (time) 9118.6457
wandb: Eval Loss 0.00899
wandb: Eval Loss Dict/rgb_loss 0.00899
wandb: GPU Memory (MB) 408.64062
wandb: Train Iter (time) 0.0306
wandb: Train Loss 0.00505
wandb: Train Loss Dict/rgb_loss 0.00505
wandb: Train Rays / Sec 1048.40654
wandb: learning_rate/fields 0.00098
I recommend limiting the maximum number of intersected triangles (--pipeline.model.max-intersected-triangles).
Does nerfacto train on your data without problems?
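A note on the failing allocation itself: it is a raw cudaMalloc inside the tracer (tetrahedra_tracer.cpp:404), outside PyTorch's caching allocator, which is why PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 has no effect on it. A back-of-envelope sketch of the buffer size, using the formula from the error message and values from the config dumps in this thread (hit_buffer_bytes is just an illustrative name):

```python
# Size of the triangle_hit_distances buffer from the error message:
#   sizeof(float) * max_ray_triangles * num_rays
# Illustrative helper; the values below come from the config dumps in this thread.
def hit_buffer_bytes(max_ray_triangles: int, num_rays: int) -> int:
    return 4 * max_ray_triangles * num_rays  # sizeof(float) == 4

print(hit_buffer_bytes(512, 4096) / 2**20)  # tetra-nerf-original defaults: 8.0 MiB
print(hit_buffer_bytes(256, 32) / 2**20)    # reduced settings: ~0.03 MiB
```

The buffer itself is modest, so the OOM means the 6 GB card is already nearly exhausted by the time the tracer asks for it; every knob that scales with rays, samples, or intersected triangles helps.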
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 ns-train tetra-nerf --pipeline.datamanager.train-num-rays-per-batch 32 --pipeline.datamanager.eval-num-rays-per-batch 32 --pipeline.model.max-intersected-triangles 256 --pipeline.model.eval-num-rays-per-chunk 32 --pipeline.model.num-samples 32 --pipeline.model.num-fine-samples 32 --pipeline.model.tetrahedra-path data/blender/chair/pointnerf-0.5.th blender-data --data data/blender/chair
JAX not installed, skipping Mip-NeRF SSIM
──────────────────────────────────────────────────────── Config ────────────────────────────────────────────────────────
TrainerConfig(
_target=<class 'nerfstudio.engine.trainer.Trainer'>,
output_dir=PosixPath('outputs'),
method_name='tetra-nerf',
experiment_name=None,
project_name='nerfstudio-project',
timestamp='2023-06-03_073127',
machine=MachineConfig(seed=42, num_gpus=1, num_machines=1, machine_rank=0, dist_url='auto'),
logging=LoggingConfig(
relative_log_dir=PosixPath('.'),
steps_per_log=10,
max_buffer_size=20,
local_writer=LocalWriterConfig(
_target=<class 'nerfstudio.utils.writer.LocalWriter'>,
enable=True,
stats_to_track=(
<EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>,
<EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>,
<EventName.CURR_TEST_PSNR: 'Test PSNR'>,
<EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>,
<EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>,
<EventName.ETA: 'ETA (time)'>
),
max_log_size=10
),
profiler='basic'
),
viewer=ViewerConfig(
relative_log_filename='viewer_log_filename.txt',
websocket_port=None,
websocket_port_default=7007,
websocket_host='0.0.0.0',
num_rays_per_chunk=32768,
max_num_display_images=512,
quit_on_train_completion=False,
image_format='jpeg',
jpeg_quality=90
),
pipeline=VanillaPipelineConfig(
_target=<class 'tetranerf.nerfstudio.pipeline.TetrahedraNerfPipeline'>,
datamanager=VanillaDataManagerConfig(
_target=<class 'nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager'>,
data=None,
camera_optimizer=CameraOptimizerConfig(
_target=<class 'nerfstudio.cameras.camera_optimizers.CameraOptimizer'>,
mode='off',
position_noise_std=0.0,
orientation_noise_std=0.0,
optimizer=AdamOptimizerConfig(
_target=<class 'torch.optim.adam.Adam'>,
lr=0.0006,
eps=1e-15,
max_norm=None,
weight_decay=0
),
scheduler=ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=None,
warmup_steps=0,
max_steps=10000,
ramp='cosine'
),
param_group='camera_opt'
),
dataparser=BlenderDataParserConfig(
_target=<class 'nerfstudio.data.dataparsers.blender_dataparser.Blender'>,
data=PosixPath('data/blender/chair'),
scale_factor=1.0,
alpha_color='white'
),
train_num_rays_per_batch=32,
train_num_images_to_sample_from=-1,
train_num_times_to_repeat_images=-1,
eval_num_rays_per_batch=32,
eval_num_images_to_sample_from=-1,
eval_num_times_to_repeat_images=-1,
eval_image_indices=(0,),
collate_fn=<function nerfstudio_collate at 0x7f577c54a7a0>,
camera_res_scale_factor=1.0,
patch_size=1
),
model=TetrahedraNerfConfig(
_target=<class 'tetranerf.nerfstudio.model.TetrahedraNerf'>,
enable_collider=True,
collider_params={'near_plane': 2.0, 'far_plane': 6.0},
loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
eval_num_rays_per_chunk=32,
tetrahedra_path=PosixPath('data/blender/chair/pointnerf-0.5.th'),
num_tetrahedra_vertices=174525,
num_tetrahedra_cells=1087011,
max_intersected_triangles=256,
num_samples=32,
num_fine_samples=32,
use_biased_sampler=True,
field_dim=64,
num_color_layers=1,
num_density_layers=3,
hidden_size=128,
input_fourier_frequencies=0,
initialize_colors=True,
use_gradient_scaling=True
)
),
optimizers={
'fields': {
'optimizer': RAdamOptimizerConfig(
_target=<class 'torch.optim.radam.RAdam'>,
lr=0.001,
eps=1e-08,
max_norm=None,
weight_decay=0
),
'scheduler': ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=0.0001,
warmup_steps=0,
max_steps=300000,
ramp='cosine'
)
}
},
vis='wandb',
data=None,
relative_model_dir=PosixPath('nerfstudio_models'),
steps_per_save=25000,
steps_per_eval_batch=1000,
steps_per_eval_image=2000,
steps_per_eval_all_images=50000,
max_num_iterations=300000,
mixed_precision=False,
use_grad_scaler=False,
save_only_latest_checkpoint=True,
load_dir=None,
load_step=None,
load_config=None,
load_checkpoint=None,
log_gradients=False
)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[07:31:27] Saving config to: outputs/unnamed/tetra-nerf/2023-06-03_073127/config.yml experiment_config.py:128
Saving checkpoints to: outputs/unnamed/tetra-nerf/2023-06-03_073127/nerfstudio_models trainer.py:136
Setting up training dataset...
Caching all 100 images.
Setting up evaluation dataset...
Caching all 100 images.
No Nerfstudio checkpoint to load, so training from scratch.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.15.3
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
logging events to: outputs/unnamed/tetra-nerf/2023-06-03_073127
Tetrahedra initialized from file data/blender/chair/pointnerf-0.5.th:
Num points: 174525
Num tetrahedra: 1087011
[ 4][ KNOBS]: All knobs on default.
[ 4][ DISK CACHE]: Opened database: "/var/tmp/OptixCache_ubuntu/optix7cache.db"
[ 4][ DISK CACHE]: Cache data size: "30.2 KiB"
[ 4][ DISK CACHE]: Cache hit for key: ptx-14549-keyefbf26c79f6345943421c125989da67a-sm_75-rtc0-drv525.105.17
[ 4][COMPILE FEEDBACK]: Info: Pipeline has 1 module(s), 4 entry function(s), 1 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 59 basic block(s) in entry functions, 543 instruction(s) in entry functions, 8 non-entry function(s), 63 basic block(s) in non-entry functions, 811 instruction(s) in non-entry functions, no debug information
Step (% Done)   Train Iter (time)   ETA (time)         Train Rays / Sec
3710 (1.24%)    30.652 ms           2 h, 31 m, 21 s    1.05 K
3720 (1.24%)    31.427 ms           2 h, 35 m, 11 s    1.02 K
3730 (1.24%)    31.215 ms           2 h, 34 m, 7 s     1.03 K
3740 (1.25%)    31.523 ms           2 h, 35 m, 39 s    1.02 K
3750 (1.25%)    31.349 ms           2 h, 34 m, 47 s    1.02 K
3760 (1.25%)    30.495 ms           2 h, 30 m, 33 s    1.05 K
3770 (1.26%)    30.634 ms           2 h, 31 m, 14 s    1.05 K
3780 (1.26%)    31.124 ms           2 h, 33 m, 39 s    1.03 K
3790 (1.26%)    31.544 ms           2 h, 35 m, 43 s    1.02 K
3800 (1.27%)    31.634 ms           2 h, 36 m, 9 s     1.01 K
Printing profiling stats, from longest to shortest duration in seconds
VanillaPipeline.get_eval_image_metrics_and_images: 117.4870
Trainer.eval_iteration: 0.0309
Trainer.train_iteration: 0.0300
VanillaPipeline.get_eval_loss_dict: 0.0243
VanillaPipeline.get_train_loss_dict: 0.0231
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/ns-train", line 8, in <module>
It seems the number of rays per batch was too small and no ray intersected the tetrahedra, so no gradient was computed.
Set num_rays_per_batch to 128, still training...
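As a side note for anyone hitting the same zero-intersection failure: a minimal defensive sketch (not tetra-nerf's actual code; safe_train_step is an illustrative name) is to skip the optimizer step whenever the loss carries no autograd graph:

```python
import torch

def safe_train_step(optimizer: torch.optim.Optimizer, loss: torch.Tensor) -> bool:
    """Skip batches where no ray intersected the tetrahedra, i.e. the loss is
    disconnected from the model parameters, instead of crashing in backward()."""
    if not loss.requires_grad:  # nothing intersected -> nothing to optimize
        return False
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return True
```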
Step (% Done)   Train Iter (time)   ETA (time)         Train Rays / Sec
15900 (5.30%)   31.378 ms           2 h, 28 m, 34 s    4.09 K
15910 (5.30%)   29.986 ms           2 h, 21 m, 58 s    4.28 K
15920 (5.31%)   30.333 ms           2 h, 23 m, 37 s    4.24 K
15930 (5.31%)   31.982 ms           2 h, 31 m, 25 s    4.01 K
15940 (5.31%)   32.258 ms           2 h, 32 m, 43 s    3.97 K
15950 (5.32%)   32.041 ms           2 h, 31 m, 41 s    4.01 K
15960 (5.32%)   31.386 ms           2 h, 28 m, 34 s    4.09 K
15970 (5.32%)   31.485 ms           2 h, 29 m, 2 s     4.08 K
15980 (5.33%)   32.068 ms           2 h, 31 m, 47 s    4.00 K
15990 (5.33%)   32.508 ms           2 h, 33 m, 52 s    3.95 K
128 seems quite low. Would you be able to use at least 1024?
Yes, 1024 is still going well
You can try increasing it until you hit OOM.
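A generic way to automate that search, sketched under the assumption that run_step performs one full train iteration at the given ray-batch size (it is not part of nerfstudio or tetra-nerf):

```python
import torch

def find_max_ray_batch(run_step, start: int = 128, limit: int = 1 << 16) -> int:
    """Double the ray batch until a CUDA out-of-memory error, then return the
    last size that worked (0 if even `start` does not fit)."""
    best, batch = 0, start
    while batch <= limit:
        try:
            run_step(batch)            # one forward/backward at this batch size
            best, batch = batch, batch * 2
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise                  # unrelated failure: re-raise
            torch.cuda.empty_cache()   # release the failed attempt
            break
    return best
```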
Hello, we encountered this OOM situation with an NVIDIA 6 GB GPU. Is there any solution to run with low CUDA memory? PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 does not seem to help here.
Any comments will be appreciated.
-------------------------------------log----------------------------------------
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 ns-train tetra-nerf-original --pipeline.model.tetrahedra-path data/blender/chair/pointnerf-0.5.th blender-data --data data/blender/chair
JAX not installed, skipping Mip-NeRF SSIM
──────────────────────────────────────────────────────── Config ────────────────────────────────────────────────────────
TrainerConfig(
_target=<class 'nerfstudio.engine.trainer.Trainer'>,
output_dir=PosixPath('outputs'),
method_name='tetra-nerf-original',
experiment_name=None,
project_name='nerfstudio-project',
timestamp='2023-06-03_022954',
machine=MachineConfig(seed=42, num_gpus=1, num_machines=1, machine_rank=0, dist_url='auto'),
logging=LoggingConfig(
relative_log_dir=PosixPath('.'),
steps_per_log=10,
max_buffer_size=20,
local_writer=LocalWriterConfig(
_target=<class 'nerfstudio.utils.writer.LocalWriter'>,
enable=True,
stats_to_track=(
<EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>,
<EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>,
<EventName.CURR_TEST_PSNR: 'Test PSNR'>,
<EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>,
<EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>,
<EventName.ETA: 'ETA (time)'>
),
max_log_size=10
),
profiler='basic'
),
viewer=ViewerConfig(
relative_log_filename='viewer_log_filename.txt',
websocket_port=None,
websocket_port_default=7007,
websocket_host='0.0.0.0',
num_rays_per_chunk=32768,
max_num_display_images=512,
quit_on_train_completion=False,
image_format='jpeg',
jpeg_quality=90
),
pipeline=VanillaPipelineConfig(
_target=<class 'tetranerf.nerfstudio.pipeline.TetrahedraNerfPipeline'>,
datamanager=VanillaDataManagerConfig(
_target=<class 'nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager'>,
data=None,
camera_optimizer=CameraOptimizerConfig(
_target=<class 'nerfstudio.cameras.camera_optimizers.CameraOptimizer'>,
mode='off',
position_noise_std=0.0,
orientation_noise_std=0.0,
optimizer=AdamOptimizerConfig(
_target=<class 'torch.optim.adam.Adam'>,
lr=0.0006,
eps=1e-15,
max_norm=None,
weight_decay=0
),
scheduler=ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=None,
warmup_steps=0,
max_steps=10000,
ramp='cosine'
),
param_group='camera_opt'
),
dataparser=BlenderDataParserConfig(
_target=<class 'nerfstudio.data.dataparsers.blender_dataparser.Blender'>,
data=PosixPath('data/blender/chair'),
scale_factor=1.0,
alpha_color='white'
),
train_num_rays_per_batch=4096,
train_num_images_to_sample_from=-1,
train_num_times_to_repeat_images=-1,
eval_num_rays_per_batch=4096,
eval_num_images_to_sample_from=-1,
eval_num_times_to_repeat_images=-1,
eval_image_indices=(0,),
collate_fn=<function nerfstudio_collate at 0x7f3fe99b27a0>,
camera_res_scale_factor=1.0,
patch_size=1
),
model=TetrahedraNerfConfig(
_target=<class 'tetranerf.nerfstudio.model.TetrahedraNerf'>,
enable_collider=True,
collider_params={'near_plane': 2.0, 'far_plane': 6.0},
loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
eval_num_rays_per_chunk=4096,
tetrahedra_path=PosixPath('data/blender/chair/pointnerf-0.5.th'),
num_tetrahedra_vertices=174525,
num_tetrahedra_cells=1087011,
max_intersected_triangles=512,
num_samples=256,
num_fine_samples=256,
use_biased_sampler=False,
field_dim=64,
num_color_layers=1,
num_density_layers=3,
hidden_size=128,
input_fourier_frequencies=0,
initialize_colors=True,
use_gradient_scaling=False
)
),
optimizers={
'fields': {
'optimizer': RAdamOptimizerConfig(
_target=<class 'torch.optim.radam.RAdam'>,
lr=0.001,
eps=1e-08,
max_norm=None,
weight_decay=0
),
'scheduler': ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=0.0001,
warmup_steps=0,
max_steps=300000,
ramp='cosine'
)
}
},
vis='wandb',
data=None,
relative_model_dir=PosixPath('nerfstudio_models'),
steps_per_save=25000,
steps_per_eval_batch=1000,
steps_per_eval_image=2000,
steps_per_eval_all_images=50000,
max_num_iterations=300000,
mixed_precision=False,
use_grad_scaler=False,
save_only_latest_checkpoint=True,
load_dir=None,
load_step=None,
load_config=None,
load_checkpoint=None,
log_gradients=False
)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[02:29:54] Saving config to: outputs/unnamed/tetra-nerf-original/2023-06-03_022954/config.yml experiment_config.py:128
Saving checkpoints to: outputs/unnamed/tetra-nerf-original/2023-06-03_022954/nerfstudio_models trainer.py:136
Setting up training dataset...
Caching all 100 images.
Setting up evaluation dataset...
Caching all 100 images.
No Nerfstudio checkpoint to load, so training from scratch.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.15.3
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
logging events to: outputs/unnamed/tetra-nerf-original/2023-06-03_022954
Tetrahedra initialized from file data/blender/chair/pointnerf-0.5.th:
Num points: 174525
Num tetrahedra: 1087011
[ 4][ KNOBS]: All knobs on default.
[ 4][ DISK CACHE]: Opened database: "/var/tmp/OptixCache_ubuntu/optix7cache.db"
[ 4][ DISK CACHE]: Cache data size: "30.2 KiB"
[ 4][ DISK CACHE]: Cache hit for key: ptx-14549-keyefbf26c79f6345943421c125989da67a-sm_75-rtc0-drv525.105.17
[ 4][COMPILE FEEDBACK]: Info: Pipeline has 1 module(s), 4 entry function(s), 1 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 59 basic block(s) in entry functions, 543 instruction(s) in entry functions, 8 non-entry function(s), 63 basic block(s) in non-entry functions, 811 instruction(s) in non-entry functions, no debug information
[02:30:03] Printing max of 10 lines. Set flag --logging.local-writer.max-log-size=0 to disable line wrapping. writer.py:408
Step (% Done)   Train Iter (time)   ETA (time)
0 (0.00%)       1 s, 217.954 ms     4 d, 5 h, 29 m, 46 s
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.4076
VanillaPipeline.get_train_loss_dict: 0.2806
Trainer.eval_iteration: 0.0000
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/ns-train", line 8, in <module>
sys.exit(entrypoint())
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/scripts/train.py", line 260, in entrypoint
main(
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/scripts/train.py", line 246, in main
launch(
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/scripts/train.py", line 185, in launch
main_func(local_rank=0, world_size=world_size, config=config)
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/scripts/train.py", line 100, in train_loop
trainer.train()
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/engine/trainer.py", line 240, in train
loss, loss_dict, metrics_dict = self.train_iteration(step)
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/utils/profiler.py", line 127, in inner
out = func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/engine/trainer.py", line 446, in train_iteration
_, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/utils/profiler.py", line 127, in inner
out = func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/pipelines/base_pipeline.py", line 276, in get_train_loss_dict
model_outputs = self._model(ray_bundle) # train distributed data parallel model if world_size > 1
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/nerfstudio/models/base_model.py", line 140, in forward
return self.get_outputs(ray_bundle)
File "/home/ubuntu/tetra-nerf/tetranerf/nerfstudio/model.py", line 440, in get_outputs
tracer_output = tracer.trace_rays(
RuntimeError: CUDA call (cudaMalloc( reinterpret_cast<void *>(&triangle_hit_distances), sizeof(float) * max_ray_triangles * num_rays ) ) failed with error: 'out of memory' (/home/ubuntu/tetra-nerf/src/tetrahedra_tracer.cpp:404)
wandb: Waiting for W&B process to finish... (failed 1).
wandb: Run history:
wandb: ETA (time) █▄▃▂▁▁
wandb: GPU Memory (MB) ▁
wandb: Train Iter (time) █▄▃▂▁▁
wandb: Train Loss ▁
wandb: Train Loss Dict/rgb_loss ▁
wandb: Train Rays / Sec ▁██▆
wandb: learning_rate/fields █▇▅▄▂▁
wandb: Run summary:
wandb: ETA (time) 142501.25605
wandb: GPU Memory (MB) 2860.13672
wandb: Train Iter (time) 0.47501
wandb: Train Loss 0.01844
wandb: Train Loss Dict/rgb_loss 0.01844
wandb: Train Rays / Sec 13076.66933
wandb: learning_rate/fields 0.001
wandb: You can sync this run to the cloud by running:
wandb: wandb sync outputs/unnamed/tetra-nerf-original/2023-06-03_022954/wandb/offline-run-20230603_023001-79ywy4ew
wandb: Find logs at: outputs/unnamed/tetra-nerf-original/2023-06-03_022954/wandb/offline-run-20230603_023001-79ywy4ew/logs