nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0
9.51k stars 1.3k forks

is there any plan to support pytorch 2.0+ #1613

Closed ksnzh closed 1 year ago

ksnzh commented 1 year ago

After modifying the code below, nerfstudio can run with PyTorch 2.0.

--- a/nerfstudio/data/utils/nerfstudio_collate.py
+++ b/nerfstudio/data/utils/nerfstudio_collate.py
@@ -23,8 +23,10 @@ from typing import Callable, Dict, Union

 import torch
 import torch.utils.data
-from torch._six import string_classes
-
+try:
+    from torch._six import string_classes
+except ImportError:
+    string_classes = (str, bytes)
 from nerfstudio.cameras.cameras import Cameras

But with the default fp16 (mixed-precision) training, it raises an error.

ns-train nerfacto --data data/poster --vis wandb
Traceback (most recent call last):
  File "/home/user/.local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/workspace/3dvision/nerfstudio/scripts/train.py", line 247, in entrypoint
    main(
  File "/workspace/3dvision/nerfstudio/scripts/train.py", line 233, in main
    launch(
  File "/workspace/3dvision/nerfstudio/scripts/train.py", line 172, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/workspace/3dvision/nerfstudio/scripts/train.py", line 87, in train_loop
    trainer.train()
  File "/workspace/3dvision/nerfstudio/nerfstudio/engine/trainer.py", line 218, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/workspace/3dvision/nerfstudio/nerfstudio/utils/profiler.py", line 43, in wrapper
    ret = func(*args, **kwargs)
  File "/workspace/3dvision/nerfstudio/nerfstudio/engine/trainer.py", line 395, in train_iteration
    self.optimizers.optimizer_scaler_step_all(self.grad_scaler)
  File "/workspace/3dvision/nerfstudio/nerfstudio/engine/optimizers.py", line 130, in optimizer_scaler_step_all
    grad_scaler.step(optimizer)
  File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 368, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
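For context, this assertion fires when `GradScaler.step()` is called for an optimizer whose parameters received no gradients in that iteration, so the scaler recorded no inf/nan checks for it. A minimal sketch of one possible guard (a hypothetical helper, not nerfstudio's actual fix), assuming only a PyTorch-style optimizer exposing `param_groups`:

```python
# Hedged sketch: skip the scaler step for optimizers whose parameters
# have no gradients, which would otherwise trigger
# "No inf checks were recorded for this optimizer."
def any_grads(optimizer) -> bool:
    """Return True if any parameter managed by the optimizer has a .grad."""
    return any(
        p.grad is not None
        for group in optimizer.param_groups
        for p in group["params"]
    )

# Usage idea: only call grad_scaler.step(opt) when any_grads(opt) is True.
```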

When I changed to fp32 training, the result is not stable.

ns-train nerfacto --data data/poster --vis wandb --mixed-precision False
(Screenshot 2023-03-16 112247: training metrics)
tancik commented 1 year ago

PyTorch 2.0 seems cool, but it is unclear how it will interact with other libraries like tinycudann and nerfacc. We will probably let the field settle a little before adding support.

ksnzh commented 1 year ago

@tancik nerfacc@882d992 (ahead of v0.3.5) works fine with PyTorch nightly and CUDA 11.8.

❯ python examples/train_ngp_nerf.py --train_split train --scene lego
elapsed_time=1.29s | step=0 | loss=0.07298 | alive_ray_mask=256 | n_rendering_samples=68299 | num_rays=256 |
elapsed_time=90.28s | step=10000 | loss=0.00059 | alive_ray_mask=15833 | n_rendering_samples=261834 | num_rays=47332 |
elapsed_time=186.81s | step=20000 | loss=0.00037 | alive_ray_mask=16590 | n_rendering_samples=261559 | num_rays=50510 |
100%|████████████████████████████████████████████████████████████████████████████████| 200/200 [00:52<00:00,  3.78it/s]
evaluation: psnr_avg=35.57740461349487
training stops

Here is my Python environment:

torch                    2.0.0+cu118                                         c:\users\ksnzh\scoop\persist\miniconda3\envs\pt\lib\site-packages                                       pip
nerfacc                  0.3.5           c:\users\ksnzh\workspace\nerfacc    c:\users\ksnzh\workspace\nerfacc

For tinycudann, I ran instant-ngp with CUDA 12 and it seems okay.

jkulhanek commented 1 year ago

PyTorch 2.0 is now supported