nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

torch.compile -> torch_compile from misc #3202

Closed liruilong940607 closed 3 weeks ago

liruilong940607 commented 3 weeks ago

Fixes the issue that torch.compile(), introduced in #3200, is not supported on Windows.
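For readers who want to see the general idea, here is a minimal sketch of a wrapper that only calls torch.compile where it is available. This is not the code merged in this PR: the function name `compile_if_supported` and its `system` parameter are hypothetical, and falling back to the uncompiled function is an assumption about what a reasonable fallback looks like.

```python
import platform
from typing import Any, Callable, Optional


def compile_if_supported(
    fn: Callable[..., Any], system: Optional[str] = None
) -> Callable[..., Any]:
    """Return torch.compile(fn) where that is likely to work, else fn unchanged.

    `system` is a hypothetical override, used here only to make the
    platform check easy to exercise; it defaults to the current platform.
    """
    system = system or platform.system()
    if system == "Windows":
        # Per this PR, torch.compile() is not supported on Windows.
        return fn
    try:
        import torch  # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return fn
    compile_fn = getattr(torch, "compile", None)  # missing on PyTorch 1.x
    if compile_fn is None:
        return fn
    return compile_fn(fn)


def add_one(x):
    return x + 1


# On Windows the function is returned as-is, so calling it needs no compiler.
same_fn = compile_if_supported(add_one, system="Windows")
```

The point of the pattern is that call sites stay identical on every platform and only the wrapper knows about the platform restriction.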

gradeeterna commented 3 weeks ago

Hey, just updated on Windows and getting these errors with splatfacto, Nerfstudio v1.1.2 and gsplat v1.0. Any ideas?

`3 errors detected in the compilation of "C:/Users/user/anaconda3/envs/nerfstudio/lib/site-packages/gsplat/cuda/csrc/rasterization.cu".
rasterization.cu
ninja: build stopped: subcommand failed.

Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 16.7488
VanillaPipeline.get_train_loss_dict: 16.7478
Traceback (most recent call last):
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\anaconda3\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\engine\trainer.py", line 261, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\engine\trainer.py", line 496, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\pipelines\base_pipeline.py", line 301, in get_train_loss_dict
    model_outputs = self._model(ray_bundle)  # train distributed data parallel model if world_size > 1
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\models\base_model.py", line 143, in forward
    return self.get_outputs(ray_bundle)
  File "D:\nerfstudiogs\nerfstudio\nerfstudio\models\splatfacto.py", line 733, in get_outputs
    render, alpha, info = rasterization(
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\gsplat\rendering.py", line 212, in rasterization
    proj_results = fully_fused_projection(
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\gsplat\cuda\_wrapper.py", line 260, in fully_fused_projection
    return _FullyFusedProjection.apply(
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\torch\autograd\function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\gsplat\cuda\_wrapper.py", line 692, in forward
    radii, means2d, depths, conics, compensations = _make_lazy_cuda_func(
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\gsplat\cuda\_wrapper.py", line 12, in call_cuda
    return getattr(_C, name)(*args, **kwargs)
AttributeError: module 'gsplat_cuda' has no attribute 'fully_fused_projection_fwd'`

liruilong940607 commented 3 weeks ago

@gradeeterna This shows that the compilation failed. Is there any error message before this that shows the actual compile errors?
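When the build output is long, it can help to pull out just the compiler error lines before posting. The sketch below is one hedged way to do that: the helper name `first_error_lines` is hypothetical, and the regular expression is an assumption about what nvcc and ninja error lines tend to look like, not an exhaustive pattern.

```python
import re
from typing import List

# Assumed markers: a standalone "error" token (as in "file.cu(12): error: ...")
# and the compiler's "N errors detected in the compilation of ..." summary.
# Python exception names such as AttributeError are deliberately not matched.
_ERROR_RE = re.compile(
    r"\berror\b|errors? detected in the compilation", re.IGNORECASE
)


def first_error_lines(log_text: str, limit: int = 5) -> List[str]:
    """Return up to `limit` lines of a build log that look like compile errors."""
    hits: List[str] = []
    for line in log_text.splitlines():
        if _ERROR_RE.search(line):
            hits.append(line.strip())
            if len(hits) >= limit:
                break
    return hits
```

With the training output saved to a file, calling `first_error_lines(open(path).read())` on that file would list the earliest matching lines, which are usually closer to the root cause than the final Python traceback.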

gradeeterna commented 3 weeks ago

@liruilong940607 Yeah there was a huge wall of errors when running ns-train splatfacto, starting with:

Traceback (most recent call last):
  File "C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\gsplat\cuda\_backend.py", line 41, in <module>
    from gsplat import csrc as _C
ImportError: cannot import name 'csrc' from 'gsplat' (C:\Users\user\anaconda3\envs\nerfstudio\lib\site-packages\gsplat\__init__.py)

Attached the full errors as a txt below.

errors.txt

I also tried setting up a fresh conda env and installing from scratch. tinycudann etc. seemed to build fine, but I get these errors when I try ns-train. I've downgraded to nerfstudio v1.1.0 and gsplat v0.1.11 for now.

Thanks!

liruilong940607 commented 3 weeks ago

Thanks for sharing the error log! This is very helpful! I'm gonna look into it.

Ben-Mack commented 2 weeks ago

Hi @gradeeterna, I'm getting the same error as you did. I tried to downgrade nerfstudio and gsplat, but I still get the same error.

pip install nerfstudio==1.1.0
pip install gsplat==0.1.11

Can you share how you did the downgrade?
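One way to confirm that a downgrade actually took effect in the active environment is to print the installed versions. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the pip commands in this thread.
    for name in ("nerfstudio", "gsplat"):
        print(name, installed_version(name) or "not installed")
```

If the printed versions differ from the ones passed to pip, the commands probably ran against a different environment than the one ns-train uses.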

gradeeterna commented 2 weeks ago

@Ben-Mack I'm pretty sure that's all I did to downgrade a few days ago, but I was also struggling with this again yesterday.

I eventually got the latest versions, nerfstudio 1.1.2 and gsplat 1.0, to work by following the updated Windows installation guide here: https://github.com/nerfstudio-project/nerfstudio/blob/main/docs/quickstart/installation.md

After the vcvarsall.bat step, I also had to run "set DISTUTILS_USE_SDK=1" to get gsplat to build successfully.

Also, if you are installing nerfstudio from source rather than with pip install, skip "pip install --upgrade pip setuptools", as that updates setuptools from v69 to v70, which wasn't working with splatfacto.
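The two environment details above (DISTUTILS_USE_SDK and the setuptools version) can be checked before starting a build. The following is a rough preflight sketch based only on what is reported in this comment; the function name `build_env_warnings` is hypothetical, and the v70 threshold reflects this one report rather than a documented requirement.

```python
import os
from importlib import metadata
from typing import List, Mapping, Optional


def build_env_warnings(
    env: Mapping[str, str], setuptools_version: Optional[str]
) -> List[str]:
    """Return warnings for the two build settings mentioned in this thread."""
    warnings: List[str] = []
    if env.get("DISTUTILS_USE_SDK") != "1":
        warnings.append("DISTUTILS_USE_SDK is not set to 1")
    if setuptools_version is not None:
        major = int(setuptools_version.split(".")[0])
        if major >= 70:  # assumption: v70+ was reported as not working here
            warnings.append(f"setuptools {setuptools_version} is v70 or newer")
    return warnings


if __name__ == "__main__":
    try:
        current = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        current = None
    for warning in build_env_warnings(os.environ, current):
        print("warning:", warning)
```

Run it in the same command prompt, after vcvarsall.bat, that will be used for the gsplat build, since the environment variable only applies to that session.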