hugoycj / Instant-angelo

Instant-angelo: Build high-fidelity Digital Twin within 20 Minutes!
MIT License

Windows, building tinycudann fails. #17

Open antithing opened 10 months ago

antithing commented 10 months ago

I know Windows is not supported, but... I am trying to build tinycudann (from source) and I get file not found errors when building the torch binding:

python setup.py install
Building PyTorch extension for tiny-cuda-nn version 1.7
Obtained compute capability 86 from PyTorch
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
Detected CUDA version 12.1
Targeting C++ standard 17
running install
C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing tinycudann.egg-info\PKG-INFO
writing dependency_links to tinycudann.egg-info\dependency_links.txt
writing top-level names to tinycudann.egg-info\top_level.txt
reading manifest file 'tinycudann.egg-info\SOURCES.txt'
writing manifest file 'tinycudann.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
building 'tinycudann_bindings._86_C' extension
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -ID:\NERF\Instant-angelo-main\third_party\tiny-cuda-nn/include -ID:\NERF\Instant-angelo-main\third_party\tiny-cuda-nn/dependencies -ID:\NERF\Instant-angelo-main\third_party\tiny-cuda-nn/dependencies/cutlass/include -ID:\NERF\Instant-angelo-main\third_party\tiny-cuda-nn/dependencies/cutlass/tools/util/include -ID:\NERF\Instant-angelo-main\third_party\tiny-cuda-nn/dependencies/fmt/include -IC:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\include -IC:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\include\TH -IC:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\Users\B\AppData\Local\Programs\Python\Python39\include -IC:\Users\B\AppData\Local\Programs\Python\Python39\Include /EHsc /Tp../../dependencies/fmt/src/format.cc /Fobuild\temp.win-amd64-cpython-39\Release\../../dependencies/fmt/src/format.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -DTCNN_PARAMS_UNALIGNED -DTCNN_NO_NETWORKS -DTCNN_MIN_GPU_ARCH=86 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_86_C -D_GLIBCXX_USE_CXX11_ABI=0
format.cc
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm(9): fatal error C1083: Cannot open include file: 'yvals_core.h': No such file or directory
error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX64\\x64\\cl.exe' failed with exit code 2

If anyone has any thoughts, it would be much appreciated!

Thanks

kotaxyz commented 10 months ago

Activate the conda environment, then run the commands below. Make sure you replace the Microsoft Visual Studio version with the one you have installed: search for vcvars32.bat, change the directory I provided to the matching one, and add x64 at the end, like this:

"\Program Files\Microsoft Visual Studio\2022\community\vc\Auxiliary\Build\vcvars32.bat" x64

After that, run this command:

pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

antithing commented 10 months ago

Thank you! Works perfectly. :)
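
In the build log earlier in this thread, cl.exe is invoked from the MSVC 14.29.30133 toolset while the failing header is resolved under 14.34.31933. That suggests two toolsets were being mixed before the Visual Studio environment was set up explicitly. Below is a minimal sketch, assuming you have a saved copy of the build output, of how such a mix could be spotted. The helper names msvc_versions and has_toolset_mismatch are hypothetical and are not part of tiny-cuda-nn or setuptools.

```python
import re

# Matches the toolset version directory in paths such as
# ...\VC\Tools\MSVC\14.29.30133\... (single or doubled backslashes,
# since the build log contains both forms).
MSVC_VERSION = re.compile(r"MSVC\\+(\d+\.\d+\.\d+)")


def msvc_versions(log_text):
    """Return the set of MSVC toolset versions mentioned in a build log."""
    return set(MSVC_VERSION.findall(log_text))


def has_toolset_mismatch(log_text):
    """True when the log refers to more than one MSVC toolset version."""
    return len(msvc_versions(log_text)) > 1


# Two lines shortened from the log in this thread: the compiler path
# and the header path that produced error C1083.
sample_log = "\n".join([
    r'"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe" /c /nologo',
    r"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm(9): fatal error C1083: Cannot open include file: 'yvals_core.h'",
])

print(sorted(msvc_versions(sample_log)))
print(has_toolset_mismatch(sample_log))
```

A mismatch reported by a check like this is only a hint. It does not prove the cause of the C1083 error, but it is consistent with the fix above, where running the Visual Studio environment script first made the build succeed.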

antithing commented 10 months ago

Now I have training starting, but I crash with a memory error:

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
fatal: not a git repository (or any of the parent directories): .git
D:\NERF\Instant-angelo-main\utils\callbacks.py:76: UserWarning: Code snapshot is not saved. Please make sure you have git installed and are in a git repository.
  rank_zero_warn("Code snapshot is not saved. Please make sure you have git installed and are in a git repository.")

  | Name  | Type      | Params
------------------------------------
0 | model | NeuSModel | 28.0 M
------------------------------------
28.0 M    Trainable params
0         Non-trainable params
28.0 M    Total params
55.936    Total estimated model params size (MB)
Epoch 0: : 0it [00:00, ?it/s]Update finite_difference_eps to 0.040807057654505825
Epoch 0: : 499it [02:13,  3.72it/s, loss=0.161, train/inv_s=29.30, train/num_rays=284.0]C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\lr_scheduler.py:149: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
Epoch 0: : 5000it [09:16,  8.9Traceback (most recent call last):10, train/num_rays=698.0]
  File "<string>", line 1, in <module>
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\__init__.py", line 1504, in <module>
    from . import masked
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\masked\__init__.py", line 3, in <module>
    from ._ops import (
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\masked\_ops.py", line 11, in <module>
    from torch._prims_common import corresponding_real_dtype
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_prims_common\__init__.py", line 23, in <module>
    import sympy
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\__init__.py", line 73, in <module>
    from .polys import (Poly, PurePoly, poly_from_expr, parallel_poly_from_expr,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\polys\__init__.py", line 75, in <module>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from .polyfuncs import (symmetrize, horner, interpolate,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\polys\polyfuncs.py", line 11, in <module>
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
    from sympy.polys.specialpolys import (
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\__init__.py", line 1750, in <module>
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\polys\specialpolys.py", line 297, in <module>
    from . import _meta_registrations
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_meta_registrations.py", line 8, in <module>
    from torch._decomp import (
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_decomp\__init__.py", line 190, in <module>
    from sympy.polys.rings import ring
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 786, in exec_module
  File "<frozen importlib._bootstrap_external>", line 881, in get_code
  File "<frozen importlib._bootstrap_external>", line 980, in get_data
MemoryError
    import torch._decomp.decompositions
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_decomp\decompositions.py", line 10, in <module>
    import torch._prims as prims
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_prims\__init__.py", line 2968, in <module>
    register_debug_prims()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_prims\debug_prims.py", line 41, in register_debug_prims
    def load_tensor_factory(name, size, stride, dtype, device):
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_custom_op\impl.py", line 330, in inner
    self._register_impl("factory", f)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_custom_op\impl.py", line 221, in _register_impl
    frame = inspect.getframeinfo(sys._getframe(stacklevel))
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\inspect.py", line 1503, in getframeinfo
    lines, lnum = findsource(frame)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\inspect.py", line 829, in findsource
    module = getmodule(object, file)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\inspect.py", line 754, in getmodule
    modulesbyfile[f] = modulesbyfile[
MemoryError
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\__init__.py", line 1750, in <module>
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
    from . import _meta_registrations
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_meta_registrations.py", line 4003, in <module>
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 126, in _main
    @register_meta(aten.max_unpool2d)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_ops.py", line 775, in __getattr__
    self = reduction.pickle.load(from_parent)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\__init__.py", line 1504, in <module>
    setattr(self, op_name, opoverloadpacket)
MemoryError
    from . import masked
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\masked\__init__.py", line 3, in <module>
    from ._ops import (
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\masked\_ops.py", line 11, in <module>
    from torch._prims_common import corresponding_real_dtype
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_prims_common\__init__.py", line 23, in <module>
    import sympy
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\__init__.py", line 73, in <module>
    from .polys import (Poly, PurePoly, poly_from_expr, parallel_poly_from_expr,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\polys\__init__.py", line 75, in <module>
    from .polyfuncs import (symmetrize, horner, interpolate,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\polys\polyfuncs.py", line 11, in <module>
    from sympy.polys.specialpolys import (
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\polys\specialpolys.py", line 297, in <module>
    from sympy.polys.rings import ring
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\polys\rings.py", line 30, in <module>
    from sympy.printing.defaults import DefaultPrinting
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\sympy\printing\__init__.py", line 7, in <module>
    from .mathml import mathml, print_mathml
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 786, in exec_module
  File "<frozen importlib._bootstrap_external>", line 881, in get_code
  File "<frozen importlib._bootstrap_external>", line 980, in get_data
MemoryError
Traceback (most recent call last):
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 1132, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\queue.py", line 179, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\NERF\Instant-angelo-main\launch.py", line 125, in <module>
    main()
  File "D:\NERF\Instant-angelo-main\launch.py", line 114, in main
    trainer.fit(system, datamodule=dm)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1112, in _run
    results = self._run_stage()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1191, in _run_stage
    self._run_train()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\loop.py", line 200, in run
    self.on_advance_end()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 250, in on_advance_end
    self._run_validation()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 308, in _run_validation
    self.val_loop.run()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 152, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 121, in advance
    batch = next(data_fetcher)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 184, in __next__
    return self.fetching_function()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 265, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 280, in _fetch_next_batch
    batch = next(iterator)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 630, in __next__
    data = self._next_data()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 1328, in _next_data
    idx, data = self._get_data()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 1284, in _get_data
    success, data = self._try_get_data()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 1145, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 19388, 24592, 17544, 20436) exited unexpectedly
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

Is this related to the dataloader worker size perhaps? Where can I adjust that variable?

RTX 3090, Intel i9

Thanks,

kotaxyz commented 10 months ago

Many people, including me, get the same error, and it has also been reported to the developer of instant-nsr-pl. I guess it is a problem with Windows support, because these projects were made for Linux. I am trying to fix it. If you want to edit parameters, I guess they are in the configs folder. If I get it to work I will tell you, and if you get it to work, please comment here.
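
The traceback above shows four DataLoader worker processes, each started through multiprocessing's spawn path and each failing with MemoryError while re-importing torch. When num_workers is 0, PyTorch's DataLoader loads batches in the main process and starts no worker processes, so one thing worth trying is to fall back to zero workers on Windows. Below is a minimal sketch of that idea. The helper choose_num_workers is hypothetical, and where Instant-angelo actually sets its worker count is not stated in this thread, so treat the wiring as an assumption.

```python
import sys


def choose_num_workers(requested, platform=sys.platform):
    """Pick a DataLoader worker count (hypothetical helper).

    On Windows, return 0 so that data is loaded in the main process and
    no worker processes are spawned. Elsewhere, keep the requested count,
    clamped so it is never negative.
    """
    if platform.startswith("win"):
        return 0
    return max(0, requested)


# Illustration only: the value would be passed to
# torch.utils.data.DataLoader through its num_workers argument, e.g.
#   DataLoader(dataset, batch_size=1, num_workers=choose_num_workers(4))
print(choose_num_workers(4, platform="win32"))
print(choose_num_workers(4, platform="linux"))
```

Loading data in the main process is usually slower, so this trades throughput for avoiding the spawned workers. Whether it resolves the crash reported here has not been confirmed in this thread.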