Open smallfly opened 1 year ago
Can you try from commit https://github.com/nerfstudio-project/nerfstudio/commit/d4b04376abd46d9bddf8c299d1687177fe027951 and see if that works?
Although v0.3.0 was a workaround, it was able to read the file without failure.
It should be solved now :)
I ran into the same "Wrong ckpt format" error on Volinga.ai.
My environment is as follows:
I took the following steps:
installed anaconda3
created a virtual env with python==3.10.13
installed pip packages
python -m pip install --upgrade pip
conda install -c "nvidia/label/cuda-11.8.0" cudatoolkit=11.8
conda install -c "nvidia/label/cuda-11.8.0" cuda-nvcc
python -m pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --extra-index-url https://download.pytorch.org/whl/cu118
installed ninja and skipped tiny-cuda-nn (I gave up on tiny-cuda-nn because of an installation error)
installed nerfstudio
python -m pip install nerfstudio
installed volinga-model
git clone https://github.com/Volinga/volinga-model
cd volinga-model
python -m pip install -e . --user
then I checked that "volinga" appears in the console output of ns-train -h
created volinga ckpt file
ns-train volinga --data data/nerfstudio/poster --vis viewer
This successfully created the ckpt file.
uploaded ckpt file to https://volinga.ai/main
upload failed: the upload progress reached 100%, but then it failed. The error information displayed on volinga.ai was:
Error code: 607 Info: Wrong ckpt format
I would appreciate it if anybody could help me find a solution.
I think you may need to install a specific version of nerfstudio to train volinga? I saw this from a quick look... https://github.com/Volinga/volinga-model#1-install-nerfstudio--v032
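Before retraining, it may help to confirm which nerfstudio version is actually installed in the active environment. The sketch below uses only the standard library's importlib.metadata. The helper name check_pinned_version is hypothetical, and the "0.3.2" pin is an assumption taken from the version hinted at in the link above; neither is part of nerfstudio or volinga-model.

```python
from importlib import metadata
from typing import Optional, Tuple


def check_pinned_version(package: str, expected: str) -> Tuple[bool, Optional[str]]:
    """Return (matches, installed_version) for a package in this environment.

    installed_version is None when the package is not installed at all.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False, None
    return installed == expected, installed


if __name__ == "__main__":
    # "0.3.2" is an assumption based on this thread; adjust as needed.
    ok, found = check_pinned_version("nerfstudio", "0.3.2")
    print(f"nerfstudio installed: {found!r}, matches pin: {ok}")
```

Run it with the same interpreter that runs ns-train, otherwise it reports on a different environment.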
Thank you, machenmusik.
I re-installed all the packages, and specified the version of nerfstudio==0.3.2.
However, ns-train volinga --data data/nerfstudio/poster --vis viewer
did not work properly.
I thought this was due to another reason (version mismatch? or source code problem?).
I would be grateful if you have any clue about solving this problem.
Here is the error message printed after running ns-train volinga --data data/nerfstudio/poster --vis viewer:
[NOTE] Not running eval iterations since only viewer is enabled.
Use --vis {wandb, tensorboard, viewer+wandb, viewer+tensorboard} to run with eval.
No Nerfstudio checkpoint to load, so training from scratch.
Disabled tensorboard/wandb event writers
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 4.0080
VanillaPipeline.get_train_loss_dict: 3.9890
Traceback (most recent call last):
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\Scripts\ns-train.exe\__main__.py", line 7, in <module>
sys.exit(entrypoint())
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\scripts\train.py", line 261, in entrypoint
main(
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\scripts\train.py", line 246, in main
launch(
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
main_func(local_rank=0, world_size=world_size, config=config)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
trainer.train()
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\engine\trainer.py", line 255, in train
loss, loss_dict, metrics_dict = self.train_iteration(step)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\utils\profiler.py", line 127, in inner
out = func(*args, **kwargs)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\engine\trainer.py", line 468, in train_iteration
_, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\utils\profiler.py", line 127, in inner
out = func(*args, **kwargs)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\pipelines\base_pipeline.py", line 281, in get_train_loss_dict
model_outputs = self._model(ray_bundle) # train distributed data parallel model if world_size > 1
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\models\base_model.py", line 142, in forward
return self.get_outputs(ray_bundle)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\models\nerfacto.py", line 278, in get_outputs
field_outputs = self.field.forward(ray_samples, compute_normals=self.config.predict_normals)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\fields\base_field.py", line 124, in forward
density, density_embedding = self.get_density(ray_samples)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\fields\nerfacto_field.py", line 216, in get_density
h = self.mlp_base(positions_flat).view(*ray_samples.frustums.shape, -1)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
input = module(input)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\field_components\mlp.py", line 178, in forward
return self.pytorch_fwd(in_tensor)
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\nerfstudio\field_components\mlp.py", line 164, in pytorch_fwd
for i, layer in enumerate(self.layers):
File "D:\anaconda3\envs\nerfstudio032_py310_pip\lib\site-packages\torch\nn\modules\module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'MLP' object has no attribute 'layers'
Hello @AFMagnon! If I understood correctly, you are not using tiny-cuda-nn to train your models, right?
Thank you for your comment, Frivas97.
That's right, I did not install tiny-cuda-nn.
After installing ninja,
I failed to install tiny-cuda-nn, with the following error message:
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Detected CUDA version 11.8
Targeting C++ standard 17
running bdist_wheel
D:\anaconda3\envs\nerfstudio_py310\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
Without this package, I was still able to run ns-train nerfacto --data data/nerfstudio/poster successfully.
So I thought it was not necessary...
If tiny-cuda-nn is not available, NeRFStudio will use vanilla PyTorch to train the model. The problem is that the way the model is stored in the ckpt depends on whether you use tiny-cuda-nn or not. At the moment, the Volinga exporter only supports the tiny-cuda-nn "format". This is probably the reason for the failure.
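The difference described above can be checked locally before uploading. The following is a minimal sketch, not Volinga's actual validation. It assumes that tiny-cuda-nn modules store their weights as one flat tensor under a key ending in ".params", while the pure-PyTorch MLP stores per-layer keys such as "...layers.0.weight". The function name guess_mlp_backend, the key patterns, and the example key names are all assumptions for illustration; list the keys of your own checkpoint to confirm them.

```python
from typing import Iterable


def guess_mlp_backend(state_dict_keys: Iterable[str]) -> str:
    """Guess which MLP backend produced a checkpoint from its parameter names.

    Heuristic only, based on assumed key patterns:
      - "tcnn":    only flat ".params" keys are present
      - "torch":   only per-layer ".layers.N.weight" / ".bias" keys are present
      - "mixed":   both patterns appear
      - "unknown": neither pattern appears
    """
    keys = list(state_dict_keys)
    has_tcnn = any(k.endswith(".params") for k in keys)
    has_torch = any(
        ".layers." in k and k.endswith((".weight", ".bias")) for k in keys
    )
    if has_tcnn and has_torch:
        return "mixed"
    if has_tcnn:
        return "tcnn"
    if has_torch:
        return "torch"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    print(guess_mlp_backend(["_model.field.mlp_base.params"]))
    print(guess_mlp_backend(["_model.field.mlp_base.layers.0.weight"]))

    # With a real checkpoint (requires torch; the file path and the
    # "pipeline" key are assumptions about the checkpoint layout):
    # import torch
    # ckpt = torch.load("path/to/your.ckpt", map_location="cpu")
    # print(guess_mlp_backend(ckpt["pipeline"].keys()))
```

A result of "torch" would be consistent with a checkpoint trained without tiny-cuda-nn, which is the case the exporter is said not to support.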
Thank you! To sum up, I should build the following environment, right?
・nerfstudio==0.3.2
・a working installation of tiny-cuda-nn
So I will focus on installing tiny-cuda-nn with nerfstudio 0.3.2. I found a useful link for the installation, which reads:
git clone https://github.com/NVlabs/tiny-cuda-nn.git
cd tiny-cuda-nn
git submodule update --init --recursive
python -m pip install ./bindings/torch
but it failed... I'll keep trying...
That's right, that is the setup you need.
There have been various periods where the latest version of tiny-cuda-nn was broken and so new installs would fail.
(@Frivas97 if it's not already there, you may want to add the tiny-cuda-nn requirement to docs and maybe even implementation of your method, as it differs from others...)
(@AFMagnon do you have CUDA libraries and Visual Studio installed? CUDA version 11.8 is known to work well, and IIRC community versions of VS may suffice... not sure whether just windows-build-tools does)
@machenmusik, thank you for your suggestion. I have tried to build an environment that satisfies the following:
nvcc and nvidia-smi: they are all version 11.8.
cl command.
After building the environment, I retried the installation:
#install pip packages
python -m pip install --upgrade pip
python -m pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --extra-index-url https://download.pytorch.org/whl/cu118
#tiny-cuda installation
git clone https://github.com/NVlabs/tiny-cuda-nn.git
cd tiny-cuda-nn
git submodule update --init --recursive
python -m pip install ./bindings/torch
But python -m pip install ./bindings/torch
failed...
The error message is as follows:
Processing c:\users\UserName\tiny-cuda-nn\bindings\torch
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
<string>:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
Traceback (most recent call last):
File "C:\Users\UserName\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
File "C:\Users\UserName\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_vendor\pyproject_hooks\_in_p json_out['return_val'] = hook(**hook_input['kwargs'])
File "C:\Users\UserName\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\UserName\AppData\Local\Temp\pip-build-env-0c0t_z5q\overlay\Lib\site-packages\setuptools\build_meta.py", line 355, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "C:\Users\UserName\AppData\Local\Temp\pip-build-env-0c0t_z5q\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in _get_build_requires
self.run_setup()
File "C:\Users\UserName\AppData\Local\Temp\pip-build-env-0c0t_z5q\overlay\Lib\site-packages\setuptools\build_meta.py", line 507, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "C:\Users\UserName\AppData\Local\Temp\pip-build-env-0c0t_z5q\overlay\Lib\site-packages\setuptools\build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 9, in <module>
ModuleNotFoundError: No module named 'torch'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
That says "ModuleNotFoundError: No module named 'torch'".
I think the torch installation itself was successful, because import torch
worked when I ran it in the Python interpreter.
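Two things in the log above may be worth checking. The traceback paths point at C:\Users\UserName\AppData\Local\Programs\Python\Python310 rather than at a conda environment, and pip by default builds packages in an isolated environment that does not see already-installed packages such as torch. The sketch below is an assumption about the cause, not a confirmed fix: it reports which interpreter is running, whether torch can be found from it, and builds a pip command that uses pip's --no-build-isolation flag. The helper names are hypothetical.

```python
import importlib.util
import sys
from typing import List


def torch_is_importable() -> bool:
    """True if a torch package can be found from the running interpreter."""
    return importlib.util.find_spec("torch") is not None


def build_install_command(source_dir: str = "./bindings/torch") -> List[str]:
    """pip command that builds in the current environment, not an isolated one.

    It uses the running interpreter so that pip and torch come from the
    same environment.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--no-build-isolation",  # let setup.py see the already-installed torch
        source_dir,
    ]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("torch importable:", torch_is_importable())
    print("suggested command:", " ".join(build_install_command()))
```

If the printed interpreter is not the one inside the nerfstudio environment, activating that environment first is probably the more important step.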
Maybe a silly question, but:
Have you tried launching the x64 native developer command prompt, and then invoking the Python virtual environment (or equivalent) that you use for nerfstudio from there?
To install tiny-cuda-nn, I have generally used what is in the nerfstudio readme...
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
@machenmusik ,
Yes.
When I installed tiny-cuda-nn, I tried both
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
and
git clone https://github.com/NVlabs/tiny-cuda-nn.git
cd tiny-cuda-nn
git submodule update --init --recursive
python -m pip install ./bindings/torch
The former command fails because the fmt and cutlass folders are empty (https://github.com/NVlabs/tiny-cuda-nn/issues/208), and it failed for me as well. The latter command does fetch the fmt and cutlass components, but the installation still failed.
I also tried both installation procedures from the x64 native developer command prompt, but they failed too.
The failures show two kinds of error messages. One is https://github.com/nerfstudio-project/nerfstudio/issues/2060#issuecomment-1782595635 and the other is below:
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [32 lines of output]
C:\Users\UserName\tiny-cuda-nn\bindings\torch\setup.py:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import parse_version
Building PyTorch extension for tiny-cuda-nn version 1.7
Obtained compute capability 86 from PyTorch
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Detected CUDA version 11.8
Targeting C++ standard 17
running bdist_wheel
D:\anaconda3\envs\nerfstudio_py310\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-310
creating build\lib.win-amd64-cpython-310\tinycudann
copying tinycudann\modules.py -> build\lib.win-amd64-cpython-310\tinycudann
copying tinycudann\__init__.py -> build\lib.win-amd64-cpython-310\tinycudann
running egg_info
creating tinycudann.egg-info
writing tinycudann.egg-info\PKG-INFO
writing dependency_links to tinycudann.egg-info\dependency_links.txt
writing top-level names to tinycudann.egg-info\top_level.txt
writing manifest file 'tinycudann.egg-info\SOURCES.txt'
reading manifest file 'tinycudann.egg-info\SOURCES.txt'
writing manifest file 'tinycudann.egg-info\SOURCES.txt'
copying tinycudann\bindings.cpp -> build\lib.win-amd64-cpython-310\tinycudann
running build_ext
error: [WinError 2] The system cannot find the file specified.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tinycudann
Running setup.py clean for tinycudann
Failed to build tinycudann
ERROR: Could not build wheels for tinycudann, which is required to install pyproject.toml-based projects
I'm so depressed...
I would appreciate it if you could share your installation procedure and your environment. How did you install tiny-cuda-nn and use volinga?
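The log above contains two clues: torch warns that it could not find ninja, and the build stops with "[WinError 2] The system cannot find the file specified", which Windows raises when a file (often an executable being launched) cannot be found. A hedged sketch for checking whether the build tools mentioned in this thread (ninja, nvcc, and the Visual Studio compiler cl) are visible on PATH from the current shell follows. The function name find_build_tools is hypothetical, and a missing tool is only a possible cause, not a confirmed diagnosis.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_build_tools(
    tools: Iterable[str] = ("ninja", "nvcc", "cl"),
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path on PATH, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(f"{tool:>6}: {path or 'NOT FOUND on PATH'}")
```

Running it once from a plain prompt and once from the x64 native developer command prompt, with the same Python environment active, shows whether the developer prompt actually changes what the build can see.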
I followed the instructions as listed in README. Haven't had to reinstall tiny-cuda-nn in a while though. I am still using Python 3.9.x rather than 3.10.x, with miniconda.
I tried with a miniconda3 virtual env running Python 3.9.18. However, the installation of tiny-cuda-nn failed with the same error as above... Could you tell me the version or commit SHA-1 of tiny-cuda-nn that you used?
Thank you!! I reinstalled the OS and the other software, and then I succeeded in installing tiny-cuda-nn!! The conversion from ckpt file to nvol file was successful!!
Maybe software or library dependencies were what made things so complicated.
Describe the bug
I have just updated NeRFStudio to the latest commit, and since then it seems that ckpt files (trained using ns-train volinga...) do not work with the Volinga converter. After a successful upload I get error code 607 and the message "Wrong ckpt format".
To Reproduce
Steps to reproduce the behavior:
ns-train volinga...
Additional context
Even though there is already a thread on Volinga's Discord server about this, I thought it could be useful to open an issue here too. It seems that the ckpt structure has changed.