Open luocha0107 opened 1 month ago
Hi, can you please try running:
nerfbaselines shell --method wild-gaussians
And running:
nerfbaselines train --method wild-gaussians --data datasets/0729_powertower_radial/
Also, please post the output of pip list run from both your outer environment and from nerfbaselines shell ...?
OK, I will try later and post the module list. Thanks for the reply.
When I run:
nerfbaselines shell --method wild-gaussians
(flow_map) root@autodl-container-f34d45a126-5b9fd9ad:~/data_user/ysl/wild-gaussians# nerfbaselines shell --method wild-gaussians
info: Using method: wild-gaussians, backend: python
Traceback (most recent call last):
File "/root/miniconda3/envs/flow_map/bin/nerfbaselines", line 8, in
Then, when I run:
nerfbaselines train --method wild-gaussians --data datasets/0729_powertower_radial/
(flow_map) root@autodl-container-f34d45a126-5b9fd9ad:~/data_user/ysl/wild-gaussians# nerfbaselines train --method wild-gaussians --data datasets/0729_powertower_radial/
info: Using method: wild-gaussians, backend: python
info: Loading train dataset
info: Detecting dataset format from path: /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial
info: Colmap dataloader is using LLFF split with 207 training and 30 test images
info: Loaded unknown dataset from path /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial using loader colmap
info: Loading images from /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial/images
loading images: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 207/207 [00:42<00:00, 4.92it/s]
info: Loading eval dataset
info: Detecting dataset format from path: /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial
info: Colmap dataloader is using LLFF split with 207 training and 30 test images
info: Loaded unknown dataset from path /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial using loader colmap
info: Loading images from /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial/images
loading images: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:05<00:00, 5.16it/s]
warning: Dataset ID not specified, dataset-specific config overrides may not be applied
info: Active presets:
info: Using config overrides: {}
info: Loading config file default.yml
info: using MLP layer as FFN
Generating skybox: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 207/207 [00:00<00:00, 1357.67it/s]
info: Adding skybox with 49818 points
Number of points at initialisation : 108904
info: Output directory: /root/data_user/ysl/wild-gaussians
info: Initialized loggers: tensorboard
training: 0%| | 0/70000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/flow_map/bin/nerfbaselines", line 8, in ^~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 37.66 GiB (GPU 0; 23.68 GiB total capacity; 2.39 GiB already allocated; 20.82 GiB free; 2.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The pip list output is as follows:
(flow_map) root@autodl-container-f34d45a126-5b9fd9ad:~/data_user/ysl/wild-gaussians# pip list
Package Version Editable project location
absl-py 2.1.0
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
antlr4-python3-runtime 4.9.3
anyio 4.6.2.post1
asttokens 2.4.1
attrs 24.2.0
beartype 0.18.5
beautifulsoup4 4.12.3
black 24.8.0
Brotli 1.0.9
certifi 2024.8.30
chardet 5.2.0
charset-normalizer 2.1.1
click 8.1.7
colorlog 6.8.2
contourpy 1.3.0
cycler 0.12.1
dacite 1.8.1
decorator 5.1.1
diff_gaussian_rasterization 0.0.0 /root/data_user/ysl/wild-gaussians/submodules/diff-gaussian-rasterization
docker-pycreds 0.4.0
docstring_parser 0.16
einops 0.8.0
embreex 2.17.7.post5
executing 2.1.0
filelock 3.13.1
flow_vis_torch 0.1
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.9.0
gdown 5.2.0
gitdb 4.0.11
GitPython 3.1.43
gmpy2 2.1.2
grpcio 1.67.0
h11 0.14.0
httpcore 1.0.6
httpx 0.27.2
huggingface-hub 0.25.0
hydra-core 1.3.2
idna 3.10
imageio 2.36.0
ipython 8.28.0
jaxtyping 0.2.34
jedi 0.19.1
Jinja2 3.1.4
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.7
lazy_loader 0.4
lightning 2.4.0
lightning-utilities 0.11.7
lxml 5.3.0
manifold3d 2.5.1
mapbox_earcut 1.0.2
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.9.0
matplotlib-inline 0.1.7
mdurl 0.1.2
mediapy 1.2.2
mkl_fft 1.3.10
mkl_random 1.2.7
mkl-service 2.4.0
mpmath 1.3.0
msgpack 1.1.0
multidict 6.1.0
mypy-extensions 1.0.0
nerfbaselines 1.2.5
networkx 3.2.1
nodeenv 1.9.1
numpy 1.26.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.68
nvidia-nvtx-cu12 12.1.105
omegaconf 2.3.0
opencv-python 4.10.0.84
packaging 24.1
parso 0.8.4
pathspec 0.12.1
pexpect 4.9.0
pillow 10.4.0
pip 24.2
platformdirs 4.3.6
plyfile 1.0.3
prompt_toolkit 3.0.48
protobuf 4.25.5
psutil 6.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
pycollada 0.8
Pygments 2.18.0
pyliblzfse 0.4.1
pyparsing 3.1.4
PySocks 1.7.1
python-dateutil 2.9.0.post0
pytorch-lightning 2.4.0
PyYAML 6.0.2
referencing 0.35.1
requests 2.28.1
rich 13.9.3
rpds-py 0.20.0
Rtree 1.3.0
ruff 0.6.7
safetensors 0.4.5
scikit-image 0.24.0
scipy 1.14.1
sentry-sdk 2.14.0
setproctitle 1.3.3
setuptools 69.5.1
shapely 2.0.6
shtab 1.7.1
simple_knn 0.0.0
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
soupsieve 2.6
splines 0.3.2
stack-data 0.6.3
svg.path 6.3
sympy 1.13.3
tensorboard 2.17.0
tensorboard-data-server 0.7.2
tifffile 2024.9.20
timm 1.0.9
torch 2.0.1
torchaudio 2.0.2
torchmetrics 1.4.2
torchvision 0.15.2
tqdm 4.66.4
traitlets 5.14.3
trimesh 4.5.1
triton 3.0.0
typeguard 2.13.3
typing_extensions 4.11.0
tyro 0.8.14
urllib3 1.26.20
vhacdx 0.0.8.post1
viser 0.1.34
wandb 0.18.1
wcwidth 0.2.13
websockets 13.1
Werkzeug 3.0.4
wheel 0.44.0
wildgaussians 0.3.0 /root/data_user/ysl/wild-gaussians
xatlas 0.0.9
xxhash 3.5.0
yarl 1.11.1
yourdfpy 0.0.56
My GPU is an RTX 3090 (24 GB). From the terminal output, it appears to be out of memory?
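To read the error more concretely, the figures in the message can be extracted and compared. The following is only a sketch: the message text is copied from the traceback above, while the helper names `parse_oom` and `exceeds_capacity` are made up for illustration. It shows that the single requested allocation (37.66 GiB) is larger than the card's total capacity (23.68 GiB), so allocator settings such as max_split_size_mb are unlikely to help in this case.

```python
import re

# Message text copied from the traceback in this thread.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 37.66 GiB (GPU 0; 23.68 GiB total capacity; "
    "2.39 GiB already allocated; 20.82 GiB free; 2.51 GiB reserved in total by PyTorch)"
)

# Hypothetical helper: one regular expression per figure in the message.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) GiB",
    "total": r"([\d.]+) GiB total capacity",
    "allocated": r"([\d.]+) GiB already allocated",
    "free": r"([\d.]+) GiB free",
    "reserved": r"([\d.]+) GiB reserved",
}


def parse_oom(message: str) -> dict:
    """Extract the GiB figures from a PyTorch CUDA out-of-memory message."""
    figures = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        figures[name] = float(match.group(1))
    return figures


def exceeds_capacity(figures: dict) -> bool:
    """True when one allocation alone is bigger than the whole GPU."""
    return figures["requested"] > figures["total"]


figures = parse_oom(OOM_MESSAGE)
print(figures)
print("single request exceeds GPU capacity:", exceeds_capacity(figures))
```

Because reserved (2.51 GiB) and allocated (2.39 GiB) are nearly equal, fragmentation does not look like the cause here; the request itself is simply too large.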
OK, I guess you did a local install? Now the training works. I don't know what the issue was before. It seemed like you had installed NumPy 2.0, which isn't compatible with PyTorch, but that isn't the case. The OOM issue is perhaps caused by the images being too large. Try either disabling the uncertainty loss or downscaling your images.
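As a rough way to pick a downscale factor, one can estimate how the failing allocation shrinks with image size. This is a back-of-the-envelope sketch, not how WildGaussians computes memory: it assumes the allocation scales with the pixel count raised to some power (`exponent=1.0` means linear in pixels), and the function name `min_downscale_factor` is hypothetical. Downscaling each side by a factor k divides the pixel count by k².

```python
def min_downscale_factor(requested_gib: float, free_gib: float,
                         exponent: float = 1.0, max_factor: int = 64) -> int:
    """Smallest integer per-side downscale factor whose estimated allocation fits.

    ASSUMPTION: the allocation is proportional to (pixel count) ** exponent.
    The real scaling in the training code may differ.
    """
    if requested_gib <= 0 or free_gib <= 0:
        raise ValueError("memory figures must be positive")
    for factor in range(1, max_factor + 1):
        # Each side shrinks by `factor`, so the pixel count shrinks by factor ** 2.
        estimated = requested_gib / (factor ** 2) ** exponent
        if estimated <= free_gib:
            return factor
    raise ValueError("no factor up to max_factor fits in the available memory")


# Figures from the error in this thread: 37.66 GiB requested, 20.82 GiB free.
print(min_downscale_factor(37.66, 20.82))  # 2
```

The estimate ignores memory that does not depend on image size (the Gaussians, the model, optimizer state), so it is worth leaving some headroom or trying the next larger factor if training still runs out of memory.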
ok, thank you.
Is the issue resolved?
No, I tried cutting the number of images in half, but it still ran out of memory. Next I will try reducing the image resolution.
I have solved the problem by downscaling the images. It's running now. But it shows that training will take 70000 iterations and about a dozen hours. Will it stop automatically once the results become good enough during training?
Hi, yes, I meant downscaling the images. The time changes during training, but it sounds like a lot. What GPU are you using?
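For a sense of scale, the progress-bar estimate can be turned into a throughput figure. This is a small sketch with hypothetical helper names; it takes "a dozen hours" to mean roughly 12 hours, which is an assumption, and it treats the rate as constant even though, as noted above, the estimate changes during training.

```python
def eta_hours(total_iters: int, done_iters: int, iters_per_second: float) -> float:
    """Remaining training time in hours at a constant iteration rate."""
    if iters_per_second <= 0:
        raise ValueError("iters_per_second must be positive")
    return (total_iters - done_iters) / iters_per_second / 3600.0


def rate_for_duration(total_iters: int, hours: float) -> float:
    """Average iterations per second implied by a total duration."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_iters / (hours * 3600.0)


# 70000 iterations in about 12 hours (assumed reading of "a dozen hours").
rate = rate_for_duration(70000, 12)
print(f"implied rate: {rate:.2f} it/s")
print(f"remaining after 2000 iterations: {eta_hours(70000, 2000, rate):.1f} h")
```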
My GPU is an NVIDIA GeForce RTX 3090. I have run into another problem: after training for 2000 iterations and downloading the AlexNet model weights, I got the following error.
It seems to be a network connection problem; I have tried it a few times.
Hi, can you please verify that you have internet access on the compute node from which you run the command?
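One quick way to check this from the compute node is a plain TCP connection test using only the Python standard library. This is a minimal sketch: the function name `can_reach` is hypothetical, and `github.com` is just a placeholder host, since the thread does not say which server the weights are downloaded from.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder host; replace it with the server the download actually uses.
    print("reachable:", can_reach("github.com"))
```

A successful TCP connection does not guarantee that an HTTPS download will work (for example, behind a proxy or a broken VPN), but a failure here is a strong hint that the node has no outbound access.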
OK, thank you. I know why.
How did you solve this problem? Or, in which folder should the manually downloaded model be placed?
I remember the error came after downloading the model. The VPN on my server is not working, so I'm not trying anymore.
Hi, sorry for the late reply. I'm a bit busy, but will look at it next week. Currently you need internet access for the evaluation, as the model is not loaded from cache but streamed directly. Next week I will add code that allows the model to be loaded from cache, so you can manually download it there. In the meantime, you can disable the evaluation during training (and evaluate locally after the training is finished). Would this resolve your issue? In that case, just set --eval-all-steps and --eval-few-steps to something like 1000000 so that it never runs the evaluation.
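Put together, the suggestion might look like the command built below. This is only a sketch: the two flag names are copied verbatim from the reply above and have not been checked against the installed nerfbaselines version (`nerfbaselines train --help` lists the options actually available), and `build_train_command` is a hypothetical helper that only assembles and prints the command without running it.

```python
import shlex


def build_train_command(method: str, data: str, eval_every: int = 1_000_000) -> list:
    """Assemble a training command with evaluation pushed past the end of training.

    ASSUMPTION: the flag names --eval-all-steps and --eval-few-steps are taken
    from the reply in this thread and may differ in your nerfbaselines version.
    """
    return [
        "nerfbaselines", "train",
        "--method", method,
        "--data", data,
        "--eval-all-steps", str(eval_every),
        "--eval-few-steps", str(eval_every),
    ]


cmd = build_train_command("wild-gaussians", "datasets/0729_powertower_radial/")
print(shlex.join(cmd))
```

Since training runs for 70000 iterations, any value above that keeps the evaluation (and the model download it triggers) from running during training.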
I used a COLMAP dataset.
(flow_map) root@autodl-container-f34d45a126-5b9fd9ad:~/data_user/ysl/wild-gaussians# nerfbaselines train --method wild-gaussians --data datasets/0729_powertower_radial/
info: Using method: wild-gaussians, backend: python
info: Loading train dataset
info: Detecting dataset format from path: /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial
info: Colmap dataloader is using LLFF split with 207 training and 30 test images
info: Loaded unknown dataset from path /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial using loader colmap
info: Loading images from /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial/images
loading images: 100%|████████████████████████████████████████| 207/207 [00:40<00:00, 5.14it/s]
info: Loading eval dataset
info: Detecting dataset format from path: /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial
info: Colmap dataloader is using LLFF split with 207 training and 30 test images
info: Loaded unknown dataset from path /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial using loader colmap
info: Loading images from /root/data_user/ysl/wild-gaussians/datasets/0729_powertower_radial/images
loading images: 100%|████████████████████████████████████████| 30/30 [00:05<00:00, 5.17it/s]
warning: Dataset ID not specified, dataset-specific config overrides may not be applied
info: Active presets:
info: Using config overrides: {}
info: Loading config file default.yml
info: using MLP layer as FFN
Traceback (most recent call last):
File "/root/miniconda3/envs/flow_map/bin/nerfbaselines", line 8, in <module>
sys.exit(main())
^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/nerfbaselines/cli/_common.py", line 499, in invoke
return super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/nerfbaselines/cli/_common.py", line 440, in wrapped
raise e
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/nerfbaselines/cli/_common.py", line 433, in wrapped
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/nerfbaselines/cli/_train.py", line 128, in train_command
method = method_cls(
^^^^^^^^^^^
File "/root/data_user/ysl/wild-gaussians/wildgaussians/method.py", line 1689, in __init__
self._setup_train(train_dataset, load_state_dict)
File "/root/data_user/ysl/wild-gaussians/wildgaussians/method.py", line 1700, in _setup_train
th_cameras = train_dataset["cameras"].apply(lambda x, _: torch.from_numpy(x).cuda())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/flow_map/lib/python3.11/site-packages/nerfbaselines/_types.py", line 247, in apply
poses=fn(self.poses, "poses"),
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/data_user/ysl/wild-gaussians/wildgaussians/method.py", line 1700, in <lambda>
th_cameras = train_dataset["cameras"].apply(lambda x, _: torch.from_numpy(x).cuda())
^^^^^^^^^^^^^^^^^^^
TypeError: expected np.ndarray (got numpy.ndarray)
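This error is often reported when the installed NumPy does not match the one PyTorch was built against, which is why a NumPy 2.0 install was suspected in this thread even though the pip list above shows numpy 1.26.3. The sketch below prints what the current interpreter actually sees, so it can be run both in the outer environment and inside nerfbaselines shell to compare. The helper names are hypothetical, and it reads package metadata only, without importing numpy or torch.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major(version: str) -> int:
    """Major component of a version string such as '1.26.3'."""
    return int(version.split(".")[0])


def report(packages=("numpy", "torch")) -> dict:
    """Print the interpreter path and the versions of the given packages."""
    versions = {name: installed_version(name) for name in packages}
    print("interpreter:", sys.executable)
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    numpy_version = versions.get("numpy")
    if numpy_version is not None and major(numpy_version) >= 2:
        print("NumPy 2.x detected; older PyTorch builds may not work with it")
    return versions


report()
```

If the two environments print different interpreter paths or NumPy versions, the training command is probably not using the packages shown in the pip list.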