nv-tlabs / NKSR

[CVPR 2023 Highlight] Neural Kernel Surface Reconstruction
https://research.nvidia.com/labs/toronto-ai/NKSR

About training Points2Surf dataset #23

Closed moyu026 closed 1 year ago

moyu026 commented 1 year ago

I want to train on the Points2Surf dataset with 'python train.py configs/points2surf/train.yaml'. It just prints some info but doesn't start training.

07-06 09:33:42 (train.py:67) [INFO] Intelligent GPU selection: 0
Tensorboard logger, version number = 1
Global seed set to 0
/mnt/disk2/.conda/liudengjin/envs/NKSR/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting Trainer(gpus=1) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=1) instead.
  rank_zero_deprecation(
Auto select gpus: [0]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

======= MODEL HYPER-PARAMETERS ======= <<<<
exec: null
include: null
visualize: false
test_set_shuffle: false
no_mesh_vis: false
solver_verbose: false
runtime_density: false
runtime_visualize: false
test_print_metrics: false
test_n_upsample: 2
test_use_gt_structure: false
test_transform: null
url: ''
name: abc/full
model: nksr_net
feature: normal
geometry: kernel
voxel_size: 0.015
kernel_dim: 4
tree_depth: 4
adaptive_depth: 2
unet:
  f_maps: 32
udf:
  enabled: true
interpolator:
  n_hidden: 2
  hidden_dim: 16
solver:
  pos_weight: 10000.0
  normal_weight: 10000.0
batch_size: 1
accumulate_grad_batches: 4
optimizer: Adam
learning_rate:
  init: 0.0001
  decay_mult: 0.7
  decay_step: 50000
  clip: 1.0e-06
weight_decay: 0.0
grad_clip: 0.5
adaptive_policy:
  method: normal
  tau: 0.1
supervision:
  structure_weight: 20.0
  gt_type: PointTSDFVolume
  gt_surface:
    value: 200.0
    normal: 100.0
    subsample: 50000
  spatial:
    weight: 300.0
    reg_sdf_weight: 0.0
    samplers:
    - type: uniform
      n_samples: 50000
      expand: 1
      expand_top: 3
    - type: band
      n_samples: 50000
      eps: 0.5
    gt_type: l1
    gt_soft: true
    gt_band: 1.0
    pd_transform: true
    vol_sup: true
  udf:
    weight: 150.0
    samplers:
    - type: uniform
      n_samples: 80000
      expand: 1
      expand_top: 5
    - type: band
      n_samples: 20000
      eps: 0.5
structure_schedule:
  start_step: 2500
  end_step: 10000
_abc_transforms: []
_abc_test_type: var-n
test_dataset: Points2SurfDataset
test_num_workers: 4
test_kwargs:
  base_path: data/points2surf
  dataset_name: abc
  type_name: var-n
  transforms: []
  split: test
  random_seed: fixed
train_dataset: Points2SurfDataset
train_val_num_workers: 4
train_kwargs:
  base_path: data/points2surf
  dataset_name: train
  type_name: var-n
  transforms: []
  split: train
  random_seed: 0
val_dataset: Points2SurfDataset
val_kwargs:
  base_path: data/points2surf
  dataset_name: train
  type_name: var-n
  transforms: []
  split: val
  random_seed: fixed
====================================== <<<<
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
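The LightningDeprecationWarning in the log above is informational and names its own replacement: `Trainer(accelerator='gpu', devices=1)` instead of `Trainer(gpus=1)`. As a minimal sketch of that rewrite, the helper below converts a legacy keyword dictionary into the newer form. The function name `migrate_trainer_kwargs` is hypothetical and not part of NKSR or PyTorch Lightning; it only handles integer and list values of `gpus`, and it does not import Lightning at all.

```python
from typing import Any, Dict


def migrate_trainer_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of Trainer keyword arguments without the legacy `gpus` key.

    Hypothetical helper for illustration only: `gpus=N` (or a list of device
    indices) becomes `accelerator='gpu', devices=N`, which is the form the
    deprecation warning recommends. `gpus=0` or an empty list maps to CPU.
    """
    new_kwargs = dict(kwargs)
    gpus = new_kwargs.pop("gpus", None)
    if gpus is None:
        return new_kwargs  # nothing to migrate
    if not isinstance(gpus, (int, list)):
        raise TypeError("this sketch only handles int or list values for gpus")
    if gpus == 0 or gpus == []:
        new_kwargs["accelerator"] = "cpu"
    else:
        new_kwargs["accelerator"] = "gpu"
        new_kwargs["devices"] = gpus
    return new_kwargs
```

With the value from the log, `migrate_trainer_kwargs({"gpus": 1})` yields `{"accelerator": "gpu", "devices": 1}`. Silencing this warning is cosmetic and is not expected to affect whether training starts.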

heiwang1997 commented 1 year ago

Does it start now? It will take some time to compile the torch extensions

moyu026 commented 1 year ago

No, it just prints some info and stops running

heiwang1997 commented 1 year ago

When you hit Ctrl+C, what does it print out?
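Pressing Ctrl+C works because the resulting traceback shows where the process was stuck. A non-destructive alternative is Python's standard `faulthandler` module, which can print the stack of every thread on a signal or after a timeout without stopping the run. The sketch below is an assumption about how one might wire it in; the function name `install_stack_dump` is hypothetical and nothing like it exists in NKSR's `train.py`.

```python
import faulthandler
import signal
import sys


def install_stack_dump(timeout_s: float = 0.0, stream=None) -> None:
    """Arrange for Python stack traces to be written to `stream`.

    - On platforms with SIGUSR1 (Linux, macOS), `kill -USR1 <pid>` dumps
      the stacks of all threads without terminating the process.
    - If `timeout_s` > 0, the stacks are also dumped every `timeout_s`
      seconds until faulthandler.cancel_dump_traceback_later() is called.
    """
    if stream is None:
        stream = sys.stderr
    # Also report hard crashes (segfaults) with a Python-level traceback.
    faulthandler.enable(file=stream)
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)
    if timeout_s > 0:
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)
```

Calling `install_stack_dump(timeout_s=60)` near the top of the training script would print the current stack once a minute, which shows whether the process is still inside an import, an extension build, or model construction.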

moyu026 commented 1 year ago

When I hit Ctrl+C, it says that open3d is not installed, but open3d is among my installed packages.

(NKSR) l@node1:/mnt/disk2/workspace/l/NKSR-public$ python train.py configs/points2surf/train.yaml
07-06 15:25:29 (train.py:67) [INFO] Intelligent GPU selection: 0
Tensorboard logger, version number = 1
Global seed set to 0
/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting Trainer(gpus=1) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=1) instead.
  rank_zero_deprecation(
Auto select gpus: [0]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
^C07-06 15:25:30 (o3d.py:10) [ERROR] Open3D not installed! You can try either the following 2 options:

  1. (recommended, using customized Open3D that enables view sync, animation, ...)

pip install python-pycg[full] -f https://pycg.s3.ap-northeast-1.amazonaws.com/packages/index.html

  2. (using official Open3D)

pip install python-pycg[all]

Traceback (most recent call last):
  File "/mnt/disk2/workspace/l/NKSR-public/train.py", line 258, in <module>
    net_module = importlib.import_module("models." + model_args.model).Model
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/mnt/disk2/workspace/l/NKSR-public/models/nksr_net.py", line 15, in <module>
    from nksr import NKSRNetwork, SparseFeatureHierarchy
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/__init__.py", line 18, in <module>
    from nksr.nn.unet import SparseStructureNet
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/nn/__init__.py", line 10, in <module>
    from .modules import Conv3d, GroupNorm, Activation, GroupNorm, MaxPooling, Upsampling, SparseZeroPadding
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/nn/modules.py", line 14, in <module>
    from nksr.svh import SparseFeatureHierarchy, KernelMap, VoxelStatus
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/svh.py", line 17, in <module>
    from pycg import vis
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/pycg/vis.py", line 9, in <module>
    from pycg import o3d
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/pycg/o3d.py", line 8, in <module>
    from open3d import *
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/open3d/__init__.py", line 76, in <module>
    from open3d.cuda.pybind import (camera, data, geometry, io,
ImportError: KeyboardInterrupt:
[Open3D INFO] Memory Statistics: (Device) (#Malloc) (#Free)
[Open3D INFO] ---------------------------------------------
[Open3D WARNING] CPU:0: 15 12 --> 3 with 384 total bytes
[Open3D WARNING] 0x564f75dd17d0 @ 128 bytes
[Open3D WARNING] 0x564f75dd06b0 @ 128 bytes
[Open3D WARNING] 0x564f75dc13a0 @ 128 bytes
[Open3D INFO] ---------------------------------------------

Package Version
absl-py 1.4.0
addict 2.4.0
aiohttp 3.8.4
aiosignal 1.3.1
ansi2html 1.8.0
antlr4-python3-runtime 4.9.3
appdirs 1.4.4
asttokens 2.2.1
async-timeout 4.0.2
attrs 23.1.0
backcall 0.2.0
ca-certificates 2021.4.13
cachetools 5.3.1
calmsize 0.1.3
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.3
comm 0.1.3
ConfigArgParse 1.5.3
contourpy 1.0.7
cycler 0.11.0
dash 2.11.1
dash-core-components 2.0.0
dash-html-components 2.0.0
dash-table 5.0.0
debugpy 1.6.7
decorator 5.1.1
docker-pycreds 0.4.0
executing 1.2.0
fastjsonschema 2.17.1
filelock 3.12.0
fire 0.5.0
Flask 2.2.5
flatten-dict 0.4.2
fonttools 4.40.0
frozenlist 1.3.3
fsspec 2023.5.0
gitdb 4.0.10
GitPython 3.1.31
google-auth 2.21.0
google-auth-oauthlib 1.0.0
grpcio 1.56.0
idna 3.4
ipykernel 6.23.2
ipython 8.13.2
ipywidgets 8.0.6
itsdangerous 2.1.2
jedi 0.18.2
Jinja2 3.1.2
joblib 1.2.0
jsonschema 4.17.3
jupyter_client 8.2.0
jupyter_core 5.3.0
jupyterlab-widgets 3.0.7
kiwisolver 1.4.4
lightning-lite 1.8.0
lightning-utilities 0.3.0
lit 16.0.6
Markdown 3.4.3
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mpmath 1.3.0
multidict 6.0.4
nbformat 5.5.0
nest-asyncio 1.5.6
networkx 3.1
ninja 1.11.1
nksr 1.0.3+pt20cu117
numpy 1.25.0
oauthlib 3.2.2
omegaconf 2.3.0
open3d 0.16.1+c65c7ef
packaging 23.1
pandas 2.0.3
parso 0.8.3
pathtools 0.1.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.1.2
platformdirs 3.5.3
plotly 5.14.1
plyfile 0.9
prompt-toolkit 3.0.38
protobuf 4.23.3
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pybind11 2.10.4
Pygments 2.15.1
pykdtree 1.3.7.post0
pyntcloud 0.3.1
pynvml 11.5.0
pyparsing 3.1.0
pyquaternion 0.9.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-pycg 0.5.2
pytorch-lightning 1.8.0
pytz 2023.3
PyYAML 6.0
pyzmq 25.1.0
randomname 0.2.1
requests 2.31.0
requests-oauthlib 1.3.1
retrying 1.3.4
rsa 4.9
scikit-learn 1.2.2
scipy 1.11.1
sentry-sdk 1.26.0
setproctitle 1.3.2
setuptools 68.0.0
six 1.16.0
smmap 5.0.0
stack-data 0.6.2
sympy 1.12
tenacity 8.2.2
tensorboard 2.13.0
tensorboard-data-server 0.7.0
termcolor 2.3.0
threadpoolctl 3.1.0
torch 2.0.0+cu117
torch-scatter 2.1.1
torchmetrics 0.11.4
torchvision 0.15.0+cu117
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
triton 2.0.0
typing_extensions 4.7.0
tzdata 2023.3
urllib3 1.26.16
usd-core 23.5
wandb 0.15.4
wcwidth 0.2.6
Werkzeug 2.2.3
wheel 0.40.0
widgetsnbextension 4.0.7
yarl 1.9.2
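The package list does include open3d (0.16.1+c65c7ef), and the traceback shows the interrupt arriving while `from open3d.cuda.pybind import ...` was still running, so the "Open3D not installed" message may simply reflect an import that was cut short rather than a missing package. One way to tell a slow or hanging import apart from a failing one is to run it in a fresh interpreter with a timeout. The sketch below uses only the standard library; `time_import` is a hypothetical helper name, not part of NKSR or pycg.

```python
import subprocess
import sys
import time
from typing import Tuple


def time_import(module: str, timeout_s: float = 120.0) -> Tuple[str, float]:
    """Import `module` in a separate Python process.

    Returns (status, seconds), where status is:
      "ok"      - the import finished with exit code 0
      "failed"  - the import raised (for example ModuleNotFoundError)
      "timeout" - the import was still running after `timeout_s` seconds
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout", time.perf_counter() - start
    elapsed = time.perf_counter() - start
    return ("ok" if proc.returncode == 0 else "failed"), elapsed
```

Running `time_import("open3d")` and `time_import("nksr")` inside the same conda environment would show whether either import completes on its own and roughly how long it takes.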

moyu026 commented 1 year ago

I changed the open3d version and that problem is solved, but it still doesn't start training.

(NKSR) l@node1:/mnt/disk2/workspace/l/NKSR-public$ python train.py configs/points2surf/train.yaml
07-07 15:12:21 (train.py:67) [INFO] Intelligent GPU selection: 0
Tensorboard logger, version number = 1
Global seed set to 0
/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting Trainer(gpus=1) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=1) instead.
  rank_zero_deprecation(
Auto select gpus: [0]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
^CTraceback (most recent call last):
  File "/mnt/disk2/workspace/l/NKSR-public/train.py", line 259, in <module>
    net_model = net_module(model_args)
  File "/mnt/disk2/workspace/l/NKSR-public/models/nksr_net.py", line 35, in __init__
    self.network = NKSRNetwork(self.hparams)
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/__init__.py", line 65, in __init__
    self.unet = SparseStructureNet(
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/nn/unet.py", line 162, in __init__
    self.encoders.add_module(f'Enc{layer_idx}', SparseDoubleConv(
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/nn/unet.py", line 82, in __init__
    SparseConvBlock(conv2_in_channels, conv2_out_channels, order, num_groups))
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/nn/unet.py", line 32, in __init__
    'Conv', Conv3d(in_channels, out_channels, kernel_size, 1, bias='g' not in order, transposed=False))
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/nn/modules.py", line 46, in __init__
    self.reset_parameters()
  File "/mnt/disk2/.conda/l/envs/NKSR/lib/python3.10/site-packages/nksr/nn/modules.py", line 62, in reset_parameters
    self.kernel.data.uniform_(-std, std)
KeyboardInterrupt