Closed NoOneUST closed 3 years ago
Thank you! I have another two questions:

- When I try to run the command `python main.py --cfg ./config/test.yaml`, I run into the following problem: the code gets stuck after reaching the last epoch, and no file is generated in `results/`:
2021-07-07 19:00:33.366 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:33.367 | INFO | __main__:test:258 - Epoch 47, Iter 1077/1086, test loss = 10.393, time = 0.140791
2021-07-07 19:00:34.650 | INFO | __main__:test:249 - scene0806_00_4
2021-07-07 19:00:34.651 | INFO | __main__:test:249 - scene0806_00_5
2021-07-07 19:00:34.826 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:34.826 | INFO | __main__:test:258 - Epoch 47, Iter 1078/1086, test loss = 11.334, time = 0.175247
2021-07-07 19:00:34.962 | INFO | __main__:test:249 - scene0806_00_6
2021-07-07 19:00:34.963 | INFO | __main__:test:249 - scene0806_00_7
2021-07-07 19:00:35.118 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:35.118 | INFO | __main__:test:258 - Epoch 47, Iter 1079/1086, test loss = 11.329, time = 0.155381
2021-07-07 19:00:35.133 | INFO | __main__:test:249 - scene0806_00_8
2021-07-07 19:00:35.133 | INFO | __main__:test:249 - scene0806_00_9
2021-07-07 19:00:35.274 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:35.275 | INFO | __main__:test:258 - Epoch 47, Iter 1080/1086, test loss = 11.333, time = 0.141273
2021-07-07 19:00:35.293 | INFO | __main__:test:249 - scene0806_00_10
2021-07-07 19:00:35.294 | INFO | __main__:test:249 - scene0806_00_11
2021-07-07 19:00:35.431 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:35.432 | INFO | __main__:test:258 - Epoch 47, Iter 1081/1086, test loss = 11.333, time = 0.137826
2021-07-07 19:00:36.838 | INFO | __main__:test:249 - scene0806_00_12
2021-07-07 19:00:36.838 | INFO | __main__:test:249 - scene0806_00_13
2021-07-07 19:00:36.998 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:36.998 | INFO | __main__:test:258 - Epoch 47, Iter 1082/1086, test loss = 10.918, time = 0.159419
2021-07-07 19:00:37.016 | INFO | __main__:test:249 - scene0806_00_14
2021-07-07 19:00:37.017 | INFO | __main__:test:249 - scene0806_00_15
2021-07-07 19:00:37.199 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:37.200 | INFO | __main__:test:258 - Epoch 47, Iter 1083/1086, test loss = 11.236, time = 0.182459
2021-07-07 19:00:37.215 | INFO | __main__:test:249 - scene0806_00_16
2021-07-07 19:00:37.216 | INFO | __main__:test:249 - scene0806_00_17
2021-07-07 19:00:37.363 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:37.363 | INFO | __main__:test:258 - Epoch 47, Iter 1084/1086, test loss = 10.924, time = 0.147570
2021-07-07 19:00:37.383 | INFO | __main__:test:249 - scene0806_00_18
2021-07-07 19:00:37.491 | WARNING | models.neucon_network:forward:184 - no valid points: scale 0
2021-07-07 19:00:37.491 | INFO | __main__:test:258 - Epoch 47, Iter 1085/1086, test loss = 10.809, time = 0.108360
2021-07-07 19:00:37.981 | INFO | __main__:test:270 - epoch 47 avg_test_scalars:
- When I run the command `python -m torch.distributed.launch --nproc_per_node=2 main.py --cfg ./config/train.yaml`, I run into the following problem. I find that in `config/train.yaml` the default is `BATCH_SIZE: 1`. Is this directly related to the problem? Can I directly follow the `README` to reproduce your result?

Traceback (most recent call last):
File "main.py", line 301, in <module>
train()
File "main.py", line 205, in train
loss, scalar_outputs = train_sample(sample)
File "main.py", line 281, in train_sample
outputs, loss_dict = model(sample)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/export/data/lwangcg/NeuralRecon/models/neuralrecon.py", line 82, in forward
outputs, loss_dict = self.neucon_net(features, inputs, outputs)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/export/data/lwangcg/NeuralRecon/models/neucon_network.py", line 157, in forward
feat = self.sp_convs[i](point_feat)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/export/data/lwangcg/NeuralRecon/models/modules.py", line 150, in forward
x0 = self.stem(x0)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torchsparse/nn/modules/norm.py", line 13, in forward
return fapply(input, super().forward)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torchsparse/nn/utils/apply.py", line 12, in fapply
feats = fn(input.feats, *args, **kwargs)
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 178, in forward
self.eps,
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2279, in batch_norm
_verify_batch_size(input.size())
File "/export/data/lwangcg/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2247, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 32])
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 17327) of binary: /export/data/lwangcg/anaconda3/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
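For context on the ValueError: the traceback shows a normalization layer receiving a `[1, 32]` feature tensor, i.e. a single value per channel, which BatchNorm cannot normalize in training mode. A minimal sketch (plain PyTorch, not NeuralRecon code) that reproduces the same error:

```python
import torch
import torch.nn as nn

# In training mode, BatchNorm needs more than one value per channel to
# estimate batch statistics; a (1, 32) input has exactly one per channel.
bn = nn.BatchNorm1d(32)
bn.train()
x = torch.randn(1, 32)
bn(x)  # ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 32])
```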
Q1: Hello, what is the specific structure of the data? Do you only need the color, depth, intrinsics, and pose parsed from the `.sens` file?
Q2: Can I just put one scene under the `scans` and `scans_test` directories?
Q3: Why does running the `python main.py --cfg ./config/test.yaml` command only produce the following output, after which the program gets stuck and does not run any further?
number of gpus: 1
creating log file ./checkpoints/20210714_165708_test.log
Now I find that when `MODEL.FUSION.FUSION_ON=False, MODEL.FUSION.FULL=False`, the model does not work, while when `MODEL.FUSION.FUSION_ON=True, MODEL.FUSION.FULL=True`, it works.
2021-07-18 14:59:59.567 | INFO | main:train:221 - Epoch 14/50, Iter 2101/2487, train loss = 2.133, time = 0.593
2021-07-18 15:00:00.104 | INFO | main:train:221 - Epoch 14/50, Iter 2102/2487, train loss = 1.915, time = 0.536
2021-07-18 15:00:00.669 | INFO | main:train:221 - Epoch 14/50, Iter 2103/2487, train loss = 1.594, time = 0.561
2021-07-18 15:00:01.218 | INFO | main:train:221 - Epoch 14/50, Iter 2104/2487, train loss = 2.207, time = 0.548
2021-07-18 15:00:01.787 | INFO | main:train:221 - Epoch 14/50, Iter 2105/2487, train loss = 1.536, time = 0.568
2021-07-18 15:00:02.359 | INFO | main:train:221 - Epoch 14/50, Iter 2106/2487, train loss = 1.890, time = 0.571
2021-07-18 15:00:02.914 | INFO | main:train:221 - Epoch 14/50, Iter 2107/2487, train loss = 1.562, time = 0.553
2021-07-18 15:00:03.468 | INFO | main:train:221 - Epoch 14/50, Iter 2108/2487, train loss = 1.731, time = 0.552
2021-07-18 15:00:04.101 | INFO | main:train:221 - Epoch 14/50, Iter 2109/2487, train loss = 2.514, time = 0.632
2021-07-18 15:00:04.689 | INFO | main:train:221 - Epoch 14/50, Iter 2110/2487, train loss = 1.482, time = 0.587
2021-07-18 15:00:05.277 | INFO | main:train:221 - Epoch 14/50, Iter 2111/2487, train loss = 1.176, time = 0.587
2021-07-18 15:00:05.463 | WARNING | models.neucon_network:compute_loss:242 - target: no valid voxel when computing loss
Traceback (most recent call last):
File "main.py", line 311, in
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_0y2r3t4n/none_5rdjr6_9/attempt_1/7/error.json
@NoOneUST Hello, did you convert the format of the original ScanNet data into the data structure that the author mentioned?

Yes. I wrote a Python program to run the converter provided by the author, scene by scene.
>>>ls
all_tsdf_9 batch_scannet.py download-scannet.py ScanNet scannetv2-labels.combined.tsv scannetv2_test.txt scannetv2_train.txt scannetv2_val.txt scans scans_test tasks
>>>ls scans
scene0000_00 scene0051_01 scene0100_00 ...
>>>ls scans_test
scene0707_00 scene0715_00 scene0723_00 ...
>>>ls all_tsdf_9
fragments_test.pkl fragments_train.pkl fragments_val.pkl splits scene0000_00 scene0000_01 ...
>>>ls all_tsdf_9/splits
scannetv2_test.txt scannetv2_train.txt scannetv2_val.txt
Converter script:
import os
import argparse
from multiprocessing import Pool


def get_opts():
    parser = argparse.ArgumentParser()
    # Worker id: this process only handles chunks where (chunk index % 12) == id.
    parser.add_argument('--id', type=int, default=0)
    return parser.parse_args()


def run(pair):
    # Convert one scene by invoking the author's SensReader on its .sens file.
    dir, path = pair
    command = 'python scannet/ScanNet/SensReader/python/reader.py --filename ' + str(path) + '/' + dir + '/' + str(dir) + '.sens --output_path ' + str(
        path) + '/' + str(dir) + ' --export_depth_images --export_color_images --export_poses --export_intrinsics'
    # command = 'scannet/ScanNet/SensReader/c++/sens ' + str(path) + '/' + dir + '/' + str(dir) + '.sens ' + str(path) + '/' + str(dir) + '/output'
    print(command)
    result = os.system(command)
    return result


hparams = get_opts()
paths = ['scannet/scans', 'scannet/scans_test']
num_select = 32  # number of scenes converted in parallel per chunk

for path in paths:
    dirs = list(os.listdir(path))
    index = 0
    while index < len(dirs):
        # Skip chunks assigned to other worker ids.
        if int(index / num_select) % 12 != hparams.id:
            index += num_select
            continue
        else:
            p = Pool(processes=num_select)
            results = p.map(run, zip(dirs[index: index + num_select], [path] * num_select))
            p.close()
            p.join()
            for result in results:
                if result != 0:
                    print('Error in [' + str(index) + ' : ' + str(index + num_select) + ']')
                    raise ValueError
            index += num_select
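For reference, the script shards the scene list into chunks of 32 and assigns chunks to 12 workers via `--id` (the `% 12` check), so it is meant to be launched 12 times with ids 0-11, e.g. on different machines or terminals. A hypothetical single-machine launcher, assuming the script above is saved as `batch_scannet.py` as in the listing:

```python
import subprocess

# Start one converter shard per worker id 0..11, matching the `% 12` sharding above.
procs = [subprocess.Popen(['python', 'batch_scannet.py', '--id', str(i)]) for i in range(12)]
for p in procs:
    p.wait()
```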
@NoOneUST Thank you for your reply! I have another question: I keep failing to install torchsparse (when running `conda env create -f environment.yaml`). Do you also use these versions (in environment.yaml)?
The torchsparse install error is:
/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:262:1: note: declared here
DeprecatedTypeProperties & type() const {
^ ~~
/home/yjl/code/NeuralRecon/demo_video/torchsparse-master/torchsparse/backend/devoxelize/devoxelize_cuda.cu:70:97: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
AT_DISPATCH_FLOATING_TYPES_AND_HALF(
^
/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:46:1: note: declared here
inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties &t) {
^~~
/home/yjl/code/NeuralRecon/demo_video/torchsparse-master/torchsparse/backend/devoxelize/devoxelize_cuda.cu: In lambda function:
/home/yjl/code/NeuralRecon/demo_video/torchsparse-master/torchsparse/backend/devoxelize/devoxelize_cuda.cu:90:46: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
AT_DISPATCH_FLOATING_TYPES_AND_HALF(
^
/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:262:1: note: declared here
DeprecatedTypeProperties & type() const {
^ ~~
/home/yjl/code/NeuralRecon/demo_video/torchsparse-master/torchsparse/backend/devoxelize/devoxelize_cuda.cu:90:101: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated: passing at::DeprecatedTypeProperties to an AT_DISPATCH macro is deprecated, pass an at::ScalarType instead [-Wdeprecated-declarations]
AT_DISPATCH_FLOATING_TYPES_AND_HALF(
^
/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:46:1: note: declared here
inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties &t) {
^~~
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134: required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95: required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1400, in _run_ninja_build
check=True)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "setup.py", line 40, in <module>
zip_safe=False,
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 580, in build_extensions
build_ext.build_extensions(self)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 423, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1140, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/home/jmserver03/.conda/envs/neucon/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1413, in _run_ninja_build
raise RuntimeError(message)
RuntimeError: Error compiling objects for extension
Hello, how do I generate the `.pkl` files?
I see. I didn't read the README.md carefully.
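For what it's worth, the fragment `.pkl` files under `all_tsdf_9` come out of the ground-truth TSDF generation step described in the README. As a sanity check afterwards, a small sketch for peeking at one of them, assuming they are plain pickle files (path taken from the directory listing above):

```python
import pickle

# Hypothetical sanity check: load the fragment metadata produced by the TSDF generation step.
with open('all_tsdf_9/fragments_train.pkl', 'rb') as f:
    fragments = pickle.load(f)

print(type(fragments), len(fragments))
```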
Hi @NoOneUST, did you figure out how to correctly train in phase 1 with `MODEL.FUSION.FUSION_ON=False, MODEL.FUSION.FULL=False`? I ran into the same error as you did:
| WARNING | models.neucon_network:compute_loss:242 - target: no valid voxel when computing loss
...
/python3.7/site-packages/torch/nn/functional.py", line 2114, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 32])
@HaFred, I met the same problem. Have you solved it?
After reviewing the traceback above, I found that doing what he proposed works: `MODEL.FUSION.FUSION_ON=True, MODEL.FUSION.FULL=True`.
Thank you! I have another two questions:
- When I try to run the command `python main.py --cfg ./config/test.yaml`, I run into the following problem: the code gets stuck after reaching the last epoch, and no file is generated in `results/`.
- When I run the command `python -m torch.distributed.launch --nproc_per_node=2 main.py --cfg ./config/train.yaml`, I run into the ValueError shown above ("Expected more than 1 value per channel when training"). I find that in `config/train.yaml` the default is `BATCH_SIZE: 1`. Is this directly related to the problem? Can I directly follow the `README` to reproduce your result?
@NoOneUST @JiamingSuen Hello, I also met problem 1. Could you tell me how to fix it? Many thanks!
- You can use the RGB-D extraction tool here to extract the `.sens` file.
- NeuralRecon does not produce a colored mesh. You may use a texture mapping tool like this to apply image texture to the reconstructed mesh. BTW, the shading of the attached screenshot is not correct; you can change the back-face mode in MeshLab to obtain correct shading.
Hi @JiamingSuen, could you please elaborate on how to generate a colored mesh from NeuralRecon's output? What are the necessary steps? Thank you so much, and I look forward to hearing from you!
Maybe just like how you presented it on your project page: how did you apply the color to the reconstructed mesh?
One possible way of doing it (possibly the most accurate way) is to use nearest neighbors to transfer the `trimesh` `mesh.visual.vertex_colors` of the ScanNet GT mesh onto the corresponding vertices of the inferred mesh.
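A minimal sketch of that idea, assuming the ScanNet GT mesh and the NeuralRecon output are already in the same world coordinate frame; the file names below are placeholders:

```python
import numpy as np
import trimesh
from scipy.spatial import cKDTree

# Placeholder paths: ScanNet GT mesh (has per-vertex colors) and the reconstructed mesh.
gt = trimesh.load('scene0806_00_vh_clean_2.ply')
pred = trimesh.load('scene0806_00_pred.ply')

# For every predicted vertex, find the nearest GT vertex and copy its color.
tree = cKDTree(gt.vertices)
_, idx = tree.query(pred.vertices, k=1)
pred.visual.vertex_colors = np.asarray(gt.visual.vertex_colors)[idx]

pred.export('scene0806_00_pred_colored.ply')
```

Of course this only works when a GT mesh exists for the scene; for new scenes one would have to project image colors onto the mesh instead (e.g. with the texture mapping tool mentioned above).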
Q1: As described in ScanNet, the data structure of ScanNet is:
However, you mention that the data structure is:
So, how should I modify the original ScanNet data to satisfy the requirements?
Q2: Following the guidance, I reconstructed the mesh from the demo video. Why does the mesh have no color?