tianweiy / CenterPoint

MIT License

INSTALL ERROR #237

Closed TianhaoFu closed 2 years ago

TianhaoFu commented 2 years ago

Hi, thanks for your code. While building the environment following the [Cuda Extensions] section of INSTALL.md, I came across this error:

bash setup.sh 
running build_ext
building 'deform_conv_cuda' extension
Emitting ninja build file /root/centerpoint/det3d/ops/dcn/build/temp.linux-x86_64-3.6/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /root/centerpoint/det3d/ops/dcn/build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/opt/conda/lib/python3.6/site-packages/torch/include -I/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.6m -c -c /root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp -o /root/centerpoint/det3d/ops/dcn/build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=deform_conv_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /root/centerpoint/det3d/ops/dcn/build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o 
c++ -MMD -MF /root/centerpoint/det3d/ops/dcn/build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/opt/conda/lib/python3.6/site-packages/torch/include -I/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.6m -c -c /root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp -o /root/centerpoint/det3d/ops/dcn/build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=deform_conv_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /opt/conda/lib/python3.6/site-packages/torch/include/ATen/Parallel.h:149:0,
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/extension.h:4,
                 from /root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp:4:
/opt/conda/lib/python3.6/site-packages/torch/include/ATen/ParallelOpenMP.h:84:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for if ((end - begin) >= grain_size)
 ^
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp: In function ‘void shape_check(at::Tensor, at::Tensor, at::Tensor*, at::Tensor, int, int, int, int, int, int, int, int, int, int)’:
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp:69:31: error: ‘AT_CHECK’ was not declared in this scope
            weight.ndimension());
                               ^
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp: In function ‘int deform_conv_forward_cuda(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, int)’:
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp:194:73: error: ‘AT_CHECK’ was not declared in this scope
   AT_CHECK((offset.size(0) == batchSize), "invalid batch size of offset");
                                                                         ^
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp: In function ‘int deform_conv_backward_input_cuda(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, int)’:
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp:301:76: error: ‘AT_CHECK’ was not declared in this scope
   AT_CHECK((offset.size(0) == batchSize), 3, "invalid batch size of offset");
                                                                            ^
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp: In function ‘int deform_conv_backward_parameters_cuda(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, float, int)’:
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp:417:73: error: ‘AT_CHECK’ was not declared in this scope
   AT_CHECK((offset.size(0) == batchSize), "invalid batch size of offset");
                                                                         ^
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp: In function ‘void modulated_deform_conv_cuda_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, bool)’:
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp:497:70: error: ‘AT_CHECK’ was not declared in this scope
   AT_CHECK(input.is_contiguous(), "input tensor has to be contiguous");
                                                                      ^
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp: In function ‘void modulated_deform_conv_cuda_backward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, int, int, int, int, int, int, int, bool)’:
/root/centerpoint/det3d/ops/dcn/src/deform_conv_cuda.cpp:579:70: error: ‘AT_CHECK’ was not declared in this scope
   AT_CHECK(input.is_contiguous(), "input tensor has to be contiguous");
                                                                      ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1522, in _run_ninja_build
    env=env)
  File "/opt/conda/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 19, in <module>
    cmdclass={'build_ext': BuildExtension})
  File "/opt/conda/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 653, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 482, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1238, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1538, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
running build_ext
building 'iou3d_nms_cuda' extension
Emitting ninja build file /root/centerpoint/det3d/ops/iou3d_nms/build/temp.linux-x86_64-3.6/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
g++ -pthread -shared -B /opt/conda/compiler_compat -L/opt/conda/lib -Wl,-rpath=/opt/conda/lib -Wl,--no-as-needed -Wl,--sysroot=/ /root/centerpoint/det3d/ops/iou3d_nms/build/temp.linux-x86_64-3.6/src/iou3d_cpu.o /root/centerpoint/det3d/ops/iou3d_nms/build/temp.linux-x86_64-3.6/src/iou3d_nms_api.o /root/centerpoint/det3d/ops/iou3d_nms/build/temp.linux-x86_64-3.6/src/iou3d_nms.o /root/centerpoint/det3d/ops/iou3d_nms/build/temp.linux-x86_64-3.6/src/iou3d_nms_kernel.o -L/opt/conda/lib/python3.6/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.6/iou3d_nms_cuda.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/iou3d_nms_cuda.cpython-36m-x86_64-linux-gnu.so -> 

I don't know how to deal with it. Could you give me some ideas? Thanks!

tianweiy commented 2 years ago

Hi, compiling dcn is optional (we don't use dcn in the most recent version), and it seems to work only with old torch versions (e.g. 1.1).

I updated the install guide; you only need to compile the nms extension.

https://github.com/tianweiy/CenterPoint/blob/master/docs/INSTALL.md#cuda-extensions
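
For context on the compile error above: `AT_CHECK` is a PyTorch C++ macro that newer releases removed in favor of `TORCH_CHECK`, which is why the old dcn sources no longer compile. A common workaround for extensions of this era is to rename the macro in the sources. The sketch below does that with a small script. The helper names `rename_at_check` and `patch_sources` are hypothetical, the approach has not been verified against this repository, and other outdated APIs in the dcn code may still break the build, so skipping dcn remains the simpler route.

```python
import re
from pathlib import Path

# Match AT_CHECK only as a whole identifier, so a name such as
# MY_AT_CHECK_X is left untouched.
_AT_CHECK = re.compile(r"\bAT_CHECK\b")

# File types that may contain the macro (assumed; adjust as needed).
_SOURCE_SUFFIXES = {".cpp", ".cu", ".h", ".cuh"}


def rename_at_check(source: str) -> str:
    """Return the source text with every AT_CHECK renamed to TORCH_CHECK."""
    return _AT_CHECK.sub("TORCH_CHECK", source)


def patch_sources(src_dir: str) -> list:
    """Rewrite matching files under src_dir in place; return the changed paths."""
    changed = []
    for path in sorted(Path(src_dir).rglob("*")):
        if path.suffix not in _SOURCE_SUFFIXES:
            continue
        original = path.read_text()
        patched = rename_at_check(original)
        if patched != original:
            path.write_text(patched)
            changed.append(str(path))
    return changed
```

Calling `patch_sources("det3d/ops/dcn/src")` from the repository root would rewrite the files named in the build log; back up the directory or commit first, since the files are modified in place.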

TianhaoFu commented 2 years ago

ok, thanks!

testpku commented 2 years ago

Hi, thanks for sharing your work. If I skip installing DeformConv, the following error occurs:

NameError: name 'DeformConv' is not defined

What should I do to skip installing DeformConv?

tianweiy commented 2 years ago

At which line of code does this happen, and which config are you using?

testpku commented 2 years ago

Thanks for your prompt reply. When I run the commands:

cd det3d/ops/dcn
python setup.py build_ext --inplace

It returns:

ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 4, in <module>
    setup(
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_commands
    self.run_command(cmd)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 992, in run_command
    cmd_obj.run()
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
    build_ext.build_extensions(self)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
    self._build_extensions_serial()
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
    self.build_extension(ext)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
    objects = self.compiler.compile(
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/root/miniconda3/envs/centerpoint/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

When I run the following command according to https://github.com/tianweiy/CenterPoint/blob/master/docs/GETTING_START.md:

python tools/dist_test.py configs/nusc/voxelnet/nusc_centerpoint_voxelnet_0075voxel_dcn_flip.py --work_dir work_dirs/nusc_centerpoint_voxelnet_dcn_0075voxel_flip_testset --checkpoint work_dirs/nusc_0075_dcn_flip_track/voxelnet_converted.pth --testset --speed_test

It returns:

Use Deformable Convolution in the CenterHead!
Traceback (most recent call last):
  File "tools/dist_test.py", line 211, in <module>
    main()
  File "tools/dist_test.py", line 106, in main
    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/builder.py", line 50, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/builder.py", line 21, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/root/autodl-tmp/project/CenterPoint/det3d/utils/registry.py", line 78, in build_from_cfg
    return obj_cls(**args)
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/detectors/voxelnet.py", line 19, in __init__
    super(VoxelNet, self).__init__(
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/detectors/single_stage.py", line 27, in __init__
    self.bbox_head = builder.build_head(bbox_head)
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/builder.py", line 42, in build_head
    return build(cfg, HEADS)
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/builder.py", line 21, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/root/autodl-tmp/project/CenterPoint/det3d/utils/registry.py", line 78, in build_from_cfg
    return obj_cls(**args)
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/bbox_heads/center_head.py", line 231, in __init__
    DCNSepHead(share_conv_channel, num_cls, heads, bn=True, init_bias=init_bias, final_kernel=3)
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/bbox_heads/center_head.py", line 128, in __init__
    self.feature_adapt_cls = FeatureAdaption(
  File "/root/autodl-tmp/project/CenterPoint/det3d/models/bbox_heads/center_head.py", line 48, in __init__
    self.conv_adaption = DeformConv(
NameError: name 'DeformConv' is not defined

tianweiy commented 2 years ago

hi, please see my reply here https://github.com/tianweiy/CenterPoint/issues/362#issuecomment-1284026185

Basically, dcn doesn't work with the latest torch versions, so I suggest switching to the non-dcn configs (which actually get better performance). If you just need the detection file, you can refer to https://github.com/tianweiy/CenterPoint/issues/249

otherwise, just change the config name and checkpoint path to https://github.com/tianweiy/CenterPoint/tree/master/configs/nusc#voxelnet
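
The `NameError` in the traceback above is what a guarded import produces when the extension is missing: if the import sits in a `try`/`except` that only prints a warning, the name is never bound and the failure surfaces later, at the first use. The following is a minimal sketch of that mechanism with a more explicit failure. It assumes `center_head.py` guards its import this way, which is not confirmed in this thread, and `hypothetical_dcn_ext`, `build_feature_adaption`, and the constructor arguments are made-up names for illustration.

```python
# Guarded import: when the compiled extension is missing, bind the name to
# None instead of leaving it undefined, so later code can test for it.
try:
    from hypothetical_dcn_ext import DeformConv  # hypothetical module name
except ImportError:
    DeformConv = None


def build_feature_adaption(in_channels: int, out_channels: int):
    """Build the deformable layer, or fail early with an actionable message."""
    if DeformConv is None:
        raise RuntimeError(
            "DeformConv is unavailable because the dcn extension was not built; "
            "use a non-dcn config instead."
        )
    # Constructor arguments are illustrative, not the repository's signature.
    return DeformConv(in_channels, out_channels, kernel_size=3, padding=1)
```

With this pattern a dcn config still fails when the extension is absent, but the error names the cause instead of reporting an undefined name deep inside the model builder.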

Thanks

testpku commented 2 years ago

Good suggestions! Now I can run the eval code. Thx.