hjwdzh / DeepLM

DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)
GNU General Public License v3.0

Run example.sh error #6

Open neilwang0913 opened 2 years ago

neilwang0913 commented 2 years ago

Hi, after running example.sh I got the following error:

import BACore
ModuleNotFoundError: No module named 'BACore'

Any idea how to fix it?

Many thanks

Penterakt commented 2 years ago

Same problem here.

sh example.sh
-- The CXX compiler identification is GNU 11.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/cuda/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- The CUDA compiler identification is NVIDIA 11.7.99
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
/usr/lib/python3.10/site-packages/torch
-- pybind11 v2.6.3 dev1
CMake Warning (dev) at /var/lib/snapd/snap/cmake/1156/share/cmake-3.24/Modules/CMakeDependentOption.cmake:89 (message):
  Policy CMP0127 is not set: cmake_dependent_option() supports full Condition Syntax. Run "cmake --help-policy CMP0127" for policy details. Use the cmake_policy command to set the policy and suppress this warning.
Call Stack (most recent call first):
  3rd/pybind11/CMakeLists.txt:98 (cmake_dependent_option)
This warning is for project developers. Use -Wno-dev to suppress it.

-- Found PythonInterp: /usr/bin/python3.10 (found version "3.10.6")
-- Found PythonLibs: /usr/lib/libpython3.10.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Configuring done
CMake Warning (dev) in CMakeLists.txt:
  Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC, empty CUDA_ARCHITECTURES not allowed. Run "cmake --help-policy CMP0104" for policy details. Use the cmake_policy command to set the policy and suppress this warning.

  CUDA_ARCHITECTURES is empty for target "LMCoreKernel".
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning (dev) in CMakeLists.txt:
  Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC, empty CUDA_ARCHITECTURES not allowed. Run "cmake --help-policy CMP0104" for policy details. Use the cmake_policy command to set the policy and suppress this warning.

  CUDA_ARCHITECTURES is empty for target "LMCoreKernel".
This warning is for project developers. Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /home/yunnan/repos/DeepLM/build
[ 10%] Building CUDA object CMakeFiles/LMCoreKernel.dir/TorchLM/cpp/kernel_impl.cu.o
[ 20%] Building CXX object CMakeFiles/BACore.dir/BAProblem/cpp/baproblem_manager.cc.o
[ 30%] Building CXX object CMakeFiles/BACore.dir/BAProblem/cpp/interface.cc.o
[ 40%] Building CXX object CMakeFiles/BACore.dir/BAProblem/cpp/io.cc.o
[ 50%] Building CXX object CMakeFiles/BACore.dir/BAProblem/cpp/torch_util.cc.o
[ 60%] Linking CUDA shared library libLMCoreKernel.so
[ 60%] Built target LMCoreKernel
[ 80%] Building CXX object CMakeFiles/LMCore.dir/TorchLM/cpp/kernel.cc.o
[ 80%] Building CXX object CMakeFiles/LMCore.dir/TorchLM/cpp/interface.cc.o
/home/yunnan/repos/DeepLM/BAProblem/cpp/baproblem_manager.cc: In function ‘std::vector<std::vector > PrepareSeparator(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, at::Tensor&, long int)’:
/home/yunnan/repos/DeepLM/BAProblem/cpp/baproblem_manager.cc:118:28: warning: comparison of integer expressions of different signedness: ‘long int’ and ‘std::vector<std::vector >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  118 | for (long i = 0; i < originPtIndices.size(); ++i) {
      | ^~~~~~
/home/yunnan/repos/DeepLM/BAProblem/cpp/baproblem_manager.cc:64:21: warning: unused variable ‘dPtIdx’ [-Wunused-variable]
  64 | const long dPtIdx = static_cast<const long>(pointIdx.storage().data());
      | ^~
/home/yunnan/repos/DeepLM/BAProblem/cpp/baproblem_manager.cc:117:14: warning: variable ‘intOptions’ set but not used [-Wunused-but-set-variable]
  117 | auto intOptions = torch::TensorOptions().dtype(torch::kInt64);
      | ^~~~~~
/home/yunnan/repos/DeepLM/BAProblem/cpp/io.cc: In function ‘std::vector LoadBALFromFile(const char, int, int, int)’:
/home/yunnan/repos/DeepLM/BAProblem/cpp/io.cc:44:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<double, std::allocator >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  44 | for (int i = 0; i < cameraParameters.size(); ++i)
      | ^~~~~~~
/home/yunnan/repos/DeepLM/BAProblem/cpp/io.cc:49:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<double, std::allocator >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  49 | for (int i = 0; i < points3d.size(); ++i) {
      | ^~~~~~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc: In function ‘void JacobiColumnSquare(const std::vector&, const std::vector&, std::vector&, int)’:
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:261:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  261 | for (int i = 0; i < jacobians.size(); ++i) {
      | ^~~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:258:13: warning: unused variable ‘residualDim’ [-Wunused-variable]
  258 | int residualDim = indices.size();
      | ^~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc: In function ‘void ColumnInverseSquare(std::vector&)’:
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:301:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  301 | for (int i = 0; i < jacobianScale.size(); ++i) {
      | ^~~~~~~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc: In function ‘void JacobiNormalize(const std::vector&, const std::vector&, std::vector&)’:
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:327:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  327 | for (int i = 0; i < jacobianScale.size(); ++i) {
      | ^~~~~~~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:335:21: warning: unused variable ‘num’ [-Wunused-variable]
  335 | int num = numDimV numDimP;
      | ^~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:325:13: warning: unused variable ‘residualDim’ [-Wunused-variable]
  325 | int residualDim = indices.size();
      | ^~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc: In function ‘void JacobiLeftMultiplyCuda(const std::vector&, const at::Tensor&, const std::vector&, const at::Tensor&, std::vector&, int)’:
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:408:21: warning: unused variable ‘numDimV’ [-Wunused-variable]
  408 | int numDimV = jtr.size(0);
      | ^~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc: In function ‘void JacobiColumnSquareCuda(const std::vector&, const std::vector&, std::vector&, int)’:
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:481:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  481 | for (int i = 0; i < jacobians.size(); ++i) {
      | ^~~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc: In function ‘void ColumnInverseSquareCuda(std::vector&)’:
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:499:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  499 | for (int i = 0; i < jacobianScale.size(); ++i) {
      | ^~~~~~~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc: In function ‘void JacobiNormalizeCuda(const std::vector&, const std::vector&, std::vector&)’:
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:516:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  516 | for (int i = 0; i < jacobianScale.size(); ++i) {
      | ^~~~~~~~
/home/yunnan/repos/DeepLM/TorchLM/cpp/kernel.cc:524:21: warning: unused variable ‘num’ [-Wunused-variable]
  524 | int num = numDimV * numDimP;
      | ^~~
[ 90%] Linking CXX shared library BACore.cpython-310-x86_64-linux-gnu.so
[ 90%] Built target BACore
[100%] Linking CXX shared library LMCore.cpython-310-x86_64-linux-gnu.so
[100%] Built target LMCore
--2022-10-05 19:54:22-- https://grail.cs.washington.edu/projects/bal/data/ladybug/problem-49-7776-pre.txt.bz2
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving grail.cs.washington.edu (grail.cs.washington.edu)... 2607:4000:200:14::5d, 128.208.5.93
Connecting to grail.cs.washington.edu (grail.cs.washington.edu)|2607:4000:200:14::5d|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 448484 (438K) [application/x-bzip2]
Saving to: ‘problem-49-7776-pre.txt.bz2’

problem-49-7776-pre 100%[===================>] 437.97K 211KB/s in 2.1s

2022-10-05 19:54:26 (211 KB/s) - ‘problem-49-7776-pre.txt.bz2’ saved [448484/448484]

Traceback (most recent call last):
  File "/home/yunnan/repos/DeepLM/examples/BundleAdjuster/bundle_adjuster.py", line 5, in <module>
    import BACore
ImportError: /home/yunnan/repos/DeepLM/build/BACore.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

hjwdzh commented 2 years ago

I did not run into this error. I googled it, and someone suggests adding "--use-cxx11-abi" as a fix, with a reference link: https://pytorch.org/TensorRT/tutorials/installation.html#installation (choosing the right ABI).

Let me know if it helps :)
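
A note for anyone decoding the undefined symbol above: it ends in RKSs, and in the Itanium C++ name mangling Ss is the abbreviation for std::string under the old (pre-C++11) libstdc++ ABI, while the new ABI spells the type out through the std::__cxx11 namespace. So BACore appears to have been compiled against the old string ABI while the installed libtorch only exports the new-ABI variant of that function. Here is a minimal sketch of that check. The function name string_abi_of_symbol is made up for illustration, and the substring test is only a heuristic, not a real demangler.

```python
def string_abi_of_symbol(mangled: str) -> str:
    """Guess which libstdc++ string ABI a mangled C++ symbol was built for.

    Heuristic only (an assumption of this sketch, not a demangler):
    - "7__cxx11" appears when std::__cxx11::basic_string is in the signature
    - "Ss" is the Itanium abbreviation for the pre-C++11 std::string
    """
    if "7__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"


# Symbol copied from the ImportError in this thread.
missing = "_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs"
print(string_abi_of_symbol(missing))

# With PyTorch installed, the ABI of the library itself can be queried:
#   import torch; print(torch.compiled_with_cxx11_abi())
```

If the two disagree, the extension has to be rebuilt with the matching -D_GLIBCXX_USE_CXX11_ABI value (0 for the old ABI, 1 for the new one), or a PyTorch build with the matching ABI has to be installed.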

Penterakt commented 2 years ago

In the end I installed the conda environment from https://github.com/CompVis/stable-diffusion, and then it magically worked.

dennisushi commented 1 year ago

I had to export the following:

export TCNN_CUDA_ARCHITECTURE=86
export CUDA_HOME="/usr/local/cuda-11.7"
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"

before example.sh

It solved the

import BACore
ModuleNotFoundError: No module named 'BACore'

problem, but then I got

Traceback (most recent call last):
  File "examples/BundleAdjuster/bundle_adjuster.py", line 8, in <module>
    from BAProblem.rotation import AngleAxisRotatePoint
ModuleNotFoundError: No module named 'BAProblem'

Dirty fix: add sys.path.append(os.path.dirname(os.path.dirname(__file__))) to the example script.
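
One caveat about the number of dirname calls: the script lives at examples/BundleAdjuster/bundle_adjuster.py, so two calls give the examples directory, not the repository root. If BAProblem sits at the repository root (which the build log paths such as DeepLM/BAProblem/cpp/io.cc suggest), a third call is needed. A sketch that avoids counting levels by walking upward until the package directory is found is below. The helper names find_package_root and ensure_importable are hypothetical and not part of DeepLM.

```python
import os
import sys
from typing import Optional


def find_package_root(script_file: str, package: str = "BAProblem") -> Optional[str]:
    """Walk up from script_file until a directory containing `package` is found."""
    current = os.path.dirname(os.path.abspath(script_file))
    while True:
        if os.path.isdir(os.path.join(current, package)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent


def ensure_importable(script_file: str, package: str = "BAProblem") -> Optional[str]:
    """Put the directory that contains `package` at the front of sys.path."""
    root = find_package_root(script_file, package)
    if root is not None and root not in sys.path:
        sys.path.insert(0, root)
    return root
```

In the example script this would be called as ensure_importable(__file__) before the line from BAProblem.rotation import AngleAxisRotatePoint.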

jucamohedano commented 1 year ago

Hi! I'm also running into an error when running the example script, and it's different from the errors in previous issues. Any suggestions of what's wrong would be very much appreciated.

TORCH_USE_RTLD_GLOBAL=YES python3 examples/BundleAdjuster/bundle_adjuster.py --balFile ./data/problem-49-7776-pre.txt --device cuda
Load observation 31000 of 31843...       
Initial cost = 8.509125E+05, Memory = 2.321921E-03 G
Traceback (most recent call last):
  File "examples/BundleAdjuster/bundle_adjuster.py", line 40, in <module>
    numSuccessIterations = 15)
  File "/home/juanmohedano/OnePose_Plus_Plus/submodules/DeepLM/TorchLM/solver.py", line 726, in Solve
    solver.Solve()
  File "/home/juanmohedano/OnePose_Plus_Plus/submodules/DeepLM/TorchLM/solver.py", line 617, in Solve
    self.ComputeTrustRegionStep()
  File "/home/juanmohedano/OnePose_Plus_Plus/submodules/DeepLM/TorchLM/solver.py", line 560, in ComputeTrustRegionStep
    step = self.LinearSolve(lmDiagonal);
  File "/home/juanmohedano/OnePose_Plus_Plus/submodules/DeepLM/TorchLM/solver.py", line 428, in LinearSolve
    ListInvert(preconditioner)
  File "/home/juanmohedano/OnePose_Plus_Plus/submodules/DeepLM/TorchLM/listvec.py", line 25, in ListInvert
    listvec[i] = torch.inverse(listvec[i])
torch._C._LinAlgError: linalg.inv: (Batch element 0): The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.

brian2lee commented 1 year ago

I did not run into this error. I googled it and some one offers the solution as adding "--use-cxx11-abi", with a reference link: https://pytorch.org/TensorRT/tutorials/installation.html#installation (choosing the right ABI).

Let me know if it helps :)

&

I had to export the following:

export TCNN_CUDA_ARCHITECTURE=86
export CUDA_HOME="/usr/local/cuda-11.7"
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"

before example.sh

It solved the

import BACore
ModuleNotFoundError: No module named 'BACore'

problem, but then I got

Traceback (most recent call last):
  File "examples/BundleAdjuster/bundle_adjuster.py", line 8, in <module>
    from BAProblem.rotation import AngleAxisRotatePoint
ModuleNotFoundError: No module named 'BAProblem'

Dirty fix: add sys.path.append(os.path.dirname(os.path.dirname(__file__))) to the example script.

Neither of these works for me. Is there any other fix? I ran into so many issues building oneposeplus.

zym-njust commented 1 year ago

I ran into the same error. I tried to install the BACore package, but I can't find its source.

zym-njust commented 1 year ago

I finally found that I hadn't downloaded the third-party dependencies 'eigen' and 'pybind11', which are not included in the ZIP download. You have to download them manually, and then it works.
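
A quick way to confirm this before building is to check that the dependency folders exist and are not empty. The sketch below assumes the dependencies live under 3rd/ in the repository root: 3rd/pybind11 appears in the CMake output earlier in this thread, while 3rd/eigen is a guess. The function name missing_third_party is hypothetical.

```python
import os
from typing import List, Sequence


def missing_third_party(
    repo_root: str,
    names: Sequence[str] = ("pybind11", "eigen"),  # "eigen" location is assumed
    subdir: str = "3rd",
) -> List[str]:
    """Return the dependency directories that are absent or empty."""
    missing = []
    for name in names:
        path = os.path.join(repo_root, subdir, name)
        # A ZIP download typically leaves such folders missing or empty.
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    print(missing_third_party(os.getcwd()))
```

If the dependencies are tracked as git submodules (an assumption here), cloning with git clone --recursive, or running git submodule update --init --recursive in an existing clone, should fetch them instead of downloading them by hand.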

VVinter-melon commented 1 year ago

Dirty fix: add `sys.path.append(os.path.dirname(os.path.dirname(__file__)))` to the example script.

This helped. I changed the last line of example.sh to:

TORCH_USE_RTLD_GLOBAL=YES python3 ../DeepLM/examples/BundleAdjuster/bundle_adjuster.py --balFile ./data/problem-49-7776-pre.txt --device cuda

and added sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__)))) to the bundle_adjuster.py file.

shivakarnati commented 8 months ago

None of the answers worked for me. I still get the error on the line from BAProblem.rotation import AngleAxisRotatePoint. Did anyone solve this?

dadwadw233 commented 7 months ago

I ran into the same problem and finally solved it by setting the Python version correctly. It can be edited in CMakeLists.txt like this 👇

set(PYTHOH3_VERSION 3.9m)

After setting the Python version to 3.9 and running sh example.sh, the library was compiled for the correct Python version and named 'BACore.cpython-39-x86_64-linux-gnu.so' (before, it was BACore.cpython-38-x86_64-linux-gnu.so). This solved the ImportError: xxx problem. @shivakarnati

dadwadw233 commented 7 months ago

I guess installing SD's conda env solved the problem for @Penterakt mainly because its Python version is 3.8.5, which is compatible with the py38 build of xxx.so.
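
To check for this kind of mismatch directly: Python only imports an extension module whose file name ends in one of the suffixes listed in importlib.machinery.EXTENSION_SUFFIXES for the running interpreter, so a cpython-38 build is invisible to Python 3.9 and shows up as "No module named 'BACore'". A small sketch that lists which built files the current interpreter can load is below. The function name classify_extensions is made up for illustration, and the build directory is assumed to be ./build as in the logs above.

```python
import glob
import importlib.machinery
import os
from typing import Dict, List


def classify_extensions(build_dir: str, module: str = "BACore") -> Dict[str, List[str]]:
    """Split built `module.*` files into importable and non-importable ones."""
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    result: Dict[str, List[str]] = {"importable": [], "other": []}
    for path in sorted(glob.glob(os.path.join(build_dir, module + ".*"))):
        name = os.path.basename(path)
        ext = name[len(module):]  # e.g. ".cpython-39-x86_64-linux-gnu.so"
        key = "importable" if ext in suffixes else "other"
        result[key].append(name)
    return result


if __name__ == "__main__":
    print("interpreter accepts:", importlib.machinery.EXTENSION_SUFFIXES)
    print(classify_extensions("build"))
```

If BACore only shows up under "other", the build picked a different Python than the one running the example, and the version used by CMake needs to be changed as described above.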

titanior commented 2 months ago

Please make sure the right CUDA version is installed.