stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

ninja: build stopped: subcommand failed. #371

Open bhargav25dave1996 opened 1 week ago

bhargav25dave1996 commented 1 week ago

Running this ColBERT code:

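    from colbert import Indexer
    from colbert.infra import ColBERTConfig

    # Index settings: 2-bit residual compression, outputs written under ./experiments
    config = ColBERTConfig(
        nbits=2,
        root="experiments",
    )
    # Load the trained checkpoint, then index the Gujarati FIRE collection
    indexer = Indexer(checkpoint="/media/sda1/Bhargav/indiccolbert/guj_Gujr-nllb1.3b-moses/colbert-50000", config=config)
    indexer.index(name="gu_fire.nbits=2", collection="/media/sda1/Bhargav/FIRE_adhoc_data/Gujarati/Gujarati_collection_only_index.tsv")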

Gives me this error:

    Process Process-2:
    Traceback (most recent call last):
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
        subprocess.run(
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
        self.run()
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/infra/launcher.py", line 134, in setup_new_process
        return_val = callee(config, *args)
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 33, in encode
        encoder.run(shared_lists)
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 68, in run
        self.train(shared_lists) # Trains centroids from selected passages
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 237, in train
        bucket_cutoffs, bucket_weights, avg_residual = self._compute_avg_residual(centroids, heldout)
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 315, in _compute_avg_residual
        compressor = ResidualCodec(config=self.config, centroids=centroids, avg_residual=None)
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/residual.py", line 24, in __init__
        ResidualCodec.try_load_torch_extensions(self.use_gpu)
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/residual.py", line 103, in try_load_torch_extensions
        decompress_residuals_cpp = load(
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
        return _jit_compile(
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
        _write_ninja_file_and_build_library(
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
        _run_ninja_build(
      File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error building extension 'decompress_residuals_cpp': [1/3] c++ -MMD -MF decompress_residuals.o.d -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cpp -o decompress_residuals.o
    FAILED: decompress_residuals.o
    c++ -MMD -MF decompress_residuals.o.d -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cpp -o decompress_residuals.o
    In file included from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12,
                     from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
                     from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                     from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
                     from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cpp:1:
    /home/irlab/miniconda3/envs/colbert/include/python3.8/Python.h:44:10: fatal error: crypt.h: No such file or directory
       44 | #include <crypt.h>
          |          ^~~~~
    compilation terminated.
    [2/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu -o decompress_residuals.cuda.o
    FAILED: decompress_residuals.cuda.o
    /usr/bin/nvcc -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu -o decompress_residuals.cuda.o
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:120: error: expected template-name before ‘<’ token
       42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
          | ^
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:120: error: expected identifier before ‘<’ token
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:123: error: expected primary-expression before ‘>’ token
       42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
          | ^
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:126: error: expected primary-expression before ‘)’ token
       42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
          | ^
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu: In function ‘at::Tensor decompress_residuals_cuda(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int)’:
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu:61:126: warning: ‘T at::Tensor::data() const [with T = unsigned char]’ is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
       61 |     decompress_residuals_kernel<<<blocks, threads>>>(
          | ^
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:238:1: note: declared here
      238 |   T data() const {
          | ^ ~~
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu:61:592: warning: ‘T at::Tensor::data() const [with T = c10::Half]’ is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
       61 |     decompress_residuals_kernel<<<blocks, threads>>>(
          | ^
    /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:238:1: note: declared here
      238 |   T data() const {
          | ^ ~~
    ninja: build stopped: subcommand failed.
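
The first failure in the log above is `fatal error: crypt.h: No such file or directory`, raised while `c++` compiles `decompress_residuals.cpp`. One way to confirm that outside of ColBERT is to ask the same compiler to preprocess a single `#include` line. The following is only a minimal sketch: `header_is_available` is a hypothetical helper (not part of ColBERT or torch), and it assumes the compiler accepts the GCC/Clang-style `-x c++ -E -` flags. It does not look at the separate `nvcc` errors from the `[2/3]` step.

```python
import shutil
import subprocess


def header_is_available(header, compiler="c++"):
    """Check whether `compiler` can resolve `#include <header>`.

    Returns True or False depending on the preprocessor's exit status,
    or None when the compiler itself is not on PATH.
    """
    if shutil.which(compiler) is None:
        return None
    # "-E" stops after preprocessing; "-" reads the source from stdin.
    result = subprocess.run(
        [compiler, "-x", "c++", "-E", "-"],
        input=f"#include <{header}>\n",
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # crypt.h is the header that Python.h failed to find in the log.
    print("crypt.h available:", header_is_available("crypt.h"))
```

If this prints `False`, the header is missing from the compiler's default search path, which would match the `[1/3]` failure; note that the real build also passes several `-isystem` directories that this sketch leaves out.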

bhargav25dave1996 commented 1 week ago

@okhat

Liu-Eroteme commented 5 hours ago

Okay, so I don't know what actually happened, but I got the same error today, and it turned out to be a torch fuckup. The recent update broke some symlinks, so fixing it was as easy as:

    cd ...site-packages/torch/lib
    ln -s ../../../../libtorch_python.so libtorch_python.so

Though, on second glance, your stack trace is a little different. It might be something else, but I'd still check torch and the torch extension loader; it's always f*ing torch.
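
To check whether an install has the broken-symlink problem described above before recreating anything by hand, something like the following could be used. It is a rough sketch under assumptions: `find_broken_symlinks` is a hypothetical helper, and the only torch-specific part is locating the `lib` directory next to `torch.__file__`, which is where the `cd` command above points.

```python
from pathlib import Path


def find_broken_symlinks(directory):
    """Return the sorted names of symlinks in `directory` whose target is missing."""
    broken = []
    for entry in Path(directory).iterdir():
        # is_symlink() is True even for a dangling link,
        # while exists() follows the link and reports on the target.
        if entry.is_symlink() and not entry.exists():
            broken.append(entry.name)
    return sorted(broken)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
    else:
        lib_dir = Path(torch.__file__).parent / "lib"
        if lib_dir.is_dir():
            print(lib_dir, "->", find_broken_symlinks(lib_dir))
        else:
            print("no lib directory found at", lib_dir)
```

An empty list means no dangling links were found in that directory; it does not show that the extension build itself will succeed.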