stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License
2.95k stars 377 forks

FileNotFoundError: [Errno 2] No such file or directory: '***/.venv/lib/python3.8/site-packages/colbert/modeling/segmented_maxsim.cpp' #268

Closed thesillypeanut closed 9 months ago

thesillypeanut commented 11 months ago

Hello, I tried pip installing this repo at the latest commit via:

pip3 install git+https://github.com/stanford-futuredata/ColBERT.git@2479b5e4fe768eb1c95f69fa11963b0ef192d054

The package appears to be installed correctly when I run pip freeze:

...
colbert @ git+https://github.com/stanford-futuredata/ColBERT.git@2479b5e4fe768eb1c95f69fa11963b0ef192d054
...

However, when I try to use this package, I get the following error:

[Oct 24, 13:46:23] #> Loading collection...
0M
[Oct 24, 13:46:25] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
Traceback (most recent call last):
  ...
  File "***/.venv/lib/python3.8/site-packages/colbert/searcher.py", line 39, in __init__
    self.checkpoint = Checkpoint(self.checkpoint, colbert_config=self.config)
  File "***/.venv/lib/python3.8/site-packages/colbert/modeling/checkpoint.py", line 19, in __init__
    super().__init__(name, colbert_config)
  File "***/.venv/lib/python3.8/site-packages/colbert/modeling/colbert.py", line 23, in __init__
    ColBERT.try_load_torch_extensions(self.use_gpu)
  File "***/.venv/lib/python3.8/site-packages/colbert/modeling/colbert.py", line 38, in try_load_torch_extensions
    segmented_maxsim_cpp = load(
  File "***/.venv/lib/python3.8/site-packages/torch-2.0.1-py3.8-macosx-11.7-x86_64.egg/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "***/.venv/lib/python3.8/site-packages/torch-2.0.1-py3.8-macosx-11.7-x86_64.egg/torch/utils/cpp_extension.py", line 1468, in _jit_compile
    version = JIT_EXTENSION_VERSIONER.bump_version_if_changed(
  File "***/.venv/lib/python3.8/site-packages/torch-2.0.1-py3.8-macosx-11.7-x86_64.egg/torch/utils/_cpp_extension_versioner.py", line 45, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "***/.venv/lib/python3.8/site-packages/torch-2.0.1-py3.8-macosx-11.7-x86_64.egg/torch/utils/_cpp_extension_versioner.py", line 15, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: '***/.venv/lib/python3.8/site-packages/colbert/modeling/segmented_maxsim.cpp'

Upon navigating to the site-packages directory, I noticed that the file segmented_maxsim.cpp is indeed missing, even though the rest of the repo code seems to be there. Is this just a pip support issue? I was trying to avoid cloning the repo and instead install it via pip, like the rest of our project dependencies.
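A quick way to confirm which sources are absent is to scan the installed package directory. The snippet below is a minimal sketch, not part of ColBERT: `find_missing_sources` is a hypothetical helper, and the only file name taken from the traceback is segmented_maxsim.cpp.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List


def find_missing_sources(package_dir: str, expected: Iterable[str]) -> List[str]:
    """Return the names in `expected` that exist nowhere under `package_dir`."""
    root = Path(package_dir)
    # Collect the basenames of every .cpp file shipped inside the package.
    present = {p.name for p in root.rglob("*.cpp")}
    return sorted(name for name in expected if name not in present)


if __name__ == "__main__":
    # Locate the installed package without importing it.
    spec = importlib.util.find_spec("colbert")
    if spec is None or not spec.submodule_search_locations:
        print("colbert is not installed in this environment")
    else:
        pkg_dir = list(spec.submodule_search_locations)[0]
        # Prints the list of expected files that were not installed.
        print(find_missing_sources(pkg_dir, ["segmented_maxsim.cpp"]))
```

Run inside the affected virtual environment, this should list segmented_maxsim.cpp if the file was not copied during installation.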

Thanks in advance!

Alex-S-H-P commented 9 months ago

Hi!

Got the same issue. It seems to me that the problem is a missing sources=... kwarg in the setup call in setup.py. The pip install relies on that setup call through Setuptools (whether you pip install from GitHub, or clone and then install locally), so without the kwarg pip treats this file as junk and, in an effort to save space, does nothing with it.
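For illustration, here is one way a setup.py can tell Setuptools to ship non-Python files. This is a sketch under assumptions, not the repository's actual setup.py or a confirmed fix: it uses the `package_data` option rather than `sources=` (which, as far as I know, is an argument of Setuptools `Extension` objects), and `make_setup_kwargs` and `example_pkg` are hypothetical names.

```python
from typing import Dict, List


def make_setup_kwargs(packages: List[str]) -> Dict[str, object]:
    """Build keyword arguments for setuptools.setup() that keep .cpp files."""
    return {
        "packages": packages,
        # The "" key applies the patterns to every package in `packages`;
        # matching files are installed next to the .py modules.
        "package_data": {"": ["*.cpp"]},
    }


# Hypothetical use inside a setup.py:
#   from setuptools import setup, find_packages
#   setup(name="example_pkg", **make_setup_kwargs(find_packages()))
```

With something like this in place, the .cpp files would be copied into site-packages, so the JIT loader could find them at runtime.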

Meanwhile, I'm going to add a curl/wget call when installing the repo manually, and hope that nothing else breaks.

paul7Junior commented 9 months ago

Hey,

The same happens for segmented_lookup.cpp, filter_pids.cpp and decompress_residuals.cpp when trying to search.

As a quick fix, fetching the files with wget or simply copy-pasting them seems to work.

Thanks for ColBERT + PLAID, it's a killer!

Alex-S-H-P commented 9 months ago

The same happens for segmented_lookup.cpp, filter_pids.cpp and decompress_residuals.cpp when trying to search.

Yeah, I don't have the script at hand, but there are 6 cpp files. Wget-ing them in a bash loop worked for me. I found them via GitHub file search.
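The download workaround described above could look roughly like this in Python. It is a sketch under assumptions: it assumes the repository layout mirrors the installed package layout (as the traceback suggests for colbert/modeling/segmented_maxsim.cpp), `plan_downloads` and `fetch_missing` are hypothetical helpers, and the relative paths of the other .cpp files are not given in this thread, so you need to look them up yourself.

```python
import urllib.request
from pathlib import Path
from typing import Iterable, List, Tuple

# Commit hash taken from the pip command in the original report.
REF = "2479b5e4fe768eb1c95f69fa11963b0ef192d054"
RAW_BASE = "https://raw.githubusercontent.com/stanford-futuredata/ColBERT"


def plan_downloads(
    site_packages: str, rel_paths: Iterable[str], ref: str = REF
) -> List[Tuple[str, Path]]:
    """Return (url, destination) pairs for files missing under site_packages."""
    plan = []
    for rel in rel_paths:
        dest = Path(site_packages) / rel
        if not dest.exists():
            # Assumption: the repo path equals the path inside site-packages.
            plan.append((f"{RAW_BASE}/{ref}/{rel}", dest))
    return plan


def fetch_missing(
    site_packages: str, rel_paths: Iterable[str], ref: str = REF
) -> None:
    """Download each missing file to where the installed package expects it."""
    for url, dest in plan_downloads(site_packages, rel_paths, ref):
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
```

Calling `fetch_missing` with your site-packages path and a list such as `["colbert/modeling/segmented_maxsim.cpp"]` downloads only the files that are not already present. Pinning `ref` to the installed commit keeps the sources in sync with the Python code.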