stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

Indexer unable to index with cuda #221

Open bohanhou14 opened 1 year ago

bohanhou14 commented 1 year ago

Hi, I am trying to run ColBERT's demo notebook with one GPU on a remote machine. I submitted the job with sbatch, confirmed that a GPU was allocated, and got the following error:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
        prepare(preparation_data)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/runpy.py", line 265, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/gridsan/jzhang2/bhou4/ColBERT/colbert/test.py", line 38, in <module>
        indexer.index(name=index_name, collection=collection, overwrite=True)
      File "/home/gridsan/jzhang2/bhou4/ColBERT/colbert/../colbert/indexer.py", line 74, in index
        self.launch(collection)
      File "/home/gridsan/jzhang2/bhou4/ColBERT/colbert/../colbert/indexer.py", line 79, in launch
        manager = mp.Manager()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/context.py", line 57, in Manager
        m.start()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/managers.py", line 579, in start
        self._process.start()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/process.py", line 121, in start
        self._popen = self._Popen(self)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
        return Popen(process_obj)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
        self._launch(process_obj)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    Traceback (most recent call last):
      File "test.py", line 38, in <module>
        indexer.index(name=index_name, collection=collection, overwrite=True)
      File "/home/gridsan/jzhang2/bhou4/ColBERT/colbert/../colbert/indexer.py", line 74, in index
        self.launch(collection)
      File "/home/gridsan/jzhang2/bhou4/ColBERT/colbert/../colbert/indexer.py", line 79, in launch
        manager = mp.Manager()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/context.py", line 57, in Manager
        m.start()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/managers.py", line 583, in start
        self._address = reader.recv()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
        buf = self._recv(4)
      File "/home/gridsan/jzhang2/.conda/envs/colbert/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
        raise EOFError
    EOFError

The relevant parts of my code:

[Image: Screenshot 2023-07-07 at 4 25 09 PM]

Thanks in advance!!

okhat commented 1 year ago

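You need an if statement that checks for __main__, as the error message suggests. Put the indexing call under if __name__ == '__main__': so that the spawned child processes do not re-run it when they import your script.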
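As a rough sketch of that pattern (not the code from the screenshot, which is not available here), the script below uses only the standard library. The functions prepare, launch and main and the sample passages are hypothetical stand-ins; in the real test.py, the indexer.index(name=index_name, collection=collection, overwrite=True) call from the traceback would presumably go inside main().

```python
import multiprocessing as mp


def prepare(collection):
    """Hypothetical helper: strip whitespace from each passage."""
    return [doc.strip() for doc in collection]


def launch(collection):
    """Stand-in for the indexing call: starting a Manager spawns a child process."""
    ctx = mp.get_context("spawn")  # the start method seen in the traceback
    with ctx.Manager() as manager:
        shared = manager.list()  # list proxy served by the manager process
        for doc in prepare(collection):
            shared.append(doc)
        return list(shared)  # copy the contents out before the manager shuts down


def main():
    collection = [" first passage ", "second passage"]  # hypothetical data
    print(launch(collection))


if __name__ == "__main__":
    # Only the parent process runs main(). A spawned child imports this file
    # under a different module name, so it skips this branch and does not
    # try to start another Manager while it is still bootstrapping.
    main()
```

With the spawn start method, each child re-imports the main script, so any process-starting code left at module level runs again in the child and raises the RuntimeError shown above. Moving it under the guard avoids that.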