jolespin / veba

A modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes
GNU Affero General Public License v3.0

[Bug] OSError: AF_UNIX path too long #121

Open zackhenny opened 3 days ago

zackhenny commented 3 days ago

Describe the bug: Hello, I've run into this error during the first iteration of binning in the prokaryotic_binning module. Bins are created in the first iteration, but the CheckM2 step then fails as shown below. The pipeline continues to run, but all subsequent binning iterations are empty because the 1-unbinned.fa files cannot be found. Eventually the pipeline fails on the final step because there are no genomes to parse. I've tracked the logs down to the CheckM2 step, which I've provided below. Is there a workaround for this? It appears to be something internal to the multiprocessing that Python is attempting with UNIX domain sockets (?). Thank you!

Versions: 2.2.1

Log file for checkM2 below.

[09/15/2024 02:38:30 AM] INFO: Custom database path provided for predict run. Checking database at /project/thrash_89/db/VEBA-database/Classify/CheckM2/uniref100.KO.1.dmnd...
[09/15/2024 02:38:37 AM] INFO: Running quality prediction workflow with 32 threads.
[09/15/2024 02:38:39 AM] INFO: Using user-supplied protein files.
[09/15/2024 02:38:40 AM] INFO: Calculating metadata for 31 bins with 32 threads:
Process SyncManager-1:
Traceback (most recent call last):
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/managers.py", line 608, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/managers.py", line 154, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/connection.py", line 448, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/connection.py", line 591, in __init__
    self._socket.bind(address)
OSError: AF_UNIX path too long
Traceback (most recent call last):
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/bin/checkm2", line 242, in <module>
    predictor.prediction_wf(args.genes, mode, args.dbg_cos, args.dbg_vectors,
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/site-packages/checkm2/predictQuality.py", line 116, in prediction_wf
    metadata_df = self.__calculate_metadata(prodigal_files)
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/site-packages/checkm2/predictQuality.py", line 432, in __calculate_metadata
    metadata_dict = mp.Manager().dict()
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/managers.py", line 583, in start
    self._address = reader.recv()
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home1/zjhennin/miniconda3/envs/VEBA-binning-prokaryotic_env/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

jolespin commented 3 days ago

Yes, this is an annoying issue that used to arise in earlier versions, and it's a bit tricky to work around if the following doesn't work. Can you try setting --tmpdir ./tmp/ (that is, tmp/ in your cwd)? I remember this working for me in the past when I got this error, which is why I created custom temporary working directories. I'm traveling abroad right now, so I can't test it myself; I purposefully didn't bring my computer so I could have a proper holiday to reset.

I had this in the FAQ for a very old version, but it looks like I removed it because I thought the issue was resolved.

Any more context on the full command and logs for VEBA (not just checkm2) would be helpful if you have them available.

I'll be back in the office next week, so I can take a look if you can provide some more context.
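
For anyone checking whether a given temporary directory is short enough, here is a rough sketch of the arithmetic behind "AF_UNIX path too long". It assumes CPython's multiprocessing places the manager socket at `<tmpdir>/pymp-XXXXXXXX/listener-XXXXXXXX` (8 random characters per component) and that the platform's `sun_path` buffer is 108 bytes including the terminating NUL, as on Linux (macOS and BSD typically use 104). The function names and the `SUN_PATH_MAX` constant are hypothetical helpers for illustration, not part of VEBA or CheckM2.

```python
import os

# Assumption: sun_path is 108 bytes on Linux (104 on macOS/BSD),
# and that size includes the terminating NUL byte.
SUN_PATH_MAX = 108

# Assumption: multiprocessing builds <tmp>/pymp-XXXXXXXX/listener-XXXXXXXX,
# where each X is one random character chosen by tempfile.
SUFFIX_TEMPLATE = "/pymp-XXXXXXXX/listener-XXXXXXXX"


def socket_path_length(tmpdir):
    """Estimate the byte length of the manager socket path under tmpdir."""
    # Relative paths are resolved against the current working directory.
    base = os.path.abspath(tmpdir)
    return len(os.fsencode(base)) + len(SUFFIX_TEMPLATE)


def fits_af_unix(tmpdir, limit=SUN_PATH_MAX):
    """Return True if the estimated path leaves room for the NUL terminator."""
    return socket_path_length(tmpdir) < limit


if __name__ == "__main__":
    for candidate in ("/tmp", "./tmp", os.environ.get("TMPDIR", "/tmp")):
        print(candidate, socket_path_length(candidate), fits_af_unix(candidate))
```

Because the sketch resolves relative paths to absolute ones, a directory such as `./tmp/` only helps when the working directory itself is short; on a deeply nested project path, a short absolute location may be the safer choice.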