ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[BUG] checkm dies with "OSError: AF_UNIX path too long" #288

Closed MrTomRod closed 4 months ago

MrTomRod commented 5 months ago

Sometimes PGAP fails with the following message:

++ echo Bacteria
++ tr '|' ' '
+ rank_name=Bacteria
+ set -e
+ '[' -z Bacteria ']'
+ grep -Pq 'domain\tBacteria\t' /scratch/12504851/TMPDIR/9_JF2-p1.1/wyheizmv/stgaa3d947b-701b-4ee1-a11d-86c27b1a3904/checkm/taxon_marker_sets.tsv
+ set +e
+ /root/venv/bin/checkm taxonomy_wf -t 1 -g -x fa domain Bacteria bins-prot/ taxonomy_wf-prot/
Process SyncManager-1:
Traceback (most recent call last):
  File "/opt/python-3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/python-3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/python-3.9/lib/python3.9/multiprocessing/managers.py", line 583, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/opt/python-3.9/lib/python3.9/multiprocessing/managers.py", line 156, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 448, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 591, in __init__
    self._socket.bind(address)
OSError: AF_UNIX path too long
Traceback (most recent call last):
  File "/root/venv/bin/checkm", line 856, in <module>
    checkmParser.parseOptions(args)
  File "/root/venv/lib/python3.9/site-packages/checkm/main.py", line 992, in parseOptions
    self.analyze(options)
  File "/root/venv/lib/python3.9/site-packages/checkm/main.py", line 326, in analyze
    binIdToModels = mgf.find(binFiles,
  File "/root/venv/lib/python3.9/site-packages/checkm/markerGeneFinder.py", line 68, in find
    binIdToModels = mp.Manager().dict()
  File "/opt/python-3.9/lib/python3.9/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/opt/python-3.9/lib/python3.9/multiprocessing/managers.py", line 558, in start
    self._address = reader.recv()
  File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
+ taxonomy_wf_error_code=1
+ set -e
+ [[ 1 -ne 0 ]]
+ continue
+ [[ 1 -eq 0 ]]
/CERR

Error: (CFileException::eFileIO) Error opening checkm dombtblout: /scratch/12504851/TMPDIR/9_JF2-p1.1/sw_qqyqh/checkm.369149303658106304JHwAjj/fasta_by_scaffold/checkm.out;     Stack trace:;      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/bin/checkm_wnode :0  offset=0x0 addr=0x41342d;      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/bin/checkm_wnode :0  offset=0x0 addr=0x411423;      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/bin/checkm_wnode :0  offset=0x0 addr=0x412346;      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/lib/libgpxlib.so /export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/internal/gpipe/gpexec/queue/lib/wn_worker_thread.cpp:274 ncbi::CWorkerThread::x_DoJob(ncbi::CRef<ncbi::objects::CGPX_Job, ncbi::CObjectCounterLocker>, ncbi::CDiagContext&) offset=0x0 addr=0x7f3cd95ccf48;      /panfs/pan1.be-md.nc@@@                                                                                                                                                                               

To Reproduce: I think the cause is that my TMPDIR environment variable is too long; in this case, it was set to /scratch/12504851/TMPDIR/9_JF2-p1.1

Possible fixes: Maybe use a more current version of checkm? The checkm developers seem to be aware of the issue but apparently haven't fixed it. Alternatively, pgap.py could quit early if TMPDIR is too long. Having this issue as a reference might be enough.


Workaround: Use a shorter TMPDIR environment variable. (I tested this and it works!)
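The early exit suggested above could be sketched roughly as follows. This is not part of pgap.py; `tmpdir_headroom` and both constants are hypothetical names. The sketch assumes the Linux limit of 107 usable bytes for an AF_UNIX socket path, and that Python's multiprocessing adds about 32 characters (something like `/pymp-XXXXXXXX/listener-XXXXXXXX`) below the temp directory. Both numbers are estimates, and the pipeline may add further subdirectories of its own, so the real headroom can be smaller.

```python
import os
import tempfile

# Assumption: Linux allows at most 107 bytes in an AF_UNIX socket path
# (108-byte sun_path buffer minus the terminating NUL).
AF_UNIX_MAX_PATH = 107

# Assumption: multiprocessing appends roughly
# "/pymp-XXXXXXXX/listener-XXXXXXXX" (32 characters) to the temp dir.
SOCKET_SUFFIX_ESTIMATE = 32


def tmpdir_headroom(tmpdir=None):
    """Estimate how many bytes remain for the socket path.

    A negative result means the temp dir is probably too long.
    """
    base = tmpdir if tmpdir is not None else tempfile.gettempdir()
    used = len(os.fsencode(os.path.abspath(base))) + SOCKET_SUFFIX_ESTIMATE
    return AF_UNIX_MAX_PATH - used


if __name__ == "__main__":
    headroom = tmpdir_headroom()
    if headroom < 0:
        print(f"Temp dir is about {-headroom} bytes too long for AF_UNIX sockets")
    else:
        print(f"Temp dir leaves about {headroom} bytes of headroom")
```

Calling `tmpdir_headroom()` with no argument checks whatever directory `tempfile.gettempdir()` resolves to, which honours TMPDIR when that directory is usable.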

azat-badretdin commented 5 months ago

Thanks for reporting this issue, Thomas!

> Maybe use a more current version of checkm?

Alas, they discontinued the incarnation of checkm we are using and moved on to checkm2, a significantly different product. We have tested it and ran into issues with it, so switching is currently not on our list.

We will look at the long ENV var issue.

azat-badretdin commented 4 months ago

> in this case, it was set to /scratch/12504851/TMPDIR/9_JF2-p1.1

That does not seem long to me. If checkm had trouble with a massive addition to that path from our side, the error message could have listed a shortened path, but it looks like the correct, untruncated path is shown.

Is there something unusual about taxonomy of your input?

MrTomRod commented 4 months ago

I agree, this is not very long: /scratch/12504851/TMPDIR/.../checkm.out is only 105 characters, and I tried it: longer paths work. (According to the internet, our file system, xfs, should support paths of up to 4096 characters.)

Nevertheless, the problem was solved when I shortened the path. 🤷🏻

Taxonomy was normal, and I tried a few different samples.
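One possible explanation for why ordinary file paths of this length work while checkm fails: the traceback ends in `self._socket.bind(address)`, and the address of an AF_UNIX socket has its own length limit (about 107 bytes on Linux), unrelated to the filesystem's path limit. The sketch below shows the effect in isolation, assuming a Linux or other Unix host. `bind_error_for` and the example path are hypothetical and are not taken from checkm or PGAP.

```python
import socket


def bind_error_for(path):
    """Try to bind an AF_UNIX socket to `path`.

    Returns the error message on failure, or None if the bind succeeded.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as exc:
        return str(exc)
    finally:
        sock.close()
    return None


# A hypothetical 150-character path: far below any filesystem limit,
# but longer than the socket address buffer.
long_path = "/tmp/" + "x" * 145
print(bind_error_for(long_path))
```

With a path like this, the call should fail with the same "AF_UNIX path too long" message seen in the log, even though a regular file at that path could be created without trouble.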

azat-badretdin commented 4 months ago

> Nevertheless, the problem was solved when I shortened the path.

Looks like this is good enough. :-)

MrTomRod commented 4 months ago

In bioinformatics, we don't care as long as it works - somehow. 🥲