ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

[BUG]: fcsadaptor gives errors related to psutil library #61

Closed LiaOb21 closed 2 months ago

LiaOb21 commented 5 months ago

Hi, thank you for developing these tools! I suppose the errors I'm encountering are related to my local installation, but since I obtain the output anyway, I would like to understand if these are critical errors or not and if the output is reliable despite the errors.

Describe the bug

Running fcsadaptor produces several error messages related to the psutil library, even with the test files. The errors look like the following:

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__psutil_5_8_0/psutil/_common.py", line 447, in wrapper
    ret = self._cache[fun]
AttributeError: 'Process' object has no attribute '_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__psutil_5_8_0/psutil/_pslinux.py", line 1576, in wrapper
    return fun(self, *args, **kwargs)
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__psutil_5_8_0/psutil/_pslinux.py", line 1810, in memory_info
    with open_binary("%s/%s/statm" % (self._procfs_path, self.pid)) as f:
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__psutil_5_8_0/psutil/_common.py", line 711, in open_binary
    return open(fname, "rb", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/1540326/statm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 1266, in run
    self.function(*self.args, **self.kwargs)
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__cwltool_3_1_20211107152837/cwltool/job.py", line 507, in get_tree_mem_usage
    rss = monitor.memory_info().rss
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__psutil_5_8_0/psutil/_common.py", line 450, in wrapper
    return fun(self)
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__psutil_5_8_0/psutil/__init__.py", line 1054, in memory_info
    return self._proc.memory_info()
  File "/tmp/Bazel.runfiles_xmitx75s/runfiles/pip_deps_pypi__psutil_5_8_0/psutil/_pslinux.py", line 1583, in wrapper
    raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: psutil.NoSuchProcess process no longer exists (pid=1540326)

To Reproduce

The commands used are exactly those described for Singularity here: https://github.com/ncbi/fcs/wiki/FCS-adaptor

Software versions (please complete the following information):

Log Files

./run_fcsadaptor.sh --fasta-input ./inputdir/fcsadaptor_prok_test.fa.gz --output-dir ./outputdir --prok --container-engine singularity --image fcs-adaptor.sif --debug
invalid option : '--debug'

Additional context

My fcs_adaptor_report.txt looks like this:

#accession      length  action  range   name
KPN158_ctg010   251170  ACTION_TRIM     1..27   CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB01113.1:Illumina TruSeq UD/CD Adapter Trimming Read 1
KPN158_ctg041   1411    ACTION_TRIM     81..150 CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00753.1:Illumina TruSeq DNA HT and RNA HT i5 index D507 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.)
KPN158_ctg047   1099    ACTION_TRIM     1047..1099      CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB01092.1:Rubicon Genomics ThruPLEX DNA-seq dual-index D504
KPN158_ctg049   1061    ACTION_TRIM     1..67   CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00753.1:Illumina TruSeq DNA HT and RNA HT i5 index D507 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.)

In fcs_adaptor.log (attached) all the steps seem to be completed successfully. When I run fcsadaptor on my assembly I get an empty report, but I suppose it's because the adaptors have been previously removed from the assembly. Is this correct?

Thank you so much in advance for your help.

Lia

fcs_adaptor_report.txt

etvedte commented 5 months ago

Hello,

Do you mind attaching the fcs_adaptor.log files? You actually attached the adaptor report.

Can you try updating your psutil installation and try again?

LiaOb21 commented 5 months ago

Hi,

Sorry for the mistake! Here is the log file. However, these errors are not recorded in the log file.

I am working on a cluster where I don't have root privileges. psutil is not installed on the system, and I've realized that my installation (despite being the latest) is located in my Conda base environment. Could this be causing issues, especially considering that fcsadaptor runs with Singularity?

The log file indicates that the workflow completed successfully, which leaves me uncertain about the criticality of these errors.

Thank you again!

fcs_adaptor.log

etvedte commented 5 months ago

It appears we were mistaken: psutil cannot be updated on the user end, because the version used is the one distributed in the container. The library appears to be used to monitor maximum memory usage for logging purposes, which might explain why these errors are not a source of catastrophic failure. Updating the psutil distributed in the container on our end might resolve this issue.
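For illustration, the traceback is consistent with a race in which the memory-monitoring thread samples a process just after it has exited, so /proc/<pid>/statm is already gone. Below is a minimal Python sketch of a sampler that tolerates this case. It is not the cwltool or psutil code: `read_rss_bytes` and its `procfs_path` parameter are hypothetical names, and it assumes a Linux-style procfs layout where the second field of statm is the resident set size in pages.

```python
import mmap
import os
from typing import Optional


def read_rss_bytes(pid: int, procfs_path: str = "/proc") -> Optional[int]:
    """Return the resident set size of `pid` in bytes, or None if it has exited.

    Hypothetical helper for illustration only; not part of cwltool or psutil.
    """
    statm_path = os.path.join(procfs_path, str(pid), "statm")
    try:
        with open(statm_path, "rb") as handle:
            fields = handle.read().split()
    except (FileNotFoundError, ProcessLookupError):
        # The process ended between scheduling the sample and reading it.
        # For a logging-only monitor this is harmless, so report "no data".
        return None
    # statm fields are counted in pages; field 1 is the resident set size.
    return int(fields[1]) * mmap.PAGESIZE
```

A monitor built this way would skip the missing sample instead of raising from its thread, which matches the observation that the workflow itself still completes.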

We'll keep this issue open for now. In any case, it looks like you reproduced the positive control in the example FASTA, so I would say you're fine to proceed with your current setup. If you want an additional sanity check, you could try spiking a PacBio SMRTbell adapter into your own sequence:

>Pacific Biosciences Blunt Adapter
ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT 
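
One way to do the spike-in is sketched below in Python. This is only an illustration: `spike_adapter` and `format_fasta` are hypothetical helper names, the insertion position is an arbitrary choice, and the sketch works on plain strings rather than reading your assembly file.

```python
# Adapter sequence quoted above (Pacific Biosciences Blunt Adapter).
ADAPTER = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT"


def spike_adapter(sequence: str, adapter: str = ADAPTER, position: int = 0) -> str:
    """Return `sequence` with `adapter` inserted at 0-based offset `position`."""
    if not 0 <= position <= len(sequence):
        raise ValueError("position must lie within the sequence")
    return sequence[:position] + adapter + sequence[position:]


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy contig standing in for one of your own sequences.
    contig = "ACGT" * 50
    spiked = spike_adapter(contig, position=100)
    print(format_fasta("spiked_contig", spiked), end="")
```

Running fcsadaptor on a FASTA containing the spiked record should then, if the screen is working, list that contig in the report with a range covering the inserted adapter.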

etvedte commented 2 months ago

This should be resolved.