padlocbio / padloc

Locate antiviral defence systems in prokaryotic genomes
MIT License

Nothing found for GCF_004358345.1 on new conda install of padloc #34

Closed wsowens closed 11 months ago

wsowens commented 11 months ago

Issue description:

Hi there, thanks for making PADLOC, both the command line tool and the web app have been immensely helpful to me!

I'm encountering a problem with a new install of PADLOC in a conda environment, set up following the instructions in the README. When I ran padloc on one of the provided test files, I got "Nothing found", and no _padloc.csv or _padloc.gff was produced.

Every .fna file I've tried successfully produces a .domtblout file but no _padloc.csv or _padloc.gff files. The same thing happens when I run padloc with the other provided test files, GCF_001688665.2.faa and GCF_001688665.2.gff.
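
For anyone checking the same symptom, a small sketch like the one below reports which of the expected files a run left behind. `report_outputs` is a hypothetical helper, not part of padloc, and the `<prefix>_padloc.csv` / `<prefix>_padloc.gff` names are an assumption based on the file suffixes mentioned above.

```shell
# Hypothetical helper (not part of padloc): report which output files exist
# for a given output directory ($1) and genome prefix ($2).
# Assumption: results are named <prefix>_padloc.csv and <prefix>_padloc.gff.
report_outputs() {
    for suffix in .domtblout _padloc.csv _padloc.gff; do
        if [ -e "$1/$2$suffix" ]; then
            echo "found   $2$suffix"
        else
            echo "missing $2$suffix"
        fi
    done
}

# Usage:
#   report_outputs . GCF_004358345.1
```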

Reproducible example:

conda create -n padloc -c conda-forge -c bioconda -c padlocbio padloc
conda activate padloc
padloc --db-update

padloc --version # padloc v1.1.0
padloc --db-version # padloc-db v2.0.0

padloc --debug --fna ~/.conda/envs/padloc/test/GCF_004358345.1.fna 2>&1 | tee padloc_test.log

Debug output:

from grep ">>" padloc_test.log (see below for full output)

[13:26:42] DEBUG >> $FAA_FILE: 
[13:26:42] DEBUG >> $FNA_FILE: /home/myusername/.conda/envs/padloc/test/GCF_004358345.1.fna
[13:26:42] DEBUG >> $GFF_FILE: 
[13:26:42] DEBUG >> $HMM_DATABASE: /home/myusername/.conda/envs/padloc/data/hmm/padlocdb.hmm
[13:26:42] DEBUG >> $YAML_DIR: /home/myusername/.conda/envs/padloc/data/sys/
[13:26:42] DEBUG >> $HMM_META: /home/myusername/.conda/envs/padloc/data/hmm_meta.txt
[13:26:42] DEBUG >> $SYS_META: /home/myusername/.conda/envs/padloc/data/sys_meta.txt
[13:26:42] DEBUG >> $OUT_DIR: .
[13:26:42] DEBUG >> $CPU: 1
[13:26:42] >> Predicting protein-coding genes with prodigal
[13:26:42] DEBUG >> prodigal -i /home/myusername/.conda/envs/padloc/test/GCF_004358345.1.fna -f gff -o ./GCF_004358345.1_prodigal.gff -a ./GCF_004358345.1_prodigal.faa -q
[13:26:46] DEBUG >> Finished @ Sat Nov  4 01:26:46 PM EDT 2023 ... SUCCESSFUL
[13:26:46] >> Scanning GCF_004358345.1 for defence system proteins
[13:26:46] DEBUG >> hmmsearch --cpu 1 --acc --noali --domtblout ./GCF_004358345.1.domtblout /home/myusername/.conda/envs/padloc/data/hmm/padlocdb.hmm ./GCF_004358345.1_prodigal.faa
[13:29:16] DEBUG >> Finished @ Sat Nov  4 01:29:16 PM EDT 2023 ... SUCCESSFUL
[13:29:16] >> Searching GCF_004358345.1 for defence systems
[13:29:16] DEBUG >> Rscript /home/myusername/.conda/envs/padloc/bin/bin/padloc.R -d ./GCF_004358345.1.domtblout -f ./GCF_004358345.1_prodigal.gff -h /home/myusername/.conda/envs/padloc/data/hmm_meta.txt -s /home/myusername/.conda/envs/padloc/data/sys_meta.txt -y /home/myusername/.conda/envs/padloc/data/sys/ -o . -b 1 -q 0 -p 1
[01:29:17 PM] DEBUG >> Start time: 2023-11-04 13:29:17.123566
[01:29:17 PM] DEBUG >> Reading hmm_meta.txt
[01:29:17 PM] WARNING >> Some rows of hmm_meta.txt are missing values in required columns (e.val.threshold, hmm.coverage.threshold, target.coverage.threshold)
[01:29:17 PM] WARNING >> These columns will be filled with default values, respectively: 1E-05, 0.3, 0.3
[01:29:17 PM] DEBUG >> Reading sys_meta.txt
[01:29:17 PM] DEBUG >> Reading GCF_004358345.1.domtblout
[01:29:18 PM] DEBUG >> Reading GCF_004358345.1_prodigal.gff
[01:29:21 PM] DEBUG >> Merging domain, alias, and feature tables
[01:29:21 PM] DEBUG >> Searching for defence systems
[01:29:21 PM] >> Nothing found for GCF_004358345.1
[01:29:21 PM] DEBUG >> End time: 2023-11-04 13:29:21.478621
[01:29:21 PM] DEBUG >> Run time: 4.35505533218384

Attached file:

padloc_test.log.gz

leightonpayne commented 11 months ago

Hi @wsowens,

Thanks for the detailed info! Your issue is probably due to incompatibility between padloc v1.1.0 and padloc-db v2.0.0 (only padloc v2.0.0 is compatible with padloc-db v2.0.0).
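
A quick way to catch this kind of mismatch is to compare the major versions reported by the two commands. The sketch below assumes the output looks like `padloc v1.1.0` and `padloc-db v2.0.0` (as in the example above) and that matching major versions imply compatibility, which is a simplification. `major_of` and `check_compat` are hypothetical helper names, not part of padloc.

```shell
# Hypothetical helpers (not part of padloc).

# Print the major version number found in a string such as "padloc v1.1.0".
major_of() {
    printf '%s\n' "$1" | sed -n 's/.*v\([0-9][0-9]*\)\..*/\1/p'
}

# Succeed only if both strings carry the same major version.
# Assumption: matching majors means the tool and the database are compatible.
check_compat() {
    tool_major=$(major_of "$1")
    db_major=$(major_of "$2")
    [ -n "$tool_major" ] && [ "$tool_major" = "$db_major" ]
}

# Usage (requires padloc on PATH):
#   check_compat "$(padloc --version)" "$(padloc --db-version)" \
#       || echo "padloc and padloc-db major versions differ" >&2
```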

I'm not sure why, but for a couple of people, conda is not installing the latest padloc release by default. You can try to specify padloc v2.0.0 by running:

conda create -n padloc -c conda-forge -c bioconda -c padlocbio padloc=2.0.0

Do you mind posting the output of conda info and your .condarc (if you have one) so I can try and diagnose this further?

wsowens commented 11 months ago

Hi @leightonpayne, thanks for getting back to me!

Sure thing, here's conda info


     active environment : padloc
    active env location : /home/myusername/.conda/envs/padloc
            shell level : 1
       user config file : /home/myusername/.condarc
 populated config files : /usr/share/conda/condarc.d/defaults.yaml
                          /home/myusername/.condarc
          conda version : 4.13.0
    conda-build version : not installed
         python version : 3.11.4.final.0
       virtual packages : __cuda=12.2=0
                          __linux=6.4.12=0
                          __glibc=2.36=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /usr  (read only)
      conda av data dir : /usr/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /var/cache/conda/pkgs
                          /home/myusername/.conda/pkgs
       envs directories : /home/myusername/.conda/envs
                          /usr/envs
               platform : linux-64
             user-agent : conda/4.13.0 requests/2.28.1 CPython/3.11.4 Linux/6.4.12-100.fc37.x86_64 fedora/37 glibc/2.36
                UID:GID : 1000:1000
             netrc file : /home/myusername/.netrc
           offline mode : False

And here's cat ~/.condarc

report_errors: false
changeps1: false

I didn't have conda installed on this machine (Fedora 37), so I installed it via the package manager (dnf install conda). I then made the above tweaks just to avoid changing my terminal prompt.

wsowens commented 11 months ago

Just confirmed, after creating a new conda environment with padloc=2.0.0, I do successfully get _padloc.csv and _padloc.gff files.

I'm happy to provide any more information to help determine why conda defaulted to padloc v1.1.0, but otherwise my issue is solved. Thank you!

leightonpayne commented 11 months ago

Thanks,

I suspect this issue has something to do with older versions of conda using the pycosat solver for dependency handling. Possibly the solver fails to resolve the dependencies correctly and falls back to padloc v1.1.0 by default.

I was able to reproduce the issue using the Linux x86 version of conda v4.13.0 on Red Hat Enterprise Linux 8.6:

wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -u -p ~/.conda
conda activate base
conda install conda=4.13.0
conda create -n padloc -c conda-forge -c bioconda -c padlocbio padloc
conda activate padloc
padloc --version
# padloc v1.1.0

Interestingly, the macOS x86 version of conda v4.13.0 appears to install padloc v2.0.0 by default 🤷🏼. The current release, conda v23.9.0, installs padloc v2.0.0 by default (as expected) on both Linux and macOS.

[!IMPORTANT] If anyone else stumbles across this thread, I recommend specifying padloc v2.0.0 during installation (this should work regardless of conda version)

conda create -n padloc -c conda-forge -c bioconda -c padlocbio padloc=2.0.0

[!NOTE] To avoid future issues, you may also wish to:

  1. Make sure your conda installation is up-to-date
    conda install -n base -c defaults 'conda>=23.9.0'
  2. Make sure the new libmamba solver is available and active
    conda install -n base conda-libmamba-solver --solver=classic
    conda config --set solver libmamba
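
The conda version check in step 1 can be scripted roughly as follows. This is only a sketch: `version_ge` is a hypothetical helper, it relies on `sort -V` (version sort, as provided by GNU coreutils), and it assumes `conda --version` prints a line of the form `conda 4.13.0`.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort), as provided by GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (requires conda on PATH):
#   current=$(conda --version | awk '{print $2}')
#   version_ge "$current" 23.9.0 || echo "conda $current predates 23.9.0" >&2
#   conda config --show solver   # on recent conda, reports the configured solver
```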

I've also dumped these instructions in a separate issue (#35)