shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

mmseqs: command not found #77

Closed alanorth closed 1 year ago

alanorth commented 2 years ago

After having issues with fastp I am now able to run hecatomb test data. After a few hours the pipeline crashes:

Error in rule cluster_similar_sequences:
    jobid: 29
    output: hecatomb_out/PROCESSING/TMP/p05/A13-252-114-06_CCGTCC_R1_rep_seq.fasta, hecatomb_out/PROCESSING/TMP/p05/A13-252-114-06_CCGTCC_R1_cluster.tsv, hecatomb_out/PROCESSING/TMP/p05/A13-252-114-06_CCGTCC_R1_all_seqs.fasta
    log: hecatomb_out/STDERR/cluster_similar_sequences.A13-252-114-06_CCGTCC.log (check log file(s) for error message)
    conda-env: /var/scratch/aorth/miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111
    shell:

        mmseqs easy-linclust hecatomb_out/PROCESSING/TMP/p04/A13-252-114-06_CCGTCC_R1.all.fastq hecatomb_out/PROCESSING/TMP/p05/A13-252-114-06_CCGTCC_R1 hecatomb_out/PROCESSING/TMP/p05/A13-252-114-06_CCGTCC_TMP             --kmer-per-seq-scale 0.3 -c 0.8 --cov-mode 1 --min-seq-id 0.97 --alignment-mode 3             --threads 8 &> hecatomb_out/STDERR/cluster_similar_sequences.A13-252-114-06_CCGTCC.log
        rm hecatomb_out/STDERR/cluster_similar_sequences.A13-252-114-06_CCGTCC.log

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile hecatomb_out/STDERR/cluster_similar_sequences.A13-252-114-06_CCGTCC.log:
/usr/bin/bash: line 1: mmseqs: command not found

If I manually activate the mmseqs2 environment I can see that mmseqs2 appears to be installed:

$ grep mmseq miniconda/envs/hecatomb/snakemake/workflow/conda/*.yaml
miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111.yaml:name: mmseqs2
miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111.yaml:    - mmseqs2=12.113e3=h2d02072_2
$ conda activate miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111
$ conda list | grep mmseq
mmseqs2                   12.113e3             h2d02072_2    bioconda

But there's no mmseqs binary in the bin directory of that env:

$ ls miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111/bin/mmseq*
ls: cannot access miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111/bin/mmseq*: No such file or directory
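
The two checks above (package listed, binary absent) can be combined into one small helper. This is only a sketch: the function names `env_has_binary` and `package_lists_binary` are made up for illustration, and it assumes the usual conda layout in which each installed package has a JSON record under `conda-meta/` with a `files` list. Note that the executable shipped by the mmseqs2 package is called `mmseqs`.

```shell
# Return 0 if PREFIX/bin/NAME exists and is executable.
# Usage: env_has_binary PREFIX NAME
env_has_binary() {
    [ -x "$1/bin/$2" ]
}

# Return 0 if a conda-meta record for package PKG in PREFIX lists bin/NAME.
# Usage: package_lists_binary PREFIX PKG NAME
# Assumption: records are named PKG-<version>-<build>.json and quote each path.
package_lists_binary() {
    grep -q "\"bin/$3\"" "$1"/conda-meta/"$2"-*.json 2>/dev/null
}

# Example: inspect the currently activated env, if there is one.
if [ -n "${CONDA_PREFIX:-}" ]; then
    if env_has_binary "$CONDA_PREFIX" mmseqs; then
        echo "mmseqs binary present"
    elif package_lists_binary "$CONDA_PREFIX" mmseqs2 mmseqs; then
        echo "package record lists bin/mmseqs but the file is missing"
    else
        echo "no bin/mmseqs on disk or in the package record"
    fi
fi
```

If the record lists the binary but the file is missing, that suggests the env itself was damaged. If neither is present, the problem is more likely the package that was resolved.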

beardymcjohnface commented 2 years ago

Hey, can you try this with the env activated?

type -a mmseqs

alanorth commented 2 years ago

Hey @beardymcjohnface yes sure:

$ conda activate miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111
$ type -a mmseqs
-bash: type: mmseqs: not found

beardymcjohnface commented 2 years ago

That is really weird. I would try deleting the conda env and rerunning in case it was malformed.

rm -rf miniconda/envs/hecatomb/snakemake/workflow/conda/897532ef0e68f9817d256b9cbd8a8111
hecatomb run --test 

If that still doesn't work, just see if you can get mmseqs installed at all as there might be some bioconda build issue for your system.

conda create -n testmmseqs -c conda-forge -c bioconda mmseqs2=12.113e3=h2d02072_2
conda activate testmmseqs 
mmseqs --help

alanorth commented 2 years ago

It's funny, I had tried to manually install mmseqs2 using the same method you described in the fastp issue, but I get dependency errors.

Here's a gist because the console log is long: https://gist.github.com/alanorth/c916e218ab1fac9fd08fe25ea527e220

Note at the bottom the dreaded "strict channel priority" again.

beardymcjohnface commented 2 years ago

What's your default channel priority? The mmseqs env should only have three bioconda deps and they're not in conda-forge, so you'd want conda-forge above bioconda. My hecatomb-mmseqs env:

$ conda list
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2022.5.18.1          ha878542_0    conda-forge
csvtk                     0.23.0               h9ee0642_0    bioconda
gawk                      5.1.0                h7f98852_0    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libidn2                   2.3.2                h7f98852_0    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libzlib                   1.2.12               h166bdaf_0    conda-forge
mmseqs2                   12.113e3             h2d02072_2    bioconda
openssl                   3.0.3                h166bdaf_0    conda-forge
taxonkit                  0.8.0                h9ee0642_0    bioconda
wget                      1.20.3               ha35d2d1_1    conda-forge
zlib                      1.2.12               h166bdaf_0    conda-forge

This is my ~/.condarc file:

channels:
  - conda-forge
  - bioconda
  - defaults
channel_priority: strict
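
For anyone who would rather not edit ~/.condarc by hand, the same settings can be applied with `conda config`. The helper below is only a sketch that prints the commands instead of running them, and `channel_setup_commands` is a made-up name. It relies on `conda config --add channels` placing the new channel at the top of the list, so the channels have to be added lowest priority first.

```shell
# Print the conda config commands that reproduce a channel order.
# Arguments: channel names from highest to lowest priority (no spaces in names).
channel_setup_commands() {
    reversed=""
    # Build the list in reverse, because each --add puts its channel on top.
    for ch in "$@"; do
        reversed="$ch $reversed"
    done
    for ch in $reversed; do
        printf 'conda config --add channels %s\n' "$ch"
    done
    printf 'conda config --set channel_priority strict\n'
}

# Dry run for the order shown in the ~/.condarc above.
channel_setup_commands conda-forge bioconda defaults
```

After checking the printed commands, they can be pasted into a shell. `conda config --show channels` should then list conda-forge first.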

alanorth commented 2 years ago

My ~/.condarc didn't have any channels in it (I don't normally use Conda). After adding the channels, deleting the Conda env that hecatomb created for mmseqs2 (the directory named with a long hash above), and re-running hecatomb run --test, it seems to be working.

I'm curious why the channel list in envs/hecatomb/snakemake/workflow/envs/mmseqs2.yaml isn't good enough:

name: mmseqs2
channels:
    - conda-forge
    - bioconda
    - defaults
dependencies:
    - mmseqs2=12.113e3=h2d02072_2
    - taxonkit=0.8.0
    - csvtk=0.23.0

beardymcjohnface commented 2 years ago

I'm curious as well. I'll see if I can reproduce this weirdness on my system; it might be a snakemake or a conda issue.