phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Mash sketching failing silently with conda installation #110

Closed nataquinones closed 1 year ago

nataquinones commented 2 years ago

TL;DR: I think there's a problem with mash/gsl in the conda installation. This causes mob_init to fail silently at the "Sketching complete plasmid database" step.

When trying to run mob_typer, I encountered the following error:

Warning! Needed database missing "/home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases/ncbi_plasmid_full_seqs.fas.msh"

It seems that mob_init is not creating the ncbi_plasmid_full_seqs.fas.msh file, and it does not report any error about this. Running mash in the conda-created mob_suite environment shows the following error:

mash: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory
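For anyone who wants to test for this condition from a script, here is a minimal Python sketch that runs a command and extracts the missing library name from a loader error like the one above. The function names are hypothetical and are not part of MOB-suite; the regular expression only assumes the message format quoted in this issue.

```python
import re
import subprocess

# Matches the dynamic loader message quoted above, e.g.
# "mash: error while loading shared libraries: libgsl.so.25: cannot open ..."
_LOADER_ERROR = re.compile(
    r"error while loading shared libraries: (?P<lib>\S+):"
)


def missing_shared_library(stderr_text):
    """Return the library named in a loader error, or None if there is none."""
    match = _LOADER_ERROR.search(stderr_text)
    return match.group("lib") if match else None


def probe_binary(command):
    """Run `command` and return (exit_code, missing_library_or_None).

    `command` is a list such as ["mash", "sketch"]; the executable must
    exist on PATH, otherwise subprocess raises FileNotFoundError.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, missing_shared_library(result.stderr)
```

Calling `probe_binary(["mash", "sketch"])` inside the affected environment should report `libgsl.so.25` as the missing library if the loader fails as shown above.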

I first solved the problem by doing a fresh mash installation in a new environment, and manually creating ncbi_plasmid_full_seqs.fas.msh with the following command:

mash sketch -p 1 -i -o ncbi_plasmid_full_seqs.fas.msh -k 21 -s 1000 ncbi_plasmid_full_seqs.fas
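Since the underlying problem is that a failed sketch goes unnoticed, the same command can be wrapped so that a non-zero exit status or a missing output file raises an error. This is only a sketch under stated assumptions: the flags are copied from the command above, the helper names are hypothetical, and this is not how mob_init itself invokes mash.

```python
import subprocess
from pathlib import Path


def build_sketch_command(fasta, threads=1, kmer=21, sketch_size=1000):
    """Assemble the mash sketch invocation quoted above for `fasta`.

    Returns (command_list, expected_output_path).
    """
    fasta = Path(fasta)
    output = fasta.with_name(fasta.name + ".msh")
    command = [
        "mash", "sketch",
        "-p", str(threads),
        "-i",
        "-o", str(output),
        "-k", str(kmer),
        "-s", str(sketch_size),
        str(fasta),
    ]
    return command, output


def run_and_verify(command, expected_output):
    """Run `command`; raise if it exits non-zero or leaves no output file."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    expected_output = Path(expected_output)
    if not expected_output.is_file() or expected_output.stat().st_size == 0:
        raise RuntimeError(f"expected output missing or empty: {expected_output}")
    return expected_output
```

Typical use would be `cmd, out = build_sketch_command("ncbi_plasmid_full_seqs.fas")` followed by `run_and_verify(cmd, out)`, run from the databases directory.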

I then realized that downgrading gsl from version 2.7 to 2.6 in the mob_suite environment also solves the problem.


Full details:

After installing mob-suite with conda in a new environment, I tried running mob_typer and I was getting the following error:

(mob_suite) $ mob_typer --infile assembly.fasta --out_file sample_mobtyper_results.txt
2022-06-03 12:23:15,315 mob_suite.mob_typer INFO: Running Mob-typer version 3.1.0 [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_typer.py:176]
2022-06-03 12:23:15,316 mob_suite.mob_typer INFO: Processing fasta file assembly.fasta [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_typer.py:178]
2022-06-03 12:23:15,318 mob_suite.mob_typer INFO: SUCCESS: Found program blastn at /home/nq10/software/miniconda3/envs/mob_suite/bin/blastn [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/utils.py:617]
2022-06-03 12:23:15,319 mob_suite.mob_typer INFO: SUCCESS: Found program makeblastdb at /home/nq10/software/miniconda3/envs/mob_suite/bin/makeblastdb [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/utils.py:617]
2022-06-03 12:23:15,320 mob_suite.mob_typer INFO: SUCCESS: Found program tblastn at /home/nq10/software/miniconda3/envs/mob_suite/bin/tblastn [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/utils.py:617]
2022-06-03 12:23:15,321 mob_suite.mob_typer INFO: Warning! Needed database missing "/home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases/ncbi_plasmid_full_seqs.fas.msh" [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_typer.py:323]
usage: mob_typer [-h] [-d DATABASE_DIRECTORY] [-v] [-V]
mob_typer: error: unrecognized arguments: --infile assembly.fasta --out_file sample_mobtyper_results.txt

I then tried to run mob_init, and it said it had completed successfully:

(mob_suite) $ mob_init
2022-06-03 12:15:27,775 mob_suite.utils INFO: Database directory folder already exists at /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:131]
2022-06-03 12:15:27,777 mob_suite.utils INFO: Placed lock file at /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases/.lock [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:142]
2022-06-03 12:15:27,777 mob_suite.utils INFO: Initializing databases...this will take some time [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:165]
2022-06-03 12:15:27,778 mob_suite.utils INFO: Downloading databases...this will take some time [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:178]
2022-06-03 12:15:27,778 mob_suite.utils INFO: Trying mirror https://zenodo.org/record/3786915/files/data.tar.gz?download=1 [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:182]
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  450M  100  450M    0     0  8906k      0  0:00:51  0:00:51 --:--:-- 9725k
2022-06-03 12:16:19,864 mob_suite.utils INFO: Downloading databases successful, now building databases at /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:191]
2022-06-03 12:16:19,865 mob_suite.utils INFO: Decompressing /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases/data.tar.gz [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:107]
2022-06-03 12:16:37,275 mob_suite.utils INFO: Building repetitive mask database [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:204]
2022-06-03 12:16:37,452 mob_suite.utils INFO: Building complete plasmid database [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:208]
2022-06-03 12:16:48,800 mob_suite.utils INFO: Sketching complete plasmid database [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:212]
2022-06-03 12:16:48,809 mob_suite.utils INFO: Init ete3 library ... [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:224]
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Loading node names...
2424484 names loaded.
277261 synonyms loaded.
Loading nodes...
2424484 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite ...
 2424000 generating entries... 
Uploading to /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite

Inserting synonyms:      275000 
Inserting taxid merges:  65000 
Inserting taxids:       2420000 
2022-06-03 12:18:57,833 mob_suite.utils INFO: Removed residual taxdump.tar.gz as ete3 is not doing proper cleaning job. [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:236]
2022-06-03 12:18:57,839 mob_suite.utils INFO: MOB init completed successfully [in /home/nq10/software/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py:250]

I tried to run mob_typer again, but kept getting the same Warning! Needed database missing error, and noticed the ncbi_plasmid_full_seqs.fas.msh was not being created.
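A quick way to confirm this after mob_init finishes is to check the databases directory for the expected files. The sketch below is illustrative only: the single file name comes from the warning above, the function name is hypothetical, and the full list of files MOB-suite requires is not given in this thread.

```python
from pathlib import Path

# Only this file name is taken from the warning above; extend the tuple
# with any other files your MOB-suite version needs (not listed here).
REQUIRED_FILES = ("ncbi_plasmid_full_seqs.fas.msh",)


def missing_database_files(database_dir, required=REQUIRED_FILES):
    """Return the required files that are absent or empty in `database_dir`."""
    database_dir = Path(database_dir)
    missing = []
    for name in required:
        candidate = database_dir / name
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list means every listed file exists and is non-empty; anything else points at a build step that did not produce its output.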

If I try running mash in my mob_suite environment, I get the following error:

(mob_suite) $ mash sketch
mash: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory

Let me know if there are any other details that would be helpful! It's an awesome tool! :)

kbessonov1984 commented 2 years ago

Thanks for reporting. That is a strange error. I pulled a Singularity container that was generated from conda, and mash worked correctly there. GSL (the GNU Scientific Library) is a numerical library written in C; perhaps it was corrupted, causing mash to fail.

I will try to replicate this behaviour by installing MOB-suite with conda in a fresh Ubuntu 21.04 container.

You can also run MOB-suite in a Singularity container on an HPC cluster. It is slower than running natively in conda, but you do not need to wait for database initialization and your runs are reproducible.

singularity build mobsuite.sif docker://quay.io/biocontainers/mob_suite:3.1.0--pyhdfd78af_0 && singularity exec -B .:/mnt mobsuite.sif mob_recon -i /mnt/assembly.fasta -o /mnt/mob-results

maxlcummins commented 2 years ago

Hello,

I recently encountered the same issue. According to conda env export, my gsl version is gsl=2.7.1=h6e86dc7_1.

I downgraded to gsl version 2.5 in my mobsuite environment as follows: mamba install gsl=2.5

My reasoning was that 'libgsl.so.25' in the error message might correspond to gsl version 2.5, though perhaps a different version would also work. Either way, this version seems to work fine. mob_typer now runs to completion and the earlier error message no longer appears. Additionally, mob_recon, which previously flagged only IS elements, now correctly flags plasmids as well and gives more extensive output.
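One caveat on the reasoning above: the number after `.so` is the library's soname version, which may not match the package release number, so it can be safer to look at which files the environment actually ships than to infer a version from the file name. Here is a minimal sketch, assuming a Linux-style conda environment whose libraries live under `<prefix>/lib`; the function names are hypothetical.

```python
from pathlib import Path


def installed_library_files(env_prefix, stem="libgsl"):
    """List file names matching `<stem>.so*` under `<env_prefix>/lib`."""
    lib_dir = Path(env_prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(path.name for path in lib_dir.glob(f"{stem}.so*"))


def provides_library(env_prefix, soname):
    """Return True if `<env_prefix>/lib/<soname>` exists (file or symlink)."""
    return (Path(env_prefix) / "lib" / soname).exists()
```

With the environment activated, `installed_library_files(os.environ["CONDA_PREFIX"])` lists the libgsl files present, and `provides_library(os.environ["CONDA_PREFIX"], "libgsl.so.25")` tells you whether the file mash asked for is there (both calls need `import os`).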

Cheers