merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] cannot import vbgmm in concoct #2330

NebulaSL closed this issue 1 month ago

NebulaSL commented 2 months ago

Short description of the problem

Hi anvi'o team! When I tried to run anvi-cluster-contigs with the concoct driver, it failed with an error about importing the vbgmm module. We have concoct installed within the conda environment, but it seems anvi'o is looking for concoct in the system installation. Is there a way to change that?

The thing is, our server admin can install things within a conda environment or some other environment, but cannot install them system-wide.

anvi'o version

anvi-self-test --version
Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.14

Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Linux server; the installation was done by our admins.

Detailed description of the issue

Hi, I tried to run concoct to create metabins for my large metagenome file with the following commands:

export TMPDIR=/scratch

source /usr/local/bioinfo/Anaconda3-23/bin/activate
source activate /usr/local/bioinfo/Anaconda3-23/envs/anvio-8

python -c "import tempfile; print(tempfile.gettempdir())"

chmod -R 777 /scratch/shiyiliu/sp2_110Gb/06_anvio/01_db_create

contigs_db="/scratch/shiyiliu/sp2_110Gb/06_anvio/01_db_create/contigs_sp2_202401_k127.db"
profile_db_dir="/scratch/shiyiliu/sp2_110Gb/06_anvio/02_bam_profile_doesnotincludetax/contig_2401/merged"
sample_name="contigs_sp2_202401_k127"
outputdir="/scratch/shiyiliu/sp2_110Gb/06_anvio/04_concoct_metabin/contig_2401"
mkdir -p $outputdir
cd $outputdir

anvi-cluster-contigs      -c $contigs_db \
                          -p $profile_db_dir/PROFILE.db \
                          --driver concoct \
                          -C CONCOCT \
                          --clusters 200 \
                          -T 4 \
                          --just-do-it

This gave an error:

WARNING

You are running an experimental workflow not every part of which may be fully
and thoroughly tested :) Please scrutinize your output carefully after analysis,
and keep us posted if you see things that surprise you.

Contigs DB ...................................: /scratch/shiyiliu/sp2_110Gb/06_anvio/01_db_create/contigs_sp2_202401_k127.db
Profile DB ...................................: /scratch/shiyiliu/sp2_110Gb/06_anvio/02_bam_profile_doesnotincludetax/contig_2401/merged/PROFILE.db
Binning module ...............................: CONCOCT
Cluster type .................................: contig
Working directory ............................: /scratch/tmprhjyf2u0

CITATION

Anvi'o is now passing all your data to the binning module 'CONCOCT'. If you
publish results from this workflow, please do not forget to reference the
following citation.

* Johannes Alneberg, Brynjar Smári Bjarnason, Ino de Bruijn, Melanie Schirmer,
  Joshua Quick, Umer Z Ijaz, Leo Lahti, Nicholas J Loman, Anders F Andersson &
  Christopher Quince. 2014. Binning metagenomic contigs by coverage and
  composition. Nature Methods, doi: 10.1038/nmeth.3103

✖ anvi-cluster-contigs encountered an error after 0:02:27.789207

Config Error: One of the critical output files is missing ('clustering_gt1000.csv'). Please
              take a look at the log file: /scratch/tmprhjyf2u0/logs.txt    

The log file /scratch/tmprhjyf2u0/logs.txt shows:

DATE: 14 Aug 24 14:23:31
CMD LINE: concoct --coverage_file /scratch/tmprhjyf2u0/contig_coverages.txt --composition_file /scratch/tmprhjyf2u0/sequence_contigs.fa --basename /scratch/tmprhjyf2u0 --threads 4 --clusters 200
Traceback (most recent call last):
  File "/usr/local/bioinfo/bin/concoct", line 6, in <module>
    import vbgmm
ModuleNotFoundError: No module named 'vbgmm'

When I looked for vbgmm on our server, I found it in other locations:

locate vbgmm
/usr/local/bioinfo/Anaconda3-23/envs/binmate/lib/python3.6/site-packages/vbgmm.cpython-36m-x86_64-linux-gnu.so
/usr/local/bioinfo/Anaconda3-23/pkgs/concoct-1.1.0-py36h1eedd71_2/lib/python3.6/site-packages/vbgmm.cpython-36m-x86_64-linux-gnu.so
/usr/local/bioinfo/CONCOCT/build/lib.linux-x86_64-2.7/vbgmm.so
/usr/local/bioinfo/CONCOCT/build/temp.linux-x86_64-2.7/c-concoct/vbgmmmodule.o
/usr/local/bioinfo/CONCOCT/c-concoct/vbgmmmodule.c
/usr/local/bioinfo/CONCOCT/c-concoct/vbgmmmodule.h
/usr/local/bioinfo/CONCOCT/c-concoct/vbgmmmodule.o
/usr/local/bioinfo/binmate/envs/env_binmate_main_py36/lib/python3.6/site-packages/vbgmm.cpython-36m-x86_64-linux-gnu.so

However, the vbgmm module is not where anvi'o is looking for it (/usr/local/bioinfo/bin/concoct). Is there any way to change where anvi'o looks for vbgmm or concoct? Otherwise, how would you suggest solving the problem?
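The `locate` output above already hints at why none of these copies is picked up. A CPython interpreter only imports a compiled extension whose filename tag matches its own version (or an untagged or `abi3` build that is on its import path), and every tagged vbgmm build listed here targets CPython 3.6. The following is a minimal sketch of that check, not part of anvi'o or CONCOCT; the names `cpython_version_of`, `builds_matching`, and `LOCATED` are hypothetical and introduced only for illustration.

```python
import os
import re
import sys

# Matches the CPython tag in names like 'vbgmm.cpython-36m-x86_64-linux-gnu.so'.
# The first digit is the major version, the remaining digits the minor version.
CPYTHON_TAG = re.compile(r"\.cpython-(\d)(\d+)[a-z]*-")


def cpython_version_of(path):
    """Return (major, minor) from a tagged extension filename, or None if untagged."""
    match = CPYTHON_TAG.search(os.path.basename(path))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def builds_matching(paths, version=None):
    """Keep only the tagged builds compiled for the given (major, minor) version."""
    target = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return [p for p in paths if cpython_version_of(p) == target]


# Compiled modules copied from the `locate vbgmm` output in this report.
LOCATED = [
    "/usr/local/bioinfo/Anaconda3-23/envs/binmate/lib/python3.6/site-packages/vbgmm.cpython-36m-x86_64-linux-gnu.so",
    "/usr/local/bioinfo/Anaconda3-23/pkgs/concoct-1.1.0-py36h1eedd71_2/lib/python3.6/site-packages/vbgmm.cpython-36m-x86_64-linux-gnu.so",
    "/usr/local/bioinfo/CONCOCT/build/lib.linux-x86_64-2.7/vbgmm.so",
    "/usr/local/bioinfo/binmate/envs/env_binmate_main_py36/lib/python3.6/site-packages/vbgmm.cpython-36m-x86_64-linux-gnu.so",
]

if __name__ == "__main__":
    for path in LOCATED:
        print(cpython_version_of(path), path)
    print("builds for Python 3.10:", builds_matching(LOCATED, (3, 10)))
```

With these paths, no build matches (3, 10), the Python version the anvio-8 environment reports, and the one untagged `vbgmm.so` sits in a build directory named for Python 2.7. The report does not show which interpreter actually runs /usr/local/bioinfo/bin/concoct, so this narrows the problem down rather than settling it.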

Thanks in advance!

Best, Shiyi

meren commented 1 month ago

Dear @NebulaSL,

This message here tells me that the issue is about the concoct installation, and not related to anvi'o per se:

DATE: 14 Aug 24 14:23:31
CMD LINE: concoct --coverage_file /scratch/tmprhjyf2u0/contig_coverages.txt --composition_file /scratch/tmprhjyf2u0/sequence_contigs.fa --basename /scratch/tmprhjyf2u0 --threads 4 --clusters 200
Traceback (most recent call last):
  File "/usr/local/bioinfo/bin/concoct", line 6, in <module>
    import vbgmm
ModuleNotFoundError: No module named 'vbgmm'

It is not anvi'o that is looking for vbgmm, it is concoct. Can you please run the command concoct in your terminal and make sure it is working? When it is working for you in the terminal, it will work for anvi'o via anvi-cluster-contigs as well.
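One way to see which concoct the shell (and therefore anvi-cluster-contigs) would launch is to resolve the name on PATH and read the script's `#!` line. The sketch below assumes it is run with the Python of the activated environment; `read_shebang` and `describe_executable` are hypothetical helper names, and the "inside the active environment" test is only a path-prefix heuristic.

```python
import os
import shutil
import sys


def read_shebang(script_path):
    """Return the interpreter named on a script's '#!' line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


def describe_executable(name):
    """Report which file `name` resolves to on PATH and what interprets it."""
    path = shutil.which(name)
    if path is None:
        return {"path": None, "interpreter": None, "in_active_env": False}
    # sys.prefix is the root of the environment running this script,
    # e.g. .../envs/anvio-8 when that conda environment is active.
    env_root = os.path.realpath(sys.prefix)
    resolved = os.path.realpath(path)
    return {
        "path": path,
        "interpreter": read_shebang(resolved),
        "in_active_env": resolved.startswith(env_root + os.sep),
    }


if __name__ == "__main__":
    for key, value in describe_executable("concoct").items():
        print(f"{key}: {value}")
```

If the reported path is /usr/local/bioinfo/bin/concoct, as in the traceback, rather than a file under the active environment, then a different concoct comes first on PATH, and the interpreter on its `#!` line is the one that needs to be able to import vbgmm.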

Best wishes,

semiller10 commented 1 month ago

Hi @meren and @NebulaSL -- I had trouble installing the version of concoct tailored to anvi'o (see installation instructions here), running into problems such as missing C library dependencies. You may want to first install concoct and run the workflow on your local machine before trying it on the server, following the anvi'o instructions and referring to how I troubleshot the installation here.