merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-cluster-contigs / Memory issues? #1510

Open GeoMicroSoares opened 3 years ago

GeoMicroSoares commented 3 years ago

Thanks so much for your work on this tool, and thanks in advance for your help - you guys are the best!

Short description of the problem

Running anvi-cluster-contigs with DASTool or Binsanity as drivers, I get a Python MemoryError.

anvi'o version

$ anvi-self-test --version
Anvi'o version ...............................: esther (v6.2) 
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

System info

Running WSL on Windows:

$ uname -a
Linux DESKTOP-2S2ORLO 4.4.0-18362-Microsoft #1049-Microsoft Thu Aug 14 12:01:00 PST 2020 x86_64 x86_64 x86_64 GNU/Linux

Anvi'o was installed via conda.

Detailed description of the issue

The error is the same with both drivers, but to keep it simple I'll stick to DASTool, since it's the one that interests me the most. I run anvi-cluster-contigs as follows:

anvi-cluster-contigs \
-p SAMPLES-MERGED/PROFILE.db \
-c $output"contigs.db" \
-C DASTOOL \
--driver dastool \
-S CONCOCT,METABAT2,MAXBIN2 \
-T 8 --just-do-it

The command line output from that is the following:

WARNING
===============================================
You are running an experimental workflow not every part of which may be fully
and thoroughly tested :) Please scrutinize your output carefully after analysis,
and keep us posted if you see things that surprise you.

Contigs DB ...................................: sgg_mags/contigs.db
Profile DB ...................................: SAMPLES-MERGED/PROFILE.db
Binning module ...............................: DAS_Tool
Cluster type .................................: split
Working directory ............................: /tmp/tmpbo5snwpm

CITATION
===============================================
Anvi'o is now passing all your data to the binning module 'DAS_Tool'. If you
publish results from this workflow, please do not forget to reference the
following citation.

* Christian M. K. Sieber, Alexander J. Probst, Allison Sharrar, Brian C. Thomas,
Matthias Hess, Susannah G. Tringe & Jillian F. Banfield (2018). Recovery of
genomes from metagenomes via a dereplication, aggregation and scoring strategy.
Nature Microbiology. https://doi.org/10.1038/s41564-018-0171-1.

Report unbinned items if there are any .......: False
Items file path ..............................: /tmp/tmpbo5snwpm/METABAT2.txt
Bins info file path ..........................: /tmp/tmpbo5snwpm/METABAT2-info.txt
Report unbinned items if there are any .......: False
Items file path ..............................: /tmp/tmpbo5snwpm/CONCOCT.txt
Bins info file path ..........................: /tmp/tmpbo5snwpm/CONCOCT-info.txt

Report unbinned items if there are any .......: False
Items file path ..............................: /tmp/tmpbo5snwpm/MAXBIN2.txt
Bins info file path ..........................: /tmp/tmpbo5snwpm/MAXBIN2-info.txt

Config Error: One of the critical output files is missing ('OUTPUT_DASTool_scaffolds2bin.txt'). Please take a look at the log file:
/tmp/tmpbo5snwpm/logs.txt

And the log file looks like this (the one shown here is from the Binsanity run, hence the different temporary directory):

$ cat /tmp/tmpa3dnoeem/logs.txt
# DATE: 29 Sep 20 03:20:30
# CMD LINE: Binsanity -c /tmp/tmpa3dnoeem/contig_coverages_log_norm.txt -f /tmp/tmpa3dnoeem -l sequence_contigs.fa -o /tmp/tmpa3dnoeem
Traceback (most recent call last):
  File "/home/andre/miniconda3/envs/anvio-6.2/bin/Binsanity", line 219, in <module>
    args.preference, args.inputContigFiles, args.outputdir, args.outname)
  File "/home/andre/miniconda3/envs/anvio-6.2/bin/Binsanity", line 63, in affinity_propagation
    convergence),copy=True,preference=int(preference), affinity='euclidean', verbose=False).fit_predict(array)
  File "/home/andre/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/sklearn/cluster/_affinity_propagation.py", line 474, in fit_predict
    return super().fit_predict(X, y)
  File "/home/andre/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/sklearn/base.py", line 581, in fit_predict
    self.fit(X)
  File "/home/andre/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/sklearn/cluster/_affinity_propagation.py", line 407, in fit
    self.affinity_matrix_ = -euclidean_distances(X, squared=True)
  File "/home/andre/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/home/andre/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 309, in euclidean_distances
    distances = - 2 * safe_sparse_dot(X, Y.T, dense_output=True)
  File "/home/andre/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/home/andre/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/sklearn/utils/extmath.py", line 153, in safe_sparse_dot
    ret = a @ b
MemoryError: Unable to allocate array with shape (281873, 281873) and data type float64

******************************************************
**********************BinSanity***********************
|____________________________________________________|
|                                                    |
|              Computing Coverage Array              |
|____________________________________________________|

Preference: -3
Maximum Iterations: 4000
Convergence Iterations: 400
Contig Cut-Off: 1000
Damping Factor: 0.95
Coverage File: /tmp/tmpa3dnoeem/contig_coverages_log_norm.txt
Fasta File: sequence_contigs.fa
Output directory: /tmp/tmpa3dnoeem
logfile: binsanity-logfile.txt
(281873, 9)
______________________________________________________
|                                                      |
|                  Clustering Contigs                  |
|______________________________________________________|
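
For a sense of scale, the array named in the MemoryError above is a dense 281873 x 281873 matrix of float64 values, i.e. one 8-byte value per pair of contigs. The short sketch below only does that arithmetic; the function name is made up for illustration and is not part of anvi'o, BinSanity, or scikit-learn.

```python
def dense_square_matrix_bytes(n_items, bytes_per_value=8):
    """Bytes needed for a dense n_items x n_items matrix (float64 = 8 bytes per value)."""
    return n_items * n_items * bytes_per_value


# Row count taken from the MemoryError message and the "(281873, 9)" line in the log.
n_contigs = 281873
needed = dense_square_matrix_bytes(n_contigs)

print(f"{needed / 1e9:.1f} GB ({needed / 2**30:.1f} GiB)")
# prints: 635.6 GB (592.0 GiB)
```

Because the size grows with the square of the number of items, halving the number of contigs passed to the clustering step would cut this requirement to roughly a quarter. Whether that is enough depends on the machine, so treat this as a back-of-the-envelope estimate rather than a fix.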

Files to reproduce

Let me know if this is something you can't reproduce - I can share files if needed.

PChuckran commented 3 years ago

I'm having a nearly identical issue. Did you ever find a solution?

edfadeev commented 2 years ago

Hi @GeoMicroSoares and @PChuckran, did you manage to resolve this issue? I am seeing the very same thing on anvi'o 7 and cannot figure out where the problem is coming from.

Cheers!

PChuckran commented 2 years ago

Unfortunately, no. I had the problem with DAS Tool. I ended up exporting my bins from MaxBin, MetaBAT, and CONCOCT, running DAS Tool outside of anvi'o, and then importing the resulting bins using anvi-import-collection.
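
One practical detail when running a binner outside of anvi'o like this: the collections in this issue are split-level (the log reports "Cluster type: split"), while an external tool typically expects contig names. The sketch below is a hypothetical helper, not an anvi'o function. It assumes a two-column, TAB-delimited file of item name and bin name with no header, and assumes split names follow the `<contig>_split_<number>` pattern; check both against your own exported files before relying on it.

```python
import re
from collections import Counter, defaultdict

# Assumption: split names look like "<contig>_split_<digits>".
SPLIT_SUFFIX = re.compile(r"_split_\d+$")


def splits_to_contigs(lines):
    """Collapse split-level bin assignments to one bin per contig.

    `lines` is an iterable of "item<TAB>bin" strings (assumed format).
    If the splits of a contig disagree, the most frequent bin wins.
    """
    votes = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        item, bin_name = line.split("\t")[:2]
        contig = SPLIT_SUFFIX.sub("", item)
        votes[contig][bin_name] += 1
    return {contig: counts.most_common(1)[0][0] for contig, counts in votes.items()}


# Made-up example rows, for illustration only.
rows = [
    "c1_split_00001\tBin_1",
    "c1_split_00002\tBin_1",
    "c2_split_00001\tBin_2",
]
for contig, bin_name in splits_to_contigs(rows).items():
    print(f"{contig}\t{bin_name}")
```

The majority vote is just one way to resolve a contig whose splits landed in different bins; dropping such contigs would be an equally reasonable choice.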