refresh-bio / kmer-db

Kmer-db is a fast and memory-efficient tool for large-scale k-mer analyses (indexing, querying, estimating evolutionary relationships, etc.).
GNU General Public License v3.0
distance calculation evokes std::bad_alloc problem #3

Closed rmostowy closed 5 years ago

rmostowy commented 5 years ago

Dear developers,

I've been using kmer-db recently and it's a great piece of software. I'm struggling to use it on my laptop, however, as I'm getting an error.

I'm trying to calculate a distance matrix based on a set of genome assemblies. To this end, I'm using three commands:

kmer-db build -t 2 -k 21 -f 0.02 genome-paths.txt pathogens.db
kmer-db all2all pathogens.db matrix.csv
kmer-db distance matrix.csv

The first two run flawlessly; with the third one, however, I'm getting the following error:

Calculating distance measures
Loading file with common k-mer countsmatrix.csv...OK
Calculating distances...OK
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Abort trap: 6

I'm wondering what could be causing this: insufficient memory? This calculation is based on 2663 assemblies and my laptop has 16 GB of RAM.

Many thanks, Rafał

agudys commented 5 years ago

Hello Rafał, I don't think the problem is due to insufficient memory. As you are analyzing 2663 assemblies, tens of megabytes should be sufficient for storing the resulting matrices. The build step requires much more memory, and it ran successfully, so the bug lies elsewhere. Could you share the matrix.csv file with me so I can reproduce the error?

Regards, Adam
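
As a rough check of the "tens of megabytes" estimate above, the following sketch computes the size of a dense square matrix for 2663 samples. Dense storage and the 4-byte and 8-byte entry widths are assumptions made for illustration; the thread does not say how kmer-db stores its matrices internally, and `dense_matrix_bytes` is a hypothetical helper, not part of the tool.

```python
def dense_matrix_bytes(n_samples: int, bytes_per_value: int) -> int:
    """Bytes needed for a dense n x n matrix with fixed-width entries."""
    return n_samples * n_samples * bytes_per_value


n = 2663  # number of assemblies reported in the thread

# Entry widths are assumptions: e.g. 32-bit counts or 64-bit floating-point values.
for label, width in (("4-byte values", 4), ("8-byte values", 8)):
    size = dense_matrix_bytes(n, width)
    print(f"{label}: {size / 1e6:.1f} MB")
# prints:
# 4-byte values: 28.4 MB
# 8-byte values: 56.7 MB
```

Under either assumption the matrix is far smaller than 16 GB, which is consistent with the failure not being caused by a genuine shortage of memory.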

rmostowy commented 5 years ago

Dear Adam,

Thanks for the reply. I'm attaching the compressed (ZIP) matrix.csv file that I used before opening this thread. However, I should emphasise that, having played around with different input data, I see this problem with any matrix.csv file, including one I generated for 8 assemblies. That agrees with what you're saying: this is not a memory issue.

I should also say that I am using macOS 10.13.6 and have used g++-8 to compile kmer-db. I seem to remember (I would have to double-check) that when I compiled it with g++-8 on Ubuntu I found the same problem, but when I compiled it using g++-5 the code worked.

I'm also attaching the console output of the compilation; it might be helpful to look at the warnings.

Attachments: matrix.csv.zip, compilation-output.txt

Regards, Rafal

agudys commented 5 years ago

Dear Rafał,

I updated the sources - distance mode should work properly now (I additionally fixed a bug that resulted in wrong values of the distance metrics).

Regards, Adam

rmostowy commented 5 years ago

Thanks Adam, it works well now.