sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

LargeCount VerticesExtractor::getDefaultNumberOfBitsForBloomFilter() should not overflow #196

sebhtml closed 10 years ago

sebhtml commented 10 years ago

Link: http://permalink.gmane.org/gmane.science.biology.ray-genome-assembler/619

The code, which computes the product in int and can therefore overflow when a rank holds many reads:


LargeCount VerticesExtractor::getDefaultNumberOfBitsForBloomFilter(){

/*
 * This is the product of these values:
 *
 * * Number of sequences on the rank;
 * * K-mer length;
 * * Number of strands (2);
 * * Number of directions in one dimension (2);
 */

        int numberOfErrorsPerRead = 4;

        int erroneousKmersPerError = m_parameters->getWordSize() * 2 * 2;

        // the formula below is completely arbitrary.
        // for serious cases, you should do an initial run
        // and then you should modify -bloom-filter-bits

        // furthermore, this formula does not consider
        // the true kmers in the genome

        int numberOfLocalReads = m_myReads->size();

        int bits = numberOfErrorsPerRead * erroneousKmersPerError * numberOfLocalReads;

        return bits;
}
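
To make the overflow concrete, here is a minimal sketch that estimates how many local reads the int product can handle. It assumes a signed 32-bit int; the helper name maxReadsBeforeOverflow is hypothetical and not part of Ray, and the word size used below (21) is only an example value.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Ray): the largest number of local reads
// for which numberOfErrorsPerRead * (wordSize * 2 * 2) * reads still fits
// in a signed 32-bit integer. The division is done in 64-bit arithmetic.
constexpr std::int64_t maxReadsBeforeOverflow(int wordSize, int numberOfErrorsPerRead = 4) {
    return std::numeric_limits<std::int32_t>::max()
        / (static_cast<std::int64_t>(numberOfErrorsPerRead) * wordSize * 2 * 2);
}
```

With a word size of 21, each read contributes 4 * 21 * 2 * 2 = 336 bits, so maxReadsBeforeOverflow(21) evaluates to 6391320. Under these assumptions, a rank holding more reads than that would overflow the int product.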

Solution: add a maximum number of bits (let's say something like 1 GiB).
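
A minimal sketch of the proposed cap, not necessarily what was committed: the product is computed in 64-bit arithmetic and clamped to a maximum. Several things here are assumptions: LargeCount is taken to be a 64-bit unsigned type, "1 GiB" is read as 1 GiB of memory (8 * 2^30 bits), and the names MAXIMUM_BLOOM_FILTER_BITS, bitsPerRead and defaultNumberOfBitsForBloomFilter are hypothetical stand-ins for the member function above.

```cpp
#include <cstdint>

typedef std::uint64_t LargeCount; // assumption: a 64-bit unsigned count type

// Hypothetical cap: 1 GiB of memory expressed in bits (one reading of "1 GiB").
constexpr LargeCount MAXIMUM_BLOOM_FILTER_BITS = 8ULL * 1024 * 1024 * 1024;

// Bits contributed by one read: errors per read (4) * word size * 2 strands
// * 2 directions, computed in 64-bit arithmetic. wordSize must be positive.
constexpr LargeCount bitsPerRead(int wordSize) {
    return static_cast<LargeCount>(4) * static_cast<LargeCount>(wordSize) * 2 * 2;
}

// Stand-in for the member function: wordSize replaces
// m_parameters->getWordSize() and numberOfLocalReads replaces m_myReads->size().
// The comparison against cap / bitsPerRead avoids overflowing even 64 bits.
constexpr LargeCount defaultNumberOfBitsForBloomFilter(int wordSize, LargeCount numberOfLocalReads) {
    return numberOfLocalReads > MAXIMUM_BLOOM_FILTER_BITS / bitsPerRead(wordSize)
        ? MAXIMUM_BLOOM_FILTER_BITS
        : bitsPerRead(wordSize) * numberOfLocalReads;
}
```

For example, with a word size of 21, 10 million local reads give 3360000000 bits (already beyond a signed 32-bit int), while 100 million reads are clamped to the cap of 8589934592 bits.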

sebhtml commented 10 years ago

885e3010ccdb587e84b3d43f7a5e598b8f187c6f