samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

GenotypeLikelihoods.calcNumLikelihoods() should not be recursive #1446

Closed droazen closed 4 years ago

droazen commented 4 years ago

GenotypeLikelihoods.calcNumLikelihoods(numAlleles, ploidy), which is the method used to calculate the length of the PL array given the number of alleles and the ploidy, is currently written using an inefficient recursive implementation:

    private static final int calcNumLikelihoods(int numAlleles, int ploidy) {
        if (numAlleles == 1) {
            return 1;
        } else if (ploidy == 1) {
            return numAlleles;
        } else {
            int acc = 0;

            for(int k = 0; k <= ploidy; ++k) {
                acc += calcNumLikelihoods(numAlleles - 1, ploidy - k);
            }

            return acc;
        }
    }

However, this method has an easy closed-form combinatorics solution using the formula for selection with replacement:

image

So (for example), for numAlleles = 5 and ploidy = 2, it is just (5 + 2 - 1) choose 2 = 6 choose 2 = 15

(Thanks to @fleharty and @kachulis for their help in deriving this!)

Since this method is called every time we create a PL array (which is very frequently!), it seems worth it to get rid of the recursion in favor of a one-line call out to a standard combinatorics function.

droazen commented 4 years ago

For @KevinCLydon as part of his work on https://github.com/broadinstitute/gatk/issues/6322