matsengrp / gctree

GCtree: phylogenetic inference of genotype-collapsed trees
https://matsengrp.github.io/gctree
GNU General Public License v3.0
16 stars 2 forks source link

Fix underflow errors on some machines by avoiding calling np.exp(-np.inf) in ll. #70

Closed willdumm closed 2 years ago

willdumm commented 2 years ago

On some machines (e.g. Fred Hutch cluster, GitHub CI runners sometimes), unexpected numpy behavior causes CollapsedTree.ll to raise an underflow exception, causing inference to fail unexpectedly. This happens exactly when/because

import numpy as np
np.seterr(underflow='raise')
np.exp(-np.inf)

throws an underflow error, instead of returning 0, as expected. According to Fred Hutch SciComp, this just comes down to a difference in processors. For example, the newer nodes on the Fred Hutch cluster have this behavior and can't run gctree, while older ones can.

As currently written, CollapsedTree._ll_genotype requires that np.exp(-np.inf) == 0.

This PR fixes that, to avoid running afoul of numpy's weird moods.

Changes:

Tests:

This PR passes all tests, including test_ll_genotype_cache, which the current code would fail.

Would close #67

willdumm commented 2 years ago

But wait, there's more! in CollapsedForest.ll, when computing the marginal likelihood, there's a call to logsumexp. It turns out that there's (what I would consider) a bug in logsumexp, since they make the same assumption that np.exp(-np.inf) == 0, as we did. On the newer processors where that doesn't work:

import numpy as np
from scipy.special import logsumexp
np.seterr(under='raise')

a = np.array([1, 1])
b = np.array([0, 1])
logsumexp(a, b=b)  # underflow error
np.log(np.sum(b * np.exp(a)))  # 1
logsumexp(a[b != 0], b=b[b != 0])  # 1

Certainly there's no justification for raising an underflow error just because there's a 0 in the coefficient array b, but it's easy to see why it happens. In scipy logsumexp code:

def logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False):
        """Compute the log of the sum of exponentials of input elements.
             ... long docstring ...
        """
        a = _asarray_validated(a, check_finite=False)
        if b is not None:
            a, b = np.broadcast_arrays(a, b)
            if np.any(b == 0):
                a = a + 0.  # promote to at least float
                a[b == 0] = -np.inf   # (1) here we guarantee this will fail.

        a_max = np.amax(a, axis=axis, keepdims=True)

        if a_max.ndim > 0:
            a_max[~np.isfinite(a_max)] = 0
        elif not np.isfinite(a_max):
            a_max = 0

        if b is not None:
            b = np.asarray(b)
           tmp = b * np.exp(a - a_max)  # Underflow Error, since we put -inf in a at (1)

So, the last commit filters both arrays by b != 0 before passing them to logsumexp, as in the last line of the first code block

willdumm commented 2 years ago

Yes, I do. At the very least, they need to clearly document that the behavior isn't consistent. Also, it may be worth posting an issue for scipy.special.logsumexp, if the numpy thing can't/won't get fixed

willdumm commented 2 years ago

These changes were included in #68.