Fix underflow errors on some machines by avoiding calling np.exp(-np.inf) in ll.

willdumm commented 2 years ago

On some machines (e.g. Fred Hutch cluster, GitHub CI runners sometimes), unexpected numpy behavior causes CollapsedTree.ll to raise an underflow exception, causing inference to fail unexpectedly. This happens exactly when/because

import numpy as np
np.seterr(underflow='raise')
np.exp(-np.inf)

throws an underflow error, instead of returning 0, as expected. According to Fred Hutch SciComp, this just comes down to a difference in processors. For example, the newer nodes on the Fred Hutch cluster have this behavior and can't run gctree, while older ones can.

As currently written, CollapsedTree._ll_genotype requires that np.exp(-np.inf) == 0.

This PR fixes that, to avoid running afoul of numpy's weird moods.

Changes:

added tests/test_ll.py containing test_ll_genotype_cache, a test which just calls CollapsedTree._build_ll_genotype_cache. This results in the error mentioned above, when run on the GitHub CI runner without the fixes in this PR. This file also contains the function validate_ll_genotype, which checks that the new CollapsedTree._ll_genotype method agrees with the old one, for a range of inputs.
changed CollapsedTree._ll_genotype so that when passed impossible parameter combinations, it raises a ValueError. Changes were also made to avoid recursive calls with impossible parameter combinations. Now this method will always raise an exception instead of returning a log likelihood of -inf.
changed CollapsedTree._build_ll_genotype_cache to avoid calling _ll_genotype with impossible parameters
changed CollapsedTree.ll to eliminate _ll_genotype call with impossible parameters, when checking whether to add pseudocount to root node.

Tests:

This PR passes all tests, including test_ll_genotype_cache, which the current code would fail.

Would close #67

willdumm commented 2 years ago

But wait, there's more! in CollapsedForest.ll, when computing the marginal likelihood, there's a call to logsumexp. It turns out that there's (what I would consider) a bug in logsumexp, since they make the same assumption that np.exp(-np.inf) == 0, as we did. On the newer processors where that doesn't work:

import numpy as np
from scipy.special import logsumexp
np.seterr(under='raise')

a = np.array([1, 1])
b = np.array([0, 1])
logsumexp(a, b=b)  # underflow error
np.log(np.sum(b * np.exp(a)))  # 1
logsumexp(a[b != 0], b=b[b != 0])  # 1

Certainly there's no justification for raising an underflow error just because there's a 0 in the coefficient array b, but it's easy to see why it happens. In scipy logsumexp code:

def logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False):
        """Compute the log of the sum of exponentials of input elements.
             ... long docstring ...
        """
        a = _asarray_validated(a, check_finite=False)
        if b is not None:
            a, b = np.broadcast_arrays(a, b)
            if np.any(b == 0):
                a = a + 0.  # promote to at least float
                a[b == 0] = -np.inf   # (1) here we guarantee this will fail.

        a_max = np.amax(a, axis=axis, keepdims=True)

        if a_max.ndim > 0:
            a_max[~np.isfinite(a_max)] = 0
        elif not np.isfinite(a_max):
            a_max = 0

        if b is not None:
            b = np.asarray(b)
           tmp = b * np.exp(a - a_max)  # Underflow Error, since we put -inf in a at (1)

So, the last commit filters both arrays by b != 0 before passing them to logsumexp, as in the last line of the first code block

willdumm commented 2 years ago

Yes, I do. At the very least, they need to clearly document that the behavior isn't consistent. Also, it may be worth posting an issue for scipy.special.logsumexp, if the numpy thing can't/won't get fixed

willdumm commented 2 years ago

These changes were included in #68.

matsengrp / gctree

Fix underflow errors on some machines by avoiding calling np.exp(-np.inf) in ll. #70

Changes:

Tests: