Closed willdumm closed 2 years ago
But wait, there's more! in CollapsedForest.ll
, when computing the marginal likelihood, there's a call to logsumexp. It turns out that there's (what I would consider) a bug in logsumexp, since they make the same assumption that np.exp(-np.inf) == 0
, as we did. On the newer processors where that doesn't work:
import numpy as np
from scipy.special import logsumexp
np.seterr(under='raise')
a = np.array([1, 1])
b = np.array([0, 1])
logsumexp(a, b=b) # underflow error
np.log(np.sum(b * np.exp(a))) # 1
logsumexp(a[b != 0], b=b[b != 0]) # 1
Certainly there's no justification for raising an underflow error just because there's a 0 in the coefficient array b, but it's easy to see why it happens. In scipy logsumexp code:
def logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False):
"""Compute the log of the sum of exponentials of input elements.
... long docstring ...
"""
a = _asarray_validated(a, check_finite=False)
if b is not None:
a, b = np.broadcast_arrays(a, b)
if np.any(b == 0):
a = a + 0. # promote to at least float
a[b == 0] = -np.inf # (1) here we guarantee this will fail.
a_max = np.amax(a, axis=axis, keepdims=True)
if a_max.ndim > 0:
a_max[~np.isfinite(a_max)] = 0
elif not np.isfinite(a_max):
a_max = 0
if b is not None:
b = np.asarray(b)
tmp = b * np.exp(a - a_max) # Underflow Error, since we put -inf in a at (1)
So, the last commit filters both arrays by b != 0
before passing them to logsumexp, as in the last line of the first code block
Yes, I do. At the very least, they need to clearly document that the behavior isn't consistent. Also, it may be worth posting an issue for scipy.special.logsumexp, if the numpy thing can't/won't get fixed
These changes were included in #68.
On some machines (e.g. Fred Hutch cluster, GitHub CI runners sometimes), unexpected numpy behavior causes CollapsedTree.ll to raise an underflow exception, causing inference to fail unexpectedly. This happens exactly when/because
throws an underflow error, instead of returning 0, as expected. According to Fred Hutch SciComp, this just comes down to a difference in processors. For example, the newer nodes on the Fred Hutch cluster have this behavior and can't run gctree, while older ones can.
As currently written,
CollapsedTree._ll_genotype
requires thatnp.exp(-np.inf) == 0
.This PR fixes that, to avoid running afoul of numpy's weird moods.
Changes:
tests/test_ll.py
containingtest_ll_genotype_cache
, a test which just callsCollapsedTree._build_ll_genotype_cache
. This results in the error mentioned above, when run on the GitHub CI runner without the fixes in this PR. This file also contains the functionvalidate_ll_genotype
, which checks that the newCollapsedTree._ll_genotype
method agrees with the old one, for a range of inputs.CollapsedTree._ll_genotype
so that when passed impossible parameter combinations, it raises a ValueError. Changes were also made to avoid recursive calls with impossible parameter combinations. Now this method will always raise an exception instead of returning a log likelihood of -inf.CollapsedTree._build_ll_genotype_cache
to avoid calling_ll_genotype
with impossible parametersCollapsedTree.ll
to eliminate_ll_genotype
call with impossible parameters, when checking whether to add pseudocount to root node.Tests:
This PR passes all tests, including
test_ll_genotype_cache
, which the current code would fail.Would close #67