pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

ADVI not properly sampling from Bounded distributions #1170

Closed aasensio closed 8 years ago

aasensio commented 8 years ago

I'm trying to sample from a complex hierarchical model with lots of variables, and I'm considering ADVI for approximating the posterior because of the model's size. One of the variables has a bounded prior. While learning how to write user-defined bounded distributions, I ran some tests with simple distributions and, unless I'm doing something wrong, it looks like ADVI is not able to properly sample from bounded distributions. Here is the code I'm using:

import pymc3 as pm

with pm.Model() as model:
    boundedNormal = pm.Bound(pm.Normal, upper=1.0)
    th = boundedNormal('th', mu=0.0, sd=1.0)
    v_params = pm.variational.advi(n=50000)
    traceADVI = pm.variational.sample_vp(v_params, draws=5000)

with pm.Model() as model:
    boundedNormal = pm.Bound(pm.Normal, upper=1.0)
    th = boundedNormal('th', mu=0.0, sd=1.0)
    traceNUTS = pm.sample(2000, pm.NUTS())

pm.traceplot(traceADVI)
pm.traceplot(traceNUTS)

The samples from the NUTS algorithm are shown in the first attached figure (figure_2), while those of ADVI are shown in the second (figure_1-3).

twiecki commented 8 years ago

ADVI requires your model to live in an unconstrained space. We usually accomplish this via auto-transformations. It seems that Bound() does not add a transformed RV to the model (which perhaps it should, as it's a bit confusing like this).

However, if you transformed this RV manually it should probably work. Just need to figure out how to transform :).

aasensio commented 8 years ago

Yes, I'll do the transformation myself. However, the question remains why NUTS takes this auto-transformation into account while ADVI does not. I'll have a look at the code and try to understand why this happens.

fonnesbeck commented 8 years ago

As with MCMC, we should not be reporting auto-transformed variables to users at all, and ADVI needs to report back-transformed variables (and all named deterministics, for that matter). Hopefully #1168 will help with the first part of this.

taku-y commented 8 years ago

I implemented sample_vp(). To fix this function, I need to know how Bound() works internally: specifically, how RVs with Bound() are kept and how the theano graph is constructed. In the current implementation, sample_vp() can detect automatically transformed RVs (e.g., Dirichlet). I think the detection (and back-transformation) of RVs transformed with Bound() requires a different approach. I'm busy this weekend, so I will work on this problem next week.

twiecki commented 8 years ago

@taku-y if Bound() were auto-transforming this would be no problem though, right?

twiecki commented 8 years ago

@fonnesbeck The problem is that the model really only cares about the transformed variables. Without a mechanism where we supply untransformed values which then get transformed to the real line, I think we would just keep plastering over things.

fonnesbeck commented 8 years ago

I realize that the model needs to see different things than the user does. It's more a matter of what is presented to the user versus what gets used by the algorithm. We should be able to have these two interfaces to the same object.

taku-y commented 8 years ago

I thought Bound() was a transformation, but it actually works as a discontinuous operation: logp returns -inf outside the specified range of the value. So it should not be used with ADVI. I tested the original code of @aasensio and the ELBO values were -inf during optimization.

@aasensio I think pymc3.distributions.transforms.interval(lower, upper) solves the original problem. This transforms real coordinate space into a bounded space.

@twiecki Bound() is a discontinuous operation. If possible, ADVI should check such operations before execution. But I can't think of how to implement this.
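For context, the interval transform maps a value in a bounded range (a, b) to the whole real line via a log-odds map, and back via a scaled sigmoid. Here is a minimal plain-numpy sketch of those maps (an illustration of the math only, not the theano code in transforms.py; the function names are mine):

```python
import numpy as np

def interval_forward(x, a, b):
    # (a, b) -> whole real line, via the log-odds of the rescaled value
    return np.log(x - a) - np.log(b - x)

def interval_backward(y, a, b):
    # whole real line -> (a, b), via a scaled logistic sigmoid
    return a + (b - a) / (1.0 + np.exp(-y))

# round trip: backward(forward(x)) recovers x
x = np.array([0.1, 0.5, 0.9])
y = interval_forward(x, 0.0, 1.0)
print(np.allclose(interval_backward(y, 0.0, 1.0), x))  # True
```

Because the backward map always lands inside (a, b), an optimizer working in the unconstrained space can never produce a value with -inf logp.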

aasensio commented 8 years ago

@taku-y @twiecki Would it be enough, then, if Bound() just added the appropriate transformation to the distribution class? I'm doing some tests and having problems with the testval (probably because I still don't really understand the code). My feeling is that it would be more efficient, both for ADVI and for the sampler, to transform the variables instead of discarding samples that fall outside the boundaries.

taku-y commented 8 years ago

@aasensio I made an example notebook (https://gist.github.com/taku-y/aa4503452b5573c458c8c3fc32af0919). Does this result make sense?

In the notebook, MyBound class is defined, which supports transformation. I think this class should be added to the library.

As you said, the sampler would work more efficiently with transformations defined in MyBound, i.e., without rejection in the Metropolis-Hastings procedure, though I did not test that.

aasensio commented 8 years ago

@taku-y It looks like this is the way to go. I did exactly as in your example notebook, but it is not always safe: in particular, it fails if the testval of the distribution is not inside the interval. For instance, it does not work if you bound the distribution to the interval [0, 1]. Any idea how to solve it?

taku-y commented 8 years ago

@aasensio Do you mean the initialization of the random variable? I think we can check the testval and replace it with a value within the range. It seems easily solved, but I might be misunderstanding what you said.

aasensio commented 8 years ago

Your code works if a suitable testval is passed to the distribution. @twiecki @fonnesbeck I don't know whether this testval should be checked to be inside the domain of the bounded distribution, or left to be defined by the user.

twiecki commented 8 years ago

@taku-y That looks great. Do you think that could replace the current Bound implementation? Want to do a PR?

@aasensio I think it makes sense to test that the testval is inside the support.

aasensio commented 8 years ago

I agree that @taku-y's implementation of Bound could replace the current one. However, I would add a couple more transformations to handle the cases in which only the upper or only the lower bound is provided. I would suggest something along the lines of

import theano.tensor as tt
from pymc3.distributions.transforms import ElemwiseTransform

class LowerBound(ElemwiseTransform):
    """Transform from the interval (a, inf) to the whole real line."""

    name = "lowerbound"

    def __init__(self, a):
        self.a = a

    def backward(self, x):
        # real line -> (a, inf)
        a = self.a
        r = tt.exp(x) + a
        return r

    def forward(self, x):
        # (a, inf) -> real line
        a = self.a
        r = tt.log(x - a)
        return r

lowerbound = LowerBound

class UpperBound(ElemwiseTransform):
    """Transform from the interval (-inf, b) to the whole real line."""

    name = "upperbound"

    def __init__(self, b):
        self.b = b

    def backward(self, x):
        # real line -> (-inf, b)
        b = self.b
        r = b - tt.exp(x)
        return r

    def forward(self, x):
        # (-inf, b) -> real line
        b = self.b
        r = tt.log(b - x)
        return r

upperbound = UpperBound
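As a quick sanity check on the two maps above, forward and backward should be exact inverses. A plain-numpy sketch of the same formulas (illustration only, not meant to run against the theano tensors):

```python
import numpy as np

# numpy versions of the LowerBound/UpperBound maps
def lower_forward(x, a):   # (a, inf) -> real line
    return np.log(x - a)

def lower_backward(y, a):  # real line -> (a, inf)
    return np.exp(y) + a

def upper_forward(x, b):   # (-inf, b) -> real line
    return np.log(b - x)

def upper_backward(y, b):  # real line -> (-inf, b)
    return b - np.exp(y)

x = np.array([2.5, 3.0, 10.0])
print(np.allclose(lower_backward(lower_forward(x, 2.0), 2.0), x))  # True
x = np.array([-4.0, 0.0, 0.9])
print(np.allclose(upper_backward(upper_forward(x, 1.0), 1.0), x))  # True
```

Note also that every backward output lands strictly inside the bounded region, which is what keeps the ELBO finite during ADVI optimization.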

and then

import numpy as np
from pymc3.distributions import transforms
from pymc3.distributions.continuous import Continuous
from pymc3.distributions.distribution import draw_values, generate_samples
from pymc3.distributions.dist_math import bound

class Bounded(Continuous):
    """A bounded distribution."""
    def __init__(self, distribution, lower, upper, *args, **kwargs):
        self.dist = distribution.dist(*args, **kwargs)

        self.__dict__.update(self.dist.__dict__)
        self.__dict__.update(locals())

        if hasattr(self.dist, 'mode'):
            self.mode = self.dist.mode

        if not np.isinf(lower) and not np.isinf(upper):
            self.transform = transforms.interval(lower, upper)

        if not np.isinf(lower) and np.isinf(upper):
            self.transform = transforms.lowerbound(lower)

        if np.isinf(lower) and not np.isinf(upper):
            self.transform = transforms.upperbound(upper)

    def _random(self, lower, upper, point=None, size=None):
        samples = np.zeros(size).flatten()
        i, n = 0, len(samples)
        while i < len(samples):
            sample = self.dist.random(point=point, size=n)
            select = sample[np.logical_and(sample > lower, sample <= upper)]
            samples[i:(i+len(select))] = select[:]
            i += len(select)
            n -= len(select)
        if size is not None:
            return np.reshape(samples, size)
        else:
            return samples

    def random(self, point=None, size=None, repeat=None):
        lower, upper = draw_values([self.lower, self.upper], point=point)
        return generate_samples(self._random, lower, upper, point,
                                dist_shape=self.shape,
                                size=size)

    def logp(self, value):
        return bound(self.dist.logp(value),
                     value >= self.lower, value <= self.upper)

Do you think it is a good solution? I've done some tests and it looks like it's working, provided you pass a testval that is inside the boundaries.

twiecki commented 8 years ago

Yeah, that looks correct. I wonder if there is a way to integrate this better with the existing transformations in transforms.py (e.g. https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/transforms.py#L72) to avoid duplication.

aasensio commented 8 years ago

It's true that Log is a particular case of LowerBound. Maybe it's enough to rename my LowerBound to Log and check inside whether the lower bound is zero or not. My impression is that LowerBound and UpperBound are somewhat more self-explanatory, but that requires changing the transformations in several distributions. If I find some time in the coming days, I can do a PR with these changes.
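(With a = 0, the LowerBound maps reduce exactly to the log/exp pair of the existing Log transform; a trivial numpy check, illustration only:)

```python
import numpy as np

x = np.array([0.5, 1.0, 4.0])

# LowerBound with a = 0: forward is log(x - 0), backward is exp(y) + 0,
# i.e. exactly the log/exp pair of the Log transform
print(np.allclose(np.log(x - 0.0), np.log(x)))          # True
print(np.allclose(np.exp(np.log(x)) + 0.0, x))          # True
```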

twiecki commented 8 years ago

@aasensio I agree. A PR would be great. It's good enough if you consider the trade-offs and do the implementation you think is best.

taku-y commented 8 years ago

@aasensio Your code looks good to me. PR would be nice.

twiecki commented 8 years ago

Closed via https://github.com/pymc-devs/pymc3/pull/1205.