Closed: aasensio closed this issue 8 years ago
ADVI requires your model to live in an unconstrained space. We usually accomplish this via auto-transformations. It seems that `Bound()` does not add a transformed RV to the model (which maybe we should add, as it's a bit confusing like this).
However, if you transformed this RV manually it should probably work. Just need to figure out how to transform :).
Yes, I'll do the transformation myself. However, the question remains why NUTS takes this auto-transformation into account while ADVI does not. I'll have a look at the code and try to understand why this happens.
As with MCMC, we should not be reporting auto-transformed variables to users at all, and ADVI needs to report back-transformed variables (and all named deterministics, for that matter). Hopefully #1168 will help with the first part of this.
I have implemented `sample_vp()`. To fix this function, I need to know how `Bound()` works internally, specifically how RVs created with `Bound()` are kept and how the Theano graph is constructed. In the current implementation, `sample_vp()` can detect automatically transformed RVs (e.g., Dirichlet). I think the detection (and back-transformation) of RVs transformed with `Bound()` requires a different approach. I'm busy this weekend, so I will work on this problem next week.
@taku-y if `Bound()` were auto-transforming, this would be no problem though, right?
@fonnesbeck The problem is that the model really only cares about the transformed variables. Without a mechanism where we supply untransformed values which then get transformed to the real line, I think we would just keep plastering over things.
I realize that the model needs to see different things than the user does. It's more a matter of what is presented to the user versus what gets used by the algorithm. We should be able to have these two interfaces to the same object.
I thought `Bound()` was a transformation, but it actually works as a discontinuous operation: logp returns -inf outside the specified range of the value. So it should not be used with ADVI. I tested @aasensio's original code and the ELBO values were -inf during optimization.
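The -inf ELBO behaviour is easy to see with a toy Monte Carlo estimate: a hard bound on logp sends the whole average to -inf as soon as a single draw lands outside the interval. A simplified numpy sketch (my own illustration, not pymc3 code):

```python
import numpy as np

def bounded_logp(value, logp, lower, upper):
    # mimic a hard bound: -inf outside [lower, upper] (simplified scalar version)
    return logp(value) if lower <= value <= upper else -np.inf

logp = lambda x: -0.5 * x**2            # unnormalized standard-normal log density
samples = np.array([-0.5, 0.2, 1.5])    # draws from an unconstrained approximation

# one out-of-bounds draw is enough to make the Monte Carlo ELBO estimate -inf
elbo_term = np.mean([bounded_logp(s, logp, 0.0, 1.0) for s in samples])
```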
@aasensio I think `pymc3.distributions.transforms.interval(lower, upper)` solves the original problem. It transforms the real coordinate space into a bounded space.
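For intuition, an interval transform of this kind is essentially a scaled logit: the forward direction maps the interval onto the whole real line, and the backward direction inverts it. A numpy sketch with illustrative names (not pymc3's actual API):

```python
import numpy as np

# forward: (a, b) -> real line; backward: real line -> (a, b)
# (logit-style mapping; names are mine, not pymc3's)
def forward(x, a, b):
    return np.log(x - a) - np.log(b - x)

def backward(y, a, b):
    return a + (b - a) / (1.0 + np.exp(-y))

y = forward(0.3, 0.0, 1.0)   # unconstrained value ADVI can optimize over
x = backward(y, 0.0, 1.0)    # back inside (0, 1)
```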
@twiecki Bound() is a discontinuous operation. If possible, ADVI should check such operations before execution. But I can't think of how to implement this.
@taku-y @twiecki Would it be enough then if `Bound()` just adds the appropriate transformation to the distribution class? I'm doing some tests and having problems with the testval (probably because I still don't really understand the code).
My feeling is that transforming the variables would be more efficient, both for ADVI and from the sampler's point of view, than discarding samples whose values fall outside the boundaries. I'd guess this would substantially improve efficiency.
@aasensio I made an example notebook (https://gist.github.com/taku-y/aa4503452b5573c458c8c3fc32af0919). Does this result make sense?
In the notebook, a `MyBound` class is defined which supports transformation. I think this class should be added to the library.
As you said, the sampler should work more efficiently with the transformations defined in `MyBound`, i.e., without rejections in the Metropolis-Hastings procedure, though I did not test that.
@taku-y It looks like this is the way to go. I did exactly as in your example notebook, but this is not always safe. In particular, it is unsafe if the testval of the distribution is not inside the range of the interval. For instance, it does not work if you bound the distribution to the interval [0, 1]. Any idea how to solve it?
@aasensio Do you mean the initialization of the random variable? I think we can check the testval and replace it with a value within the range. It seems easily solved, but I might have misunderstood what you said.
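One simple way to do that check would be to clip the user-supplied testval into the open interval before the forward transform is applied. A hypothetical helper (not part of pymc3):

```python
import numpy as np

def valid_testval(testval, lower, upper, eps=1e-3):
    # Hypothetical helper: clip a starting value into the open interval
    # (lower, upper) so forward transforms like log or logit stay finite.
    # Infinite bounds pass through unchanged (clipping against +/-inf is a no-op).
    return float(np.clip(testval, lower + eps, upper - eps))

safe = valid_testval(2.0, 0.0, 1.0)   # clipped just inside the upper bound
```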
Your code works if a suitable testval is passed to the distribution. @twiecki @fonnesbeck I don't know whether this testval should be checked to lie inside the domain of the bounded distribution, or whether you want to leave that to the user.
@taku-y That looks great. Do you think it could replace the current `Bound` implementation? Want to do a PR?
@aasensio I think it makes sense to test that the testval is inside the support.
I agree that @taku-y's `Bound` implementation could replace the current one. However, I would add a couple more transformations to handle the cases in which only the upper or only the lower bound is provided. I would suggest something along the lines of:
```python
class LowerBound(ElemwiseTransform):
    """Transform from the interval [a, inf] to the whole real line."""
    name = "lowerbound"

    def __init__(self, a):
        self.a = a

    def backward(self, x):
        a = self.a
        r = tt.exp(x) + a
        return r

    def forward(self, x):
        a = self.a
        r = tt.log(x - a)
        return r

lowerbound = LowerBound


class UpperBound(ElemwiseTransform):
    """Transform from the interval [-inf, b] to the whole real line."""
    name = "upperbound"

    def __init__(self, b):
        self.b = b

    def backward(self, x):
        b = self.b
        r = b - tt.exp(x)
        return r

    def forward(self, x):
        b = self.b
        r = tt.log(b - x)
        return r

upperbound = UpperBound
```
and then
```python
class Bounded(Continuous):
    """A bounded distribution."""

    def __init__(self, distribution, lower, upper, *args, **kwargs):
        self.dist = distribution.dist(*args, **kwargs)
        self.__dict__.update(self.dist.__dict__)
        self.__dict__.update(locals())
        if hasattr(self.dist, 'mode'):
            self.mode = self.dist.mode
        if not np.isinf(lower) and not np.isinf(upper):
            self.transform = transforms.interval(lower, upper)
        if not np.isinf(lower) and np.isinf(upper):
            self.transform = transforms.lowerbound(lower)
        if np.isinf(lower) and not np.isinf(upper):
            self.transform = transforms.upperbound(upper)

    def _random(self, lower, upper, point=None, size=None):
        samples = np.zeros(size).flatten()
        i, n = 0, len(samples)
        while i < len(samples):
            sample = self.dist.random(point=point, size=n)
            select = sample[np.logical_and(sample > lower, sample <= upper)]
            samples[i:(i + len(select))] = select[:]
            i += len(select)
            n -= len(select)
        if size is not None:
            return np.reshape(samples, size)
        else:
            return samples

    def random(self, point=None, size=None, repeat=None):
        lower, upper = draw_values([self.lower, self.upper], point=point)
        return generate_samples(self._random, lower, upper, point,
                                dist_shape=self.shape,
                                size=size)

    def logp(self, value):
        return bound(self.dist.logp(value),
                     value >= self.lower, value <= self.upper)
```
Do you think this is a good solution? I've done some tests and it seems to work, provided you pass a testval that is inside the boundaries.
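For readers skimming the thread, the rejection loop in `_random` above can be sketched in pure numpy, here using a standard normal as the underlying distribution (my choice for illustration):

```python
import numpy as np

def bounded_normal_samples(lower, upper, size, seed=0):
    # keep redrawing until every slot is filled with an in-bounds draw,
    # mirroring the while-loop in Bounded._random above
    rng = np.random.default_rng(seed)
    samples = np.empty(size)
    i = 0
    while i < size:
        draw = rng.standard_normal(size - i)
        keep = draw[(draw > lower) & (draw <= upper)]
        samples[i:i + len(keep)] = keep
        i += len(keep)
    return samples

s = bounded_normal_samples(0.0, 1.0, 1000)   # all samples lie in (0, 1]
```

The downside this thread discusses is visible here: for a tight interval, most draws are rejected and redrawn, which is exactly the inefficiency that a transform avoids.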
Yeah, that looks correct. I wonder if there is a way to integrate this better with the existing transformations in transforms.py (e.g. https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/transforms.py#L72) to avoid duplication.
It's true that `Log` is a particular case of `LowerBound`. Maybe it's enough to rename my `LowerBound` to `Log` and check inside whether the lower bound is zero or not. My impression is that `LowerBound` and `UpperBound` are somewhat more self-explanatory, but adopting them requires changing the transformations in several distributions. If I find some time during these days, I can do a PR with these changes.
@aasensio I agree. A PR would be great. It's good enough if you consider the trade-offs and do the implementation you think is best.
@aasensio Your code looks good to me. PR would be nice.
Closed via https://github.com/pymc-devs/pymc3/pull/1205.
I'm trying to sample from a complex hierarchical model with lots of variables. I'm thinking of applying ADVI to sample from the posterior because the model has potentially many variables. One of the variables has a bounded prior. While learning how to write user-defined bounded distributions, I did some tests with simple distributions and, unless I'm doing something wrong, it looks like ADVI is not able to properly sample from bounded distributions. Here is the code I'm using:
The samples from the NUTS algorithm are shown in the following plot,
while those from ADVI are shown in this one: