pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Lognormal could be more general #864

Closed dan-tee closed 8 years ago

dan-tee commented 8 years ago

In http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.lognorm.html, SciPy's lognorm takes three parameters: loc, scale, and s.

In PyMC3 it only takes mu and tau, which are transformations of loc and s.

Would you like a pull request introducing the other parameter? What should it be called? scale or alpha?

fonnesbeck commented 8 years ago

I'll admit I'm not a huge fan of SciPy's parameterization conventions. Their gamma distribution also has 3 parameters, and more often than not this makes using the distribution confusing rather than aiding modeling. Is there a specific use case for a 3-parameter lognormal? Otherwise I would err on the side of simplicity. Others are welcome to chime in.

dan-tee commented 8 years ago

I have data that I can nicely fit with the SciPy version of the lognormal. But I just saw that I can't simply add another parameter: the mu in PyMC3's definition is also different from the loc parameter in SciPy.

bjedwards commented 8 years ago

The scipy lognormal is a little strange. The loc parameter in scipy shifts the data, the scale parameter is exp(mu), and the shape parameter s is the standard deviation of the underlying normal. I've never seen it defined this way anywhere other than scipy. Whenever I fit data using scipy I do

st.lognorm.fit(data, floc=0)  # st is scipy.stats; fixing loc at 0

so it matches the typical definition.
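
For concreteness, a minimal sketch of that recipe (hypothetical data; the mapping is the one above, scale = exp(mu) and shape s = sigma):

import numpy as np
import scipy.stats as st

# simulate lognormal data with known mu = 1.0, sigma = 0.5 (illustrative values)
data = np.random.lognormal(mean=1.0, sigma=0.5, size=10000)

# fixing loc at 0 reduces the fit to the usual two-parameter lognormal
s, loc, scale = st.lognorm.fit(data, floc=0)
mu_hat = np.log(scale)    # scale = exp(mu)  =>  mu = log(scale)
sigma_hat = s             # shape s is the standard deviation of log(data)
print(mu_hat, sigma_hat)  # should be close to 1.0 and 0.5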

I would leave it as is, just on principle of least surprise.

See here for more info.

twiecki commented 8 years ago

@dan-tee What does that third parameter allow you to do?

dan-tee commented 8 years ago

@twiecki Actually both SciPy's loc and scale are missing from PyMC3's Lognormal. They allow you to fit non-normalized data. But I can just shift and scale my data and then use PyMC3's Lognormal.

Just as context, here are plots of my data and fitted lognormal curves.

[image: data with fitted lognormal curves]

twiecki commented 8 years ago

I'm confused; PyMC3's Lognormal has mu and tau parameters. Are you saying that you'd rather specify the distribution in terms of mean and precision/variance? In other words, mu and tau are in log-space, which might be confusing.
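
To make the log-space point concrete, a small sketch with made-up numbers (mu and tau describe log(x), not x itself):

import numpy as np

mu, tau = 1.0, 4.0                    # parameters of the underlying normal, i.e. of log(x)
sigma = 1.0 / np.sqrt(tau)            # tau is the precision: tau = 1/sigma**2
mean_x = np.exp(mu + sigma**2 / 2.0)  # mean of the lognormal itself, in data space
print(sigma, mean_x)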

dan-tee commented 8 years ago

@twiecki: As far as I understand the parameters, tau in PyMC3's Lognormal is 1/(s**2) for s from SciPy's lognorm. The parameter mu in PyMC3 seems to be missing in SciPy; I couldn't derive it from loc and scale in SciPy.

loc and scale from SciPy, on the other hand, seem to be missing in PyMC3. But they are not strictly needed, because I could transform my data.

fonnesbeck commented 8 years ago

The parameterizations for PyMC 3 should be consistent with those from PyMC 2, which in turn were generally parameterized to be consistent with WinBUGS, which was (and probably still is) the de facto standard for Bayesian modeling, since that's where many of our users likely came from. Hence, we still use precision instead of variance for normal distributions, and our gamma distribution is parameterized differently from SciPy's. The Wikipedia entry for the lognormal also specifies mu the way we do.

I would be open to having alternative parameterizations via keyword parameters, like we do for tau and sd in the normal distribution, as long as we don't make things more confusing. It has never been a design choice to be consistent with SciPy's distributions, so I'm not as concerned with that.
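
A sketch of the keyword pattern mentioned above, with arbitrary values (PyMC3's Normal accepts either tau or sd):

import pymc3 as pm

with pm.Model():
    a = pm.Normal('a', mu=0., tau=0.25)  # precision parameterization (WinBUGS style)
    b = pm.Normal('b', mu=0., sd=2.)     # the same distribution via sd = 1/sqrt(tau)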

twiecki commented 8 years ago

Yeah, I'm in favor of supporting alternative parameterizations. This works out quite well for sd, but also for the Beta distribution, which can be specified by mean and standard deviation (which we support). Makes for much more readable models.

bjedwards commented 8 years ago

@dan-tee, PyMC3 does have a mu parameter, and it is in the usual parameterization.

What scipy's lognormal allows you to do is fit a shifted lognormal distribution, that is, one that has support on (c, +infinity). For scipy, loc=c, scale=exp(mu), and shape=sd=1/sqrt(tau).
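
A quick numerical check of that mapping (a sketch with arbitrary values for mu, sd, and c):

import numpy as np
import scipy.stats as st

mu, sd, c = 0.5, 0.8, 2.0  # arbitrary illustrative values
x = np.linspace(c + 0.01, c + 20, 5)

# scipy's three-parameter lognormal...
p1 = st.lognorm.pdf(x, s=sd, loc=c, scale=np.exp(mu))
# ...equals the ordinary lognormal density evaluated at the shifted points
p2 = st.lognorm.pdf(x - c, s=sd, scale=np.exp(mu))
print(np.allclose(p1, p2))  # True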

Here is a notebook demonstrating the differences. It might be possible to write a shifted lognormal distribution for PyMC3. I took a quick crack at it at the end of the notebook; it seems to sample OK, but I got some weird Theano warnings when I sampled.

Also, why does find_MAP suddenly only return transformed variables and not the actual variables? I just noticed that.

twiecki commented 8 years ago

@bjedwards Interesting, thanks for putting that together. The shift parameter does make sense and I wonder if we shouldn't just include it but default to it being 0 so that things are still identical. We could then also add the reparameterizations.

The warning is odd and reminiscent of the recent Theano ufunc problem; I suppose you're up-to-date on Theano master?

I'm also surprised that find_MAP only returns the transformed variables, that doesn't seem very practical and we should fix it.

bjedwards commented 8 years ago

@twiecki, I would be somewhat reluctant to change Lognormal to scipy's convention, simply because scipy is the only place I've seen a lognormal parameterized that way. I could be wrong about this; maybe scipy's way is more standard than I think. Part of the question is whether you want one distribution whose default parameters reduce to the simple case (e.g. a shifted lognormal with c=0 is the lognormal, or, for another example, a skew-normal with alpha=0 is the normal), or a separate distribution for each. As a side note, I have Skewnormal and LogSkewnormal distributions that I could probably throw into a pull request if people are interested.

I'd advocate for including a separate distribution, with pointers in the documentation indicating that it matches scipy's and giving the conversion from the scipy parameterization. The Lognormal docs could point to the shifted-lognormal docs.

I wouldn't know how to generate random numbers, though; maybe just draw from a lognormal and then add c?
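
Adding c to ordinary lognormal draws is in fact exactly scipy's loc shift; a minimal sketch (illustrative mu, sd, c):

import numpy as np

mu, sd, c = 0.5, 0.8, 2.0                             # illustrative parameters
samples = np.random.lognormal(mu, sd, size=1000) + c  # support becomes (c, +inf)
print(samples.min() > c)                              # True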

I am not on the latest Theano; I'll update and try again. EDIT: Updating to the latest Theano seems to have eliminated the errors.

Sample provides a trace with both the variable and its transformed version, so I don't know why find_MAP misses out; I'll maybe try to dig around.

dan-tee commented 8 years ago

@bjedwards thanks for the detailed answer. So mu in PyMC3 is actually the log of the scale parameter in SciPy. I missed that.

The extra shift parameter would be helpful for me. It's a simplicity vs. convenience trade-off.

twiecki commented 8 years ago

Couldn't you just add the shift externally?

dan-tee commented 8 years ago

@twiecki I definitely could. And it seems like several other distributions in PyMC3 don't have a shift parameter.

twiecki commented 8 years ago

That's probably the most general approach:

pm.Uniform('shift') + pm.Lognormal('unshifted_log_normal')
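
Spelled out a bit more (a sketch with placeholder priors; the shifted quantity becomes a Deterministic):

import pymc3 as pm

with pm.Model() as model:
    shift = pm.Uniform('shift', lower=0., upper=10.)
    x = pm.Lognormal('unshifted_log_normal', mu=0., tau=1.)
    shifted = pm.Deterministic('shifted_log_normal', shift + x)

Note this works for latent quantities; for observed data one would instead shift the data itself, as discussed above.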

ghost commented 8 years ago

Hi guys, I'm sorry but I'm a little confused. If I want to parameterise Lognormal so that the underlying distribution is N(1, 1000000), would I use something like ec50 = Lognormal('ec50', mu=1, tau=6)?

Thanks in advance

Mark

kyleabeauchamp commented 8 years ago

I also just noticed that find_MAP only returns the auto-transformed variables; it's pretty confusing. It even happens in the tutorial.

kyleabeauchamp commented 8 years ago

E.g. as a user I don't want to have to figure out what internal transformations were made and then invert them manually.

fonnesbeck commented 8 years ago

@kyleabeauchamp we have hidden the transformed variables in the trace for MCMC samplers, but it's clear that this is not enough. I agree that the transformations should be kept entirely away from users, and have articulated this in #1116. Feel free to add to the discussion.

twiecki commented 8 years ago

@kyleabeauchamp You're correct. Ideally this complexity would be hidden from the user. The problem is that the model really operates in the transformed space, so those are the values it expects and requires. Not quite sure how that can be fixed.

fonnesbeck commented 8 years ago

I'm going to close this. Please submit a PR for discussion if this is still of interest.

cs224 commented 7 years ago

Hello,

I am new to both Python and MCMC techniques and am working my way into PyMC3. As an exercise to familiarize myself with PyMC3, I would like to fit a mixture model of two shifted gamma distributions to generated data. As a next step I would then like to extend this with a stick-breaking process to an "arbitrary" number of shifted gammas, but one step at a time.

The complete code can be found in the following gist: https://gist.github.com/cs224/36482b4f52885310cec6ef975fd20184

Here is the model that I try to implement:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    p = pm.Beta("p", 1., 1., testval=0.5)  # fraction that comes from component 1 vs component 2

    alpha = pm.Uniform('alpha', 0, 10)  # == k
    beta  = pm.Uniform('beta', 0, 1)    # == theta
    shifts = pm.Normal('shifts', mu=20, sd=50, shape=2)

    gamma = ShiftedGamma.dist(alpha, beta, c=shifts)  # ShiftedGamma is defined below
    process = pm.Mixture('obs', tt.stack([p, 1-p]), gamma, observed=data)

with model:
    step = pm.Metropolis()
    trace = pm.sample(10000, step=step)

And here is the implementation of the ShiftedGamma that I came up with:

from scipy import stats
import theano.tensor as tt
import pymc3 as pm
from pymc3.distributions.dist_math import bound, logpow, gammaln
from pymc3.distributions.distribution import draw_values, generate_samples

class ShiftedLog(pm.distributions.continuous.transforms.ElemwiseTransform):
    name = "log"

    def __init__(self, c=0.0):
        self.c = c

    def backward(self, x):
        return tt.exp(x + self.c)

    def forward(self, x):
        return tt.log(x - self.c)

class ShiftedGamma(pm.distributions.continuous.Continuous):
    def __init__(self, alpha=None, beta=None, mu=None, sd=None, c=0.0, *args, **kwargs):
        super(ShiftedGamma, self).__init__(transform=ShiftedLog(c), *args, **kwargs)
        alpha, beta = self.get_alpha_beta(alpha, beta, mu, sd)
        self.alpha = alpha
        self.beta = beta
        self.mean = alpha / beta + c
        self.mode = tt.maximum((alpha - 1) / beta, 0) + c
        self.variance = alpha / beta**2
        self.c = c
        # self.testval = kwargs.pop('testval', None)
        # if not self.testval:
        #     self.testval = c + 10.0

        pm.distributions.continuous.assert_negative_support(alpha, 'alpha', 'Gamma')
        pm.distributions.continuous.assert_negative_support(beta, 'beta', 'Gamma')

    def get_alpha_beta(self, alpha=None, beta=None, mu=None, sd=None):
        if (alpha is not None) and (beta is not None):
            pass
        elif (mu is not None) and (sd is not None):
            # moment matching: mean = alpha/beta, variance = alpha/beta**2
            alpha = mu**2 / sd**2
            beta = mu / sd**2
        else:
            raise ValueError('Incompatible parameterization. Either use '
                             'alpha and beta, or mu and sd to specify '
                             'distribution.')

        return alpha, beta

    def random(self, point=None, size=None, repeat=None):
        alpha, beta, c = draw_values([self.alpha, self.beta, self.c], point=point)
        return generate_samples(stats.gamma.rvs, alpha, scale=1. / beta,
                                dist_shape=self.shape, size=size, loc=c)

    def logp(self, value):
        # standard Gamma log-density evaluated at the shifted value x = value - c
        alpha = self.alpha
        beta = self.beta
        c = self.c
        x = value - c
        return bound(-gammaln(alpha) + logpow(beta, alpha)
                     - beta * x + logpow(x, alpha - 1),
                     x >= 0, alpha > 0, beta > 0)

I got the "inspiration" for this implementation from this gist here: https://gist.github.com/bjedwards/b9f052966705de613637

I have basically 2 questions:

1) Is my implementation "good"/"correct"? I am especially uncertain about the transform parameter, where I implemented a ShiftedLog transform. And is the random() method used anywhere at all? If I set a breakpoint there, it never seems to get called.

2) Why is this model so fragile to basically any change I make to either the priors or the sampling method? See the many comments in the gist documenting my experimenting back and forth until I got results from PyMC3.

Point 2) might be a consequence of a mistake in point 1), or it might be any of the magic reasons explained by Michael Betancourt in his talk at StanCon: https://www.youtube.com/watch?v=DJ0c7Bm5Djk&feature=youtu.be&t=4h40m9s.

Related to question 1), I would also like to understand whether there are better or easier ways to achieve the same effect of fitting a model with shifted gammas to data, e.g. is it really necessary to implement ShiftedGamma, or could I achieve the same effect via pm.Deterministic() or similar?

The main reason I post this question in this issue is that, if my approach of using ShiftedGamma as a new class is the "correct" one, then I would answer @fonnesbeck's question with: yes, this issue is still of interest. If there are alternative and better ways to achieve my goal of fitting such a model of shifted gammas, I'm happy to switch to a different approach.

In case this question is out of place here, I will also post it on StackOverflow.

Thanks a lot for your help and insight! Christian

P.S.:

If you follow the link above to the gist, you can see that I just added a PyStan version of the model.

If I use NUTS and help this Stan model with the correct init values, it works. Otherwise I get "wrong answers".

If I use HMC, I get the error message: "Exception thrown at line 16: beta_log: Random variable is nan, but must not be nan! ... if this warning occurs often then your model may be either severely ill-conditioned or misspecified." How would I do this model of two mixed shifted gamma distributions "right", either in PyMC3 or in Stan?

aseyboldt commented 7 years ago

I've never really worked with mixture models, so take this with a grain of salt.

You seem to have a typo in the transformation: shouldn't backward be tt.exp(x) + self.c? In that case it would be the same as pm.distributions.transforms.lowerbound. Otherwise your approach seems fine. I don't know of any way to do this with deterministics, as the mixture requires a distribution. In pretty much all other circumstances a ShiftedGamma should not be necessary, though.

I think you could simplify the other code to something like this: (please double check, I did not test this properly...)

import pymc3 as pm
import theano.tensor as tt
from pymc3.theanof import floatX

class ShiftedGamma(pm.Gamma):
    def __init__(self, alpha, beta, shift, *args, **kwargs):
        transform = pm.distributions.transforms.lowerbound(shift)
        super().__init__(alpha=alpha, beta=beta, *args, **kwargs,
                         transform=transform)
        self.shift = shift
        self.mean += shift
        self.mode += shift

    def random(self, point=None, size=None):
        return super().random(point=point, size=size) + self.shift

    def logp(self, x):
        return super().logp(x - self.shift)

with pm.Model() as model:
    p = pm.Beta("p", 1., 1., testval=0.5)

    alpha = pm.Uniform('alpha', 0, 10)  # == k
    beta  = pm.Uniform('beta', 0, 1)    # == theta
    shifts = pm.Normal('shifts', mu=20, sd=50, shape=2, testval=floatX([0., 0.]))

    gamma = ShiftedGamma.dist(alpha=alpha, beta=beta, shift=shifts)

    pm.Mixture('y', tt.stack([p, 1-p]), gamma, observed=data)

About the difficulties with initialization and sampling: you are using the MAP, which is discouraged most of the time; using ADVI for initialization is usually better. I think the reason you are having such a hard time is that many values of shift lead to a logp of -inf, namely whenever the minimum of the shifts is larger than the minimum of the dataset. You could try to reparameterize somehow to make sure this can't happen. Maybe something like this:

# keep shift1 below the data minimum and order shift2 above shift1
shift1_raw = pm.HalfCauchy('shift1_raw', beta=1)
shift2_raw = pm.HalfCauchy('shift2_raw', beta=1)
shift1 = data.min() - shift1_raw
shift2 = shift1 + shift2_raw
shift = tt.stack([shift1, shift2])
shift = pm.Deterministic('shift', shift)

But this definitely requires more thought; this version changes the prior, and this new prior depends on the dataset, and shift1_raw and shift2_raw will probably be highly correlated (which makes the sampler unhappy).

Also, keep in mind that mixture models typically end up being multimodal, and NUTS does not usually work well for those.

aseyboldt commented 7 years ago

Oh, and I guess I should have posted that to stackoverflow; I think that is a better place for stuff like this than some random github issue :-)

cs224 commented 7 years ago

@aseyboldt: Thanks a lot for your answer and hints! I'll try and see how far this will get me :) I'll let you know.

I've posted the question to stackoverflow as well: http://stackoverflow.com/questions/42735489/fit-a-mixture-model-of-two-shifted-gamma-distributions-in-pymc3 But I see that you already found it. My problem with stackoverflow is/was that it only allows me to use 2 external links.