pints-team / pints

Probabilistic Inference on Noisy Time Series
http://pints.readthedocs.io

HierarchicalModelLogPDF #1134

Closed DavAug closed 3 months ago

DavAug commented 4 years ago

Would it be of interest to have a HierarchicalModelLogPDF class in pints?

So something that produces a joint LogPDF from individual LogPDFs with a specific dependence structure:

p(y, psi, theta) = p(y|psi)p(psi|theta)p(theta)

At the moment it is easy enough to implement the individual conditional PDFs with the existing functionality in pints; I just haven't seen how to easily create sums of LogPDFs over different parameter spaces, or ones with this hidden-state structure.
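
For illustration, a joint log-pdf with this dependence structure could be composed along these lines (a sketch only; the function and argument names are hypothetical, and letting the p(psi|theta) term take the concatenated [psi, theta] vector is just one possible interface, not existing pints functionality):

    import numpy as np

    def hierarchical_log_pdf(
            x, n_psi, data_log_likelihood, psi_log_likelihood, theta_log_prior):
        """
        Evaluate log p(y, psi, theta) = log p(y|psi) + log p(psi|theta)
        + log p(theta) for a parameter vector x = [psi, theta], with the
        data y fixed inside data_log_likelihood.
        """
        x = np.asarray(x)
        psi, theta = x[:n_psi], x[n_psi:]
        return (
            data_log_likelihood(psi)                            # log p(y | psi)
            + psi_log_likelihood(np.concatenate([psi, theta]))  # log p(psi | theta)
            + theta_log_prior(theta))                           # log p(theta)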

MichaelClerx commented 4 years ago

If you can do it, then Yes please!

I think @martinjrobins and @chonlei both felt they couldn't generalise their hierarchical schemes enough?

DavAug commented 4 years ago

Sum of PDFs with disjoint parameter sets should be easy enough. Somewhat analogous to 'pints.SumOfIndependentLogPDFs', but conditioning on disjoint parameter sets and introducing some rule for the mapping, say that the parameters have to be in the order [psi, theta].

        # Get dimensions of the individual log-likelihoods
        self._indiv_n_parameters = []
        for e in self._log_likelihoods:
            self._indiv_n_parameters.append(e.n_parameters())
        self._n_parameters = np.sum(self._indiv_n_parameters)
        # Prepend a 0 so the slicing in __call__ below is simpler
        self._indiv_n_parameters = np.cumsum([0] + self._indiv_n_parameters)

    def __call__(self, x):
        total = 0
        for i, e in enumerate(self._log_likelihoods):
            total += e(
                x[self._indiv_n_parameters[i]:self._indiv_n_parameters[i + 1]])
        return total
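
A self-contained version of this fragment might look as follows (a sketch; the class name is hypothetical and it only assumes the components implement the pints.LogPDF interface, i.e. __call__ and n_parameters()):

    import numpy as np
    import pints

    class SumOfDisjointLogPDFs(pints.LogPDF):
        """Sum of LogPDFs, each acting on its own block of the parameter vector."""

        def __init__(self, log_pdfs):
            self._log_pdfs = list(log_pdfs)
            # Offsets of each block in the concatenated parameter vector,
            # e.g. [0, b1, b1 + b2, ...], so block i is x[offsets[i]:offsets[i + 1]]
            sizes = [pdf.n_parameters() for pdf in self._log_pdfs]
            self._offsets = np.cumsum([0] + sizes)
            self._n_parameters = int(self._offsets[-1])

        def n_parameters(self):
            return self._n_parameters

        def __call__(self, x):
            x = np.asarray(x)
            total = 0
            for i, pdf in enumerate(self._log_pdfs):
                total += pdf(x[self._offsets[i]:self._offsets[i + 1]])
            return total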
DavAug commented 4 years ago

But having a varying psi messes up the intended functionality of the likelihood classes a bit in p(psi|theta).

DavAug commented 4 years ago

So would it be acceptable to alter the values in the likelihood p(psi|theta), even though usually the "data" is not meant to change?

So a quick solution would be to access the private variable self._values in the pints problem and update the psi values at every iteration.
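
As a sketch of that quick fix (hypothetical names; overwriting the private _values attribute is of course not a supported pints API):

    import numpy as np

    def evaluate_top_level(top_problem, top_log_likelihood, current_psi, theta):
        """Evaluate log p(psi | theta) after swapping the problem's 'data' for psi."""
        # Overwrite the private attribute: normally the data never changes
        top_problem._values = np.asarray(current_psi)
        return top_log_likelihood(theta)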

simonmarchant commented 4 years ago

I'm looking at doing a hierarchical prior function, to at least add this functionality for a small subset of cases. I've written a docstring that describes what I'm intending for it to do: does it sound reasonable for a first hierarchical prior function? [screenshot of proposed docstring] In particular @ben18785, the hierarchical guru

ben18785 commented 4 years ago

Looks really good Simon -- thanks!

DavAug commented 4 years ago

@MichaelClerx @ben18785 @martinjrobins @chonlei

In addition to the really cool HierarchicalLogPrior that @simonmarchant is implementing, I've been thinking about how one could compose hierarchical posteriors that are not constrained to normally distributed bottom-level parameters and, more importantly, where you have the chance to pool some of the bottom-level parameters. In my problems you often want some parameters, like the noise, to be shared between individuals, while others may vary. I couldn't really work out how pymc3 could be integrated with pints to do this.

So here is my suggestion (sorry for the unrendered TeX, but maybe just look at the example at the bottom to get a feel for the interface):

class HierarchicalLogPosterior(LogPDF):
    r"""
    Represents the log-posterior of a hierarchical model

    .. math::
        \log \mathbb{P}(\theta ,\psi ^{(1)}, \ldots ,\psi ^{(n)} |
        D^{(1)}, \ldots ,D^{(n)}) =
        \sum _{i=1}^n \log \mathbb{P}(y | \psi ^{(i)})
        \Big | _{y = D^{(i)}} +
        \sum _{i=1}^n \log \mathbb{P}(\psi ^{(i)} | \theta) +
        \log \mathbb{P}(\theta ) + \text{constant},

    where :math:`y` is the observable output of the hierarchical model whose
    distribution is parametrised by :math:`\psi`. The top level parameters
    :math:`\theta` determine the distribution of :math:`\psi`. We refer to
    :math:`\sum _{i=1}^n\log\mathbb{P}(y|\psi ^{(i)})\Big |_{y=D^{(i)}}`
    as the bottom-level log-likelihood, and to
    :math:`\sum _{i=1}^n \log \mathbb{P}(\psi ^{(i)} | \theta)` as the top-level
    log-likelihood.

    Hierarchical modelling of :math:`n` data sets :math:`D^{(1)}, \ldots , D^{(n)}`
    is a compromise between a pooled analysis, in which one set of model
    parameters :math:`\psi` is assumed to be suitable for all observations, and
    an unpooled analysis where an independent set of parameters
    :math:`\psi ^{(i)}` is inferred for each data set :math:`D^{(i)}`. In
    hierarchical models the :math:`\psi ^{(i)}` are assumed to
    follow a top level distribution parameterised by parameters :math:`\theta`.

    The :class:`HierarchicalLogPosterior` takes a list of bottom-level
    log-likelihoods of length :math:`n`, a list of top-level likelihoods of
    length :math:`b`, and a list of priors of length :math:`t`. The
    bottom-level log-likelihoods represent the likelihoods for the
    parameters :math:`\psi ^{(i)}` for each data set :math:`D^{(i)}`, where
    each parameter set is of size :math:`b`. The top-level log-likelihoods
    represent the likelihoods of the parameters :math:`\theta` for a given set
    of bottom parameters :math:`\psi ^{(1)}, \ldots , \psi ^{(n)}`. For each of
    the :math:`t` top-level parameters :math:`\theta`, a prior is passed to the
    hierarchical log-posterior class.

    The log-posterior can be evaluated by passing an array-like object with
    parameters (:math:`\theta, \psi ^{(1)}, \ldots , \psi ^{(n)}`) of length
    :math:`t + nb`. Parameters are sorted such that top-level parameters come
    before bottom-level parameters. The parameter order within the
    log-likelihoods is preserved, and parameters from different
    log-likelihoods are appended.

    Optionally, a subset of the parameters :math:`\psi` can be chosen to be
    pooled by passing a boolean array-like object ``non_hierarchical_params``
    of length :math:`b` to the HierarchicalLogPosterior. ``True`` indicates
    pooling. With :math:`p` pooled parameters the length of the parameters
    reduces to :math:`t + p + n(b-p)`. Pooled parameters count as top-level
    parameters, and therefore come before the remaining bottom-level parameters.

    Extends :class:`LogPDF`

    Parameters
    ----------
    bottom_log_likelihood : List of :class:`LogPDF` of length :math:`n`
        A list of log-likelihoods of the parameters :math:`\psi^{(i)}` for each
        data set :math:`D^{(1)}, \dots, D^{(n)}`. All log-likelihoods in the
        list live on the same :math:`b`-dimensional parameter space.
    top_log_likelihood : List of :class:`LogPrior` of length :math:`b`
        A list of log-likelihoods of the parameters :math:`\theta` for a given
        set of bottom-level parameters
        :math:`\psi ^{(1)}, \ldots , \psi ^{(n)}`. If :math:`p` of the
        :math:`b` bottom-level parameters are pooled, only :math:`b-p`
        top-level likelihoods can be passed.
    log_prior : List of :class:`LogPrior` of length :math:`t`
        A list of log-priors for the top-level parameters :math:`\theta`. If
        :math:`p` of the bottom-level parameters are pooled, :math:`t+p`
        log-priors need to be passed to the HierarchicalPosterior.
    non_hierarchical_params : Array-like boolean of length :math:`b`
        Mask which determines which of the :math:`b` parameters :math:`\psi`
        may be pooled.

    Example
    -------
    ::

        # Three schools problem: Assume data for the different schools has
        # already been integrated into three pints.SingleOutputProblem with
        # one parameter each for the mean score of the school.
        #
        # Bottom-level model is Normal distribution, where for illustration we
        # choose to pool the variance parameter. Top-level model is also a
        # Normal distribution.

        # Bottom-level log-likelihoods
        bottom_log_likelihood_school_one = pints.GaussianLogLikelihood(
            problem_school_one)
        bottom_log_likelihood_school_two = pints.GaussianLogLikelihood(
            problem_school_two)
        bottom_log_likelihood_school_three = pints.GaussianLogLikelihood(
            problem_school_three)

        # Top-level log-likelihoods
        top_log_likelihood = pints.GaussianLogPrior(
            mean=0, sd=1)  # Initial values have no effect

        # Log-priors
        bottom_variance_log_prior = pints.HalfCauchyLogPrior(
            location=0, scale=5)
        top_mean_log_prior = pints.NormalLogPrior(
            mean=0, standard_deviation=5)
        top_variance_log_prior = pints.HalfCauchyLogPrior(
            location=0, scale=5)

        # Hierarchical log-posterior
        log_posterior = pints.HierarchicalLogPosterior(
            bottom_log_likelihood=[
                bottom_log_likelihood_school_one,
                bottom_log_likelihood_school_two,
                bottom_log_likelihood_school_three],
            top_log_likelihood=[
                top_log_likelihood],
            log_prior=[
                bottom_variance_log_prior,
                top_mean_log_prior,
                top_variance_log_prior],
            non_hierarchical_params=[
                False, True])  # Bottom-level mean: hierarchical, variance: pooled

        # Evaluation of the hierarchical log-posterior
        mock_top_mean = 2.5
        mock_top_variance = 4
        mock_bottom_variance = 1
        mock_mean_school_one = 1
        mock_mean_school_two = 2
        mock_mean_school_three = 3

        value = log_posterior([
            mock_top_mean,
            mock_top_variance,
            mock_bottom_variance,
            mock_mean_school_one,
            mock_mean_school_two,
            mock_mean_school_three])
    """

I'm grateful for any suggestions.
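
To make the proposed parameter ordering concrete, here is a small sketch of how such a class could split its input vector; the slicing convention is inferred from the docstring above and the helper function is purely illustrative:

    import numpy as np

    def split_parameters(x, t, p, b, n):
        """Split x = [theta, pooled psi, psi^(1), ..., psi^(n)] into its blocks."""
        x = np.asarray(x)
        theta = x[:t]                           # t top-level parameters
        pooled = x[t:t + p]                     # p pooled bottom-level parameters
        unpooled = x[t + p:].reshape(n, b - p)  # b - p parameters per data set
        return theta, pooled, unpooled

    # Three schools example above: t = 2, p = 1, b = 2, n = 3,
    # so the total parameter vector has length t + p + n * (b - p) = 6
    theta, pooled, unpooled = split_parameters(
        [2.5, 4, 1, 1, 2, 3], t=2, p=1, b=2, n=3)
    # theta = [2.5, 4] (top-level mean and variance), pooled = [1] (shared variance),
    # unpooled = [[1], [2], [3]] (one mean per school)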

DavAug commented 4 years ago

I realised that for gradient-based sampling or optimisation, we would need to be able to compute the partials of the top-level likelihoods with respect to both their inputs \theta and the bottom-level parameters \psi.

If we used LogPriors for the top-level likelihoods, the priors would need an additional method that returns the partials with respect to the hyperparameters.
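
For example, if the top-level likelihood is Gaussian, \log N(\psi | \mu, \sigma), those extra partials are straightforward to write down; a minimal sketch (not existing pints API):

    import numpy as np

    def gaussian_log_pdf_partials(psi, mu, sigma):
        """Return log N(psi | mu, sigma^2) and its partials w.r.t. psi, mu and sigma."""
        log_pdf = (
            -0.5 * np.log(2 * np.pi) - np.log(sigma)
            - (psi - mu) ** 2 / (2 * sigma ** 2))
        d_psi = -(psi - mu) / sigma ** 2    # needed for the bottom-level parameters
        d_mu = (psi - mu) / sigma ** 2      # partials w.r.t. the hyperparameters
        d_sigma = -1 / sigma + (psi - mu) ** 2 / sigma ** 3
        return log_pdf, d_psi, d_mu, d_sigma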