rhayes777 / PyAutoFit

PyAutoFit: Classy Probabilistic Programming
https://pyautofit.readthedocs.io/
MIT License

Maximum recursion depth exceeded when using EP + hierarchical modelling #500

Closed VictorForouhar closed 1 year ago

VictorForouhar commented 2 years ago

I get the following error when I try to fit a hierarchical model using EP: [screenshot of the "maximum recursion depth exceeded" error]

This is the same non-converging issue I alluded to in #495 when using uniform priors. Now that the Gaussian priors remain within the bounds, I confirmed that it also happens for them. This must be an issue with EP, since hierarchically fitting the same model and data without EP, e.g.:

search = af.DynestyStatic(
    name="no_EP_fit",
    sample="rwalk",
)

result = search.fit(model=factor_graph.global_prior_model, analysis=factor_graph)

converges to a solution.

rhayes777 commented 2 years ago

Any ideas @matthewghgriffiths ?

Jammy2211 commented 2 years ago

I am unable to reproduce the issue for the following two examples, which fit the 1D noisy Gaussian example with Gaussian priors with and without upper and lower limits:

https://github.com/Jammy2211/autofit_workspace_test/blob/main/graphical/ep/gaussian_priors.py

https://github.com/Jammy2211/autofit_workspace_test/blob/main/graphical/ep/gaussian_priors_bounded.py

The issue must be related to transforming priors into messages, but also dependent on the likelihood function having nasty behaviour?

VictorForouhar commented 2 years ago

I can share the examples that I am fitting, in case it helps pinpoint the issue.


Jammy2211 commented 2 years ago

> I can share the examples that I am fitting, in case it helps pinpoint the issue.

Yes, I think we need to be able to reproduce the issue.

VictorForouhar commented 2 years ago

Okay, give me one sec

VictorForouhar commented 2 years ago

Attached are the data, the code for the models and log-likelihood functions I have defined, and a script that runs the fit via `python3 ./script.py`:

example.tar.gz

Jammy2211 commented 2 years ago

I created a bunch of EP tests which use different priors:

https://github.com/Jammy2211/autofit_workspace_test/tree/main/ep

The error crops up in tests where a UniformPrior or LogUniformPrior is used, and it occurs irrespective of whether we are fitting graphical models (e.g. shared parameters) or a hierarchical model.

The simplest example is this one:

https://github.com/Jammy2211/autofit_workspace_test/blob/main/ep/graph/uniform_priors.py

Before the recursion depth errors, the warning below crops up, indicating that the message transforms are producing NaNs:

2022-04-01 17:13:19,658 - autofit.graphical.mean_field - ERROR - nan parameter passed to UniformPrior
Traceback (most recent call last):
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 342, in update_factor_mean_field
    factor_dist = self / cavity_dist
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 258, in __truediv__
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 258, in <dictcomp>
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transform_wrapper.py", line 104, in __truediv__
    return self._new_for_base_message(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 59, in _new_for_base_message
    return type(self)(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 78, in __init__
    raise exc.MessageException(
autofit.exc.MessageException: nan parameter passed to UniformPrior

The `__init__` constructor of the `UniformPrior`, where the `MessageException` is raised, does indeed receive a NaN in its input params:

class UniformPrior(WrappedInstance):
    """A prior with a uniform distribution between a lower and upper limit"""

    def __init__(
            self,
            lower_limit=0.0,
            upper_limit=1.0,
            id_=None,
            params=(0.0, 1.0)
    ):
        if any(map(np.isnan, params)):
            print(params)  # debug print added to inspect the offending values
            raise exc.MessageException(
                "nan parameter passed to UniformPrior"
            )

(4.651108532894578, nan)

I have printed the message in _new_for_base_message:

ShiftedUniformNormalMessage(mean=47.74294961, sigma=0.34072531, lower_limit=0., upper_limit=100., log_norm=0.00096244)
ShiftedUniformNormalMessage(mean=2.88576815, sigma=1.18661, lower_limit=0., upper_limit=100., log_norm=0.00096244)
ShiftedUniformNormalMessage(mean=11.08058978, sigma=1.039327, lower_limit=0., upper_limit=25., log_norm=0.00096244)
ShiftedUniformNormalMessage(mean=44.63348162, sigma=0.52604417, lower_limit=0., upper_limit=100., log_norm=0.)
ShiftedUniformNormalMessage(mean=99.99983492, sigma=nan, lower_limit=0., upper_limit=100., log_norm=0.)

A nan crops up in the sigma of one of the shifted uniform messages.

Ok, digging deeper, I now realise this is probably not related to the recursion depth issue. It occurs specifically when `lower_limit=0.0` on a `UniformPrior`, as the `__truediv__` method in `graphical` is given two values of zero during the first calculation of messages during EP.

Will dig into the recursion depth now...

Jammy2211 commented 2 years ago

Okay, even with `lower_limit=0.01` or `lower_limit=1.0`, after the first round of iterations I end up with the division-by-zero and NaN error.

I guess that, irrespective of the lower limit, it still ends up going to a lower limit of 0.0 and dividing by zero after the first message transform.
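For intuition, here is a generic sketch of why dividing messages can produce a NaN sigma like the one above. This is not PyAutoFit's actual message arithmetic; it only illustrates that dividing Gaussian messages subtracts their natural parameters, so a quotient with zero (or negative) precision has no valid sigma:

```python
import math


def divide_gaussian_messages(mean_a, sigma_a, mean_b, sigma_b):
    """Divide Gaussian message a by Gaussian message b.

    In natural parameters, division subtracts precisions (1/sigma^2)
    and precision-weighted means. If the precisions are equal, the
    quotient has zero precision: sigma is undefined, hence NaN.
    """
    prec_a = 1.0 / sigma_a ** 2
    prec_b = 1.0 / sigma_b ** 2
    prec = prec_a - prec_b
    eta = mean_a * prec_a - mean_b * prec_b
    if prec <= 0.0:
        # Ill-defined quotient: mirrors the NaN sigma seen above.
        return float("nan"), float("nan")
    return eta / prec, math.sqrt(1.0 / prec)


# Two identical messages: the cavity division is ill-defined.
print(divide_gaussian_messages(50.0, 2.0, 50.0, 2.0))  # (nan, nan)
```

When EP's first iteration leaves a factor's approximation equal to (or broader than) the cavity distribution, this is exactly the regime where the division degenerates.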

Jammy2211 commented 2 years ago

Okay, so the recursion is definitely related to the NaNs; it just only leads to the recursion error on certain occasions (e.g. after the initial wave of factor fits is complete, before EP properly begins):

Traceback (most recent call last):
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 343, in update_factor_mean_field
    factor_dist = self / cavity_dist
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 259, in __truediv__
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 259, in <dictcomp>
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transform_wrapper.py", line 104, in __truediv__
    return self._new_for_base_message(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 59, in _new_for_base_message
    return type(self)(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 78, in __init__
    raise exc.MessageException(
autofit.exc.MessageException: nan parameter passed to UniformPrior
Traceback (most recent call last):
  File "ep/graph/uniform_priors.py", line 119, in <module>
    factor_graph_result = factor_graph.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/declarative/abstract.py", line 189, in optimise
    updated_ep_mean_field = opt.run(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 337, in run
    model_approx, status = self.factor_step(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 278, in factor_step
    model_approx, status = optimiser.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 172, in optimise
    new_model_dist, status = self.optimise_approx(factor_approx, **kwargs)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 119, in optimise_approx
    state = self.prepare_state(factor_approx, mean_field, params)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 87, in prepare_state
    hessian = self.make_hessian(mean_field, free_variables, **self.hessian_kws)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 15, in make_posdef_hessian
    return MeanField.precision(mean_field, variables)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 204, in precision
    variances = MeanField.variance.fget(self).subset(variables)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 192, in variance
    return VariableData({v: dist.variance for v, dist in self.items()})
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 192, in <dictcomp>
    return VariableData({v: dist.variance for v, dist in self.items()})
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transform_wrapper.py", line 129, in __getattr__
    return getattr(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoConf/autoconf/tools/decorators.py", line 16, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transformed.py", line 126, in variance
Jammy2211 commented 2 years ago

Ok, so if I remove the cached_property decorator from the variance method I get this error:

Traceback (most recent call last):
  File "ep/graph/uniform_priors.py", line 119, in <module>
    factor_graph_result = factor_graph.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/declarative/abstract.py", line 189, in optimise
    updated_ep_mean_field = opt.run(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 337, in run
    model_approx, status = self.factor_step(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 278, in factor_step
    model_approx, status = optimiser.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 174, in optimise
    new_model_dist, status = self.optimise_approx(factor_approx, **kwargs)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 121, in optimise_approx
    state = self.prepare_state(factor_approx, mean_field, params)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 89, in prepare_state
    hessian = self.make_hessian(mean_field, free_variables, **self.hessian_kws)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 17, in make_posdef_hessian
    return MeanField.precision(mean_field, variables)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 205, in precision
    return VariableFullOperator.from_diagonal(variances ** -1)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/variable.py", line 292, in __op__
    return cls({k: op(val, other) for k, val in self.items()})
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/variable.py", line 292, in <dictcomp>
    return cls({k: op(val, other) for k, val in self.items()})
TypeError: unsupported operand type(s) for ** or pow(): 'method' and 'int'
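That TypeError is consistent with removing the decorator: `cached_property` is a descriptor that replaces the method with its computed value on attribute access, so without it `dist.variance` yields a bound method rather than a number, and `variances ** -1` fails. A minimal generic repro (not PyAutoFit code):

```python
from functools import cached_property


class Cached:
    @cached_property
    def variance(self):
        return 4.0


class Plain:
    def variance(self):
        return 4.0


print(Cached().variance ** -1)  # attribute access returns the float: 0.25

try:
    Plain().variance ** -1      # attribute access returns a bound method
except TypeError as err:
    print(err)  # unsupported operand type(s) for ** or pow(): 'method' and 'int'
```

So the callers in `mean_field.py` rely on `variance` being a (cached) property, not a plain method.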
Jammy2211 commented 2 years ago

For Victor's script, the problem currently is that only Gaussian priors work for message passing.

The script runs without error if we make the following replacements:

    # model.r_2   = af.LogUniformPrior(lower_limit=1e-2, upper_limit=0.5)
    # model.rho_2 = af.LogUniformPrior(lower_limit=10, upper_limit=200)

    model.r_2   = af.GaussianPrior(mean=0.25, sigma=0.5, lower_limit=1e-2, upper_limit=0.5)
    model.rho_2 = af.GaussianPrior(mean=100.0, sigma=100.0, lower_limit=10, upper_limit=200)

hierarchical_factor = af.HierarchicalFactor(
    af.GaussianPrior,
    # mean =af.UniformPrior(lower_limit=1e-2, upper_limit=0.5),
    # sigma=af.UniformPrior(lower_limit=0.1, upper_limit=0.5),
    mean=af.GaussianPrior(mean=0.25, sigma=0.1, lower_limit=1e-2, upper_limit=0.5),
    sigma=af.GaussianPrior(mean=0.25, sigma=0.25, lower_limit=0.1, upper_limit=0.5),
)

When Rich works next week we'll need to investigate the issues transforming Uniform and LogUniform priors.

I have no idea if using GaussianPriors is a sane idea for your problem or not -- I guess you can compare to a non-EP fit to check whether the inferred parameters and errors are comparable.

VictorForouhar commented 2 years ago

> For Victor's script, the problem currently is that only Gaussian priors work for message passing. [...]

Thank you for looking into this! I think those parameters should work well with Gaussian priors, but I will check if posteriors are similar, as you suggested.

rhayes777 commented 2 years ago

Division of one prior by another really operates on the underlying distribution. So the uniform prior is actually a transformed normal message: it is this message that the arithmetic operates on. Sometimes the result of that operation is not well defined, resulting in the NaNs. I'm not sure if we should have different behaviour in this case @matthewghgriffiths ?
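As a rough illustration of this representation (hypothetical code, not PyAutoFit's transform machinery), a bounded uniform prior can be modelled by pushing an unbounded normal-space value through the standard normal CDF, so that message arithmetic happens in the unbounded space and the bounded value is recovered afterwards:

```python
import math


def normal_cdf(x):
    """Standard normal CDF: maps the real line onto (0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def to_bounded(x, lower_limit, upper_limit):
    """Map an unbounded normal-space value into (lower_limit, upper_limit)."""
    return lower_limit + (upper_limit - lower_limit) * normal_cdf(x)


# Message arithmetic (e.g. the cavity division) would act on the
# normal-space parameters; ill-defined results there surface as NaNs
# once mapped back to the bounded parameter.
print(to_bounded(0.0, 0.0, 100.0))  # midpoint of the interval: 50.0
print(to_bounded(5.0, 0.0, 100.0))  # very close to the upper limit
```

This is why a quotient that is perfectly sensible for an unbounded Gaussian can still blow up for the transformed message near a bound, as in the `ShiftedUniformNormalMessage` with `mean=99.99983492` above.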

rhayes777 commented 2 years ago

I've pushed a fix to master that resolves the recursion issue.

However, I am now seeing sigma going to -inf during the Laplace optimisation.