Closed VictorForouhar closed 1 year ago
Any ideas @matthewghgriffiths ?
I am unable to reproduce the issue for the following two examples, which fit the 1D noisy Gaussian example with Gaussian priors with and without upper and lower limits:
https://github.com/Jammy2211/autofit_workspace_test/blob/main/graphical/ep/gaussian_priors.py
The issue must be related to transforming priors into messages, but also dependent on the likelihood function having nasty behaviour?
I can share the examples that I am fitting, in case it helps pinpoint the issue.
Yes, I think we need to be able to reproduce the issue.
Okay, give me one sec
Attached is the data, the code for the models and log-likelihood functions I have defined, and a script that runs the fitting (`python3 ./script.py`).
I created a bunch of EP tests which use different priors:
https://github.com/Jammy2211/autofit_workspace_test/tree/main/ep
The error crops up in tests where a `UniformPrior` or `LogUniformPrior` is used, and it occurs irrespective of whether we are fitting graphical models (e.g. shared parameters) or a hierarchical model.
The simplest example is this one:
https://github.com/Jammy2211/autofit_workspace_test/blob/main/ep/graph/uniform_priors.py
Before the recursion depth errors, the warning below crops up, indicating that the message transforms are giving NaNs:
```
2022-04-01 17:13:19,658 - autofit.graphical.mean_field - ERROR - nan parameter passed to UniformPrior
Traceback (most recent call last):
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 342, in update_factor_mean_field
    factor_dist = self / cavity_dist
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 258, in __truediv__
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 258, in <dictcomp>
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transform_wrapper.py", line 104, in __truediv__
    return self._new_for_base_message(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 59, in _new_for_base_message
    return type(self)(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 78, in __init__
    raise exc.MessageException(
autofit.exc.MessageException: nan parameter passed to UniformPrior
```
The `__init__` constructor of the `UniformPrior` where the `MessageException` is raised indeed receives a `nan` in its input `params`:
```python
class UniformPrior(WrappedInstance):
    """A prior with a uniform distribution between a lower and upper limit"""

    def __init__(
        self,
        lower_limit=0.0,
        upper_limit=1.0,
        id_=None,
        params=(0.0, 1.0),
    ):
        if any(map(np.isnan, params)):
            print(params)  # temporary debug print
            stop           # temporary debug halt
            raise exc.MessageException(
                "nan parameter passed to UniformPrior"
            )
```
```
(4.651108532894578, nan)
```
I have printed the message in `_new_for_base_message`:
```
ShiftedUniformNormalMessage(mean=47.74294961, sigma=0.34072531, lower_limit=0., upper_limit=100., log_norm=0.00096244)
ShiftedUniformNormalMessage(mean=2.88576815, sigma=1.18661, lower_limit=0., upper_limit=100., log_norm=0.00096244)
ShiftedUniformNormalMessage(mean=11.08058978, sigma=1.039327, lower_limit=0., upper_limit=25., log_norm=0.00096244)
ShiftedUniformNormalMessage(mean=44.63348162, sigma=0.52604417, lower_limit=0., upper_limit=100., log_norm=0.)
ShiftedUniformNormalMessage(mean=99.99983492, sigma=nan, lower_limit=0., upper_limit=100., log_norm=0.)
```
A `nan` crops up in the `sigma` of one of the shifted uniform messages.
Ok, digging deeper, I now realise this is probably not related to the recursion depth issue. It occurs specifically when `lower_limit=0.0` on a `UniformPrior`, as the `__truediv__` method in graphical is given two values of zero during the first calculation of messages during EP.
Will dig into the recursion depth now...
Okay, even if `lower_limit=0.01` or `lower_limit=1.0`, after the first round of iterations I still end up with the division-by-zero and NaN error. I guess, irrespective of the lower limit, it still ends up going to a lower limit of 0.0 and dividing by zero after the first message transform.
Okay, so the recursion is definitely related to the NaNs; it just only leads to the recursion error on certain occasions (or after the initial wave of factor fits is complete, e.g. before EP properly begins):
```
Traceback (most recent call last):
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 343, in update_factor_mean_field
    factor_dist = self / cavity_dist
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 259, in __truediv__
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 259, in <dictcomp>
    {k: m / other.get(k, 1.0) for k, m in self.items()},
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transform_wrapper.py", line 104, in __truediv__
    return self._new_for_base_message(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 59, in _new_for_base_message
    return type(self)(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/prior/prior.py", line 78, in __init__
    raise exc.MessageException(
autofit.exc.MessageException: nan parameter passed to UniformPrior

Traceback (most recent call last):
  File "ep/graph/uniform_priors.py", line 119, in <module>
    factor_graph_result = factor_graph.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/declarative/abstract.py", line 189, in optimise
    updated_ep_mean_field = opt.run(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 337, in run
    model_approx, status = self.factor_step(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 278, in factor_step
    model_approx, status = optimiser.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 172, in optimise
    new_model_dist, status = self.optimise_approx(factor_approx, **kwargs)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 119, in optimise_approx
    state = self.prepare_state(factor_approx, mean_field, params)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 87, in prepare_state
    hessian = self.make_hessian(mean_field, free_variables, **self.hessian_kws)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 15, in make_posdef_hessian
    return MeanField.precision(mean_field, variables)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 204, in precision
    variances = MeanField.variance.fget(self).subset(variables)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 192, in variance
    return VariableData({v: dist.variance for v, dist in self.items()})
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 192, in <dictcomp>
    return VariableData({v: dist.variance for v, dist in self.items()})
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transform_wrapper.py", line 129, in __getattr__
    return getattr(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoConf/autoconf/tools/decorators.py", line 16, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/messages/transformed.py", line 126, in variance
```
Ok, so if I remove the `cached_property` decorator from the `variance` method I get this error:
```
Traceback (most recent call last):
  File "ep/graph/uniform_priors.py", line 119, in <module>
    factor_graph_result = factor_graph.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/declarative/abstract.py", line 189, in optimise
    updated_ep_mean_field = opt.run(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 337, in run
    model_approx, status = self.factor_step(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/expectation_propagation/optimiser.py", line 278, in factor_step
    model_approx, status = optimiser.optimise(
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 174, in optimise
    new_model_dist, status = self.optimise_approx(factor_approx, **kwargs)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 121, in optimise_approx
    state = self.prepare_state(factor_approx, mean_field, params)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 89, in prepare_state
    hessian = self.make_hessian(mean_field, free_variables, **self.hessian_kws)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/laplace/optimiser.py", line 17, in make_posdef_hessian
    return MeanField.precision(mean_field, variables)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/graphical/mean_field.py", line 205, in precision
    return VariableFullOperator.from_diagonal(variances ** -1)
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/variable.py", line 292, in __op__
    return cls({k: op(val, other) for k, val in self.items()})
  File "/mnt/c/Users/Jammy/Code/PyAuto/PyAutoFit/autofit/mapper/variable.py", line 292, in <dictcomp>
    return cls({k: op(val, other) for k, val in self.items()})
TypeError: unsupported operand type(s) for ** or pow(): 'method' and 'int'
```
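That `TypeError` is a side effect of removing the decorator rather than a new bug. A minimal sketch (a hypothetical class, not the PyAutoFit source) shows why: without `property`/`cached_property`, the attribute access returns the bound method itself, so `variances ** -1` applies `pow()` to a method object:

```python
# Hypothetical minimal class illustrating the failure mode after stripping
# the cached_property decorator.
class Message:
    def __init__(self, sigma):
        self.sigma = sigma

    def variance(self):  # was decorated with @cached_property
        return self.sigma ** 2

m = Message(2.0)
try:
    m.variance ** -1  # .variance is now a bound method, not a value
except TypeError as err:
    print(err)  # unsupported operand type(s) for ** or pow(): 'method' and 'int'
```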
For Victor's script, the problem currently is that only Gaussian priors work for message passing.
The script runs without error if we make the following replacements:
```python
# model.r_2 = af.LogUniformPrior(1e-2, 0.5)
# model.rho_2 = af.LogUniformPrior(lower_limit=10, upper_limit=200)
model.r_2 = af.GaussianPrior(mean=0.25, sigma=0.5, lower_limit=1e-2, upper_limit=0.5)
model.rho_2 = af.GaussianPrior(mean=100.0, sigma=100.0, lower_limit=10, upper_limit=200)

hierarchical_factor = af.HierarchicalFactor(
    af.GaussianPrior,
    # mean=af.UniformPrior(lower_limit=1e-2, upper_limit=0.5),
    # sigma=af.UniformPrior(lower_limit=0.1, upper_limit=0.5),
    mean=af.GaussianPrior(mean=0.25, sigma=0.1, lower_limit=1e-2, upper_limit=0.5),
    sigma=af.GaussianPrior(mean=0.25, sigma=0.25, lower_limit=0.1, upper_limit=0.5),
)
```
When Rich works next week we'll need to investigate the issues with transforming Uniform and LogUniform priors.
I have no idea whether using GaussianPriors is a sane idea for your problem or not -- I guess you can compare to a non-EP fit to check if the inferred parameters and errors are comparable.
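One simple way to do that comparison (the parameter names and values below are made up for illustration, not taken from the actual fits): check that the medians from the two fits agree to within their combined 1-sigma errors.

```python
import numpy as np

# Hypothetical posterior summaries (median, sigma) from an EP fit and a
# non-EP fit of the same model; the numbers are illustrative only.
ep_fit = {"r_2": (0.24, 0.03), "rho_2": (105.0, 12.0)}
non_ep_fit = {"r_2": (0.26, 0.04), "rho_2": (98.0, 15.0)}

for name in ep_fit:
    m1, s1 = ep_fit[name]
    m2, s2 = non_ep_fit[name]
    # Agree if the medians differ by less than the errors added in quadrature.
    consistent = abs(m1 - m2) < np.hypot(s1, s2)
    print(f"{name}: consistent={consistent}")
```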
Thank you for looking into this! I think those parameters should work well with Gaussian priors, but I will check if posteriors are similar, as you suggested.
Division of one prior by another really operates on the underlying distribution. So the uniform prior is actually a transformed normal message: it's this message that the arithmetic operates on. Sometimes the result of that operation is not well defined, resulting in the NaNs. I'm not sure if we should have a different behaviour in this case @matthewghgriffiths ?
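To make that concrete, here is a hedged sketch of the arithmetic (the `divide_gaussians` helper is illustrative, not a PyAutoFit function): dividing Gaussian messages subtracts natural parameters, and if the denominator is the narrower (higher-precision) message, the quotient's precision is negative and its sigma is undefined.

```python
import numpy as np

# Illustrative helper: divide two Gaussians in natural-parameter space
# (precision tau = 1/sigma^2, and eta = tau * mu). The quotient's precision
# is tau1 - tau2, which can be zero or negative; sigma = sqrt(1/tau) is then
# inf or nan.
def divide_gaussians(mu1, sigma1, mu2, sigma2):
    tau1, tau2 = 1.0 / sigma1**2, 1.0 / sigma2**2
    tau = tau1 - tau2                 # precision of the quotient; may be <= 0
    eta = tau1 * mu1 - tau2 * mu2
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma = np.sqrt(1.0 / tau)    # nan when tau < 0, inf when tau == 0
        mu = eta / tau
    return mu, sigma

# Dividing by a narrower message gives an ill-defined result:
mu, sigma = divide_gaussians(0.0, 1.0, 0.0, 0.5)
print(sigma)  # nan
```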
I've pushed a fix to master that resolves the recursion issue.
However, I am now seeing `sigma` going to `-inf` during the Laplace optimisation.
I get the following error when I try to fit a hierarchical model using EP:
This is the same non-converging issue I alluded to in #495 when using uniform priors. Now that the Gaussian priors remain within the bounds, I confirmed that it also happens for them. This must be an issue with EP, since hierarchically fitting the same model and data without EP, e.g.:
converges to a solution.