Closed: NikZak closed this issue 3 years ago
Hi @NikZak, I believe this is simply a confusion in parametrization. As mentioned in the docstring, we use Nolan's "S0" coordinates by default, and in this parametrization "loc" is not necessarily the mean:
import pyro.distributions as dist

stability, skew, scale, loc = 1.9734820127487183, 0.22657334804534912, 0.009145896881818771, 0
d = dist.Stable(stability, skew, scale, loc)
print(f"true mean {d.mean:g}")
for i in range(10):
    X = d.sample([2000000])
    print(f"empirical mean {X.mean():g}")
true mean 8.63668e-05
empirical mean 9.29452e-05
empirical mean 8.32608e-05
empirical mean 0.000101681
empirical mean 8.27871e-05
empirical mean 7.57022e-05
empirical mean 8.08945e-05
empirical mean 0.000105684
empirical mean 8.7159e-05
empirical mean 8.54344e-05
empirical mean 9.03941e-05
We use the "S0" parametrization by default in Pyro because we focus on gradient-based inference, and the "S0" parametrization is differentiable. Pyro also provides an alternative, non-differentiable parametrization "S" in which "loc" refers to the mean:
stability, skew, scale, loc = 1.9734820127487183, 0.22657334804534912, 0.009145896881818771, 0
d = dist.Stable(stability, skew, scale, loc, coords="S")
print(f"true mean {d.mean:g}")
for i in range(10):
    X = d.sample([2000000])
    print(f"empirical mean {X.mean():g}")
true mean 0
empirical mean 9.9177e-06
empirical mean 8.98913e-06
empirical mean 3.13308e-06
empirical mean -6.18303e-06
empirical mean 5.61574e-07
empirical mean 3.2153e-06
empirical mean -7.67794e-06
empirical mean -9.29578e-06
empirical mean 1.88068e-06
empirical mean -1.37749e-05
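As a sanity check (assuming Nolan's relation between the two locations, δ_S = δ_S0 − βγ·tan(πα/2) for α ≠ 1), the "true mean" printed in the first (S0) run above can be reproduced in closed form with plain Python:

```python
import math

# Parameters from the example above (S0 coordinates, loc = 0).
stability, skew, scale, loc = 1.9734820127487183, 0.22657334804534912, 0.009145896881818771, 0.0

# In the S0 parametrization with stability != 1, the mean is
# loc - skew * scale * tan(pi * stability / 2).
mean = loc - skew * scale * math.tan(math.pi * stability / 2)
print(f"analytical mean {mean:g}")  # matches the "true mean" of the S0 run above (~8.64e-05)
```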
Thanks for trying out our Stable implementation!
BTW we welcome PRs that clarify or improve docstrings, so feel free to clarify or add anything you think would have helped you avoid this confusion 🙂
Great, thanks a lot for the explanation. This makes a lot of sense now. Probably the major source of confusion came from the fact that the parametrisation is different from the one in the main Wikipedia article on the stable distribution. I would probably also add the chapter number from Nolan's book and the analytical formula for the mean (mean = loc − skew·scale·tan(π·stability/2) for stability ≠ 1) in this parametrisation to the docstring. I will try to make a PR for that.
While I have the occasion, may I also ask why the parameters obtained from the procedure below
from pyro.infer import SVI, EnergyDistance
from pyro.optim import ClippedAdam

optim = ClippedAdam({"lr": 0.01, "lrd": 0.1 ** (1 / num_steps)})
svi = SVI(model, lambda: None, optim, EnergyDistance())
are so drastically different from the results the pylevy module produces when fitted to the same data.
import scipy.stats as st, levy
levy.fit_levy(r)
In particular, for the stability parameter: with the Pyro procedure I get ~1.97, while with pylevy I get ~1.49.
Would you have a recommendation for a loss function or optimisation algorithm that would bring the results closer?
Actually EnergyDistance hasn't seen much real-world use. You might instead try to combine the good old Trace_ELBO with StableReparam:
from pyro import poutine
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.infer.reparam import StableReparam

reparam_model = poutine.reparam(model, {"my_site_name": StableReparam()})
guide = AutoNormal(reparam_model)
svi = SVI(reparam_model, guide, optim, Trace_ELBO())
Happy to chat more, but now that we no longer believe this is a bug, I'd prefer to discuss at https://forum.pyro.ai, where our discussion is more visible.
I tried the following code and I always get something substantially positive as a result (e.g. tensor(9.9695e-05)), never below 5e-05, although the mean is supposed to equal the location parameter, which is 0.
This significantly affects my downstream code.
Am I doing something wrong?