hsimonfroy opened 6 months ago
`max_tree_depth` comparison. In the tested range of parameters, Hamiltonian trajectory lengths almost always reach the maximal `2**max_tree_depth - 1`, i.e. the U-turn criterion is never triggered.
`num_samples` is adapted to `max_tree_depth` such that each run requires about 300,000 (+60,000 warmup) Hamiltonian steps, i.e. about 3h wall time. Put differently, `num_samples * (2**max_tree_depth - 1)` $\simeq$ 300,000.
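The budget arithmetic above can be sketched as follows (the 300,000-step budget is from the text; the specific `max_tree_depth` values are illustrative):

```python
# Pick num_samples so that num_samples * (2**max_tree_depth - 1) stays
# near a fixed budget of ~300,000 Hamiltonian steps.
BUDGET = 300_000

def num_samples_for(max_tree_depth, budget=BUDGET):
    max_traj_len = 2**max_tree_depth - 1  # maximal trajectory length
    return budget // max_traj_len

for mtd in (5, 7, 10):
    print(mtd, num_samples_for(mtd))
```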
In NumPyro's implementation of NUTS, the number of evaluations of the model, its logprob, or its score can be easily accessed. For step `i`, NUTS makes `extra_fields['num_steps'][i]` calls to `value_and_grad()`, which relies on `vjp`. The JAX documentation states that:

> The FLOP cost for evaluating $(x, v) \mapsto (f(x), v^\mathsf{T} \partial f(x))$ is only about three times the cost of evaluating $f$.
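For reference, `value_and_grad` evaluates a function and its gradient in one reverse-mode pass (a generic JAX illustration with a toy function, not any specific model from this thread):

```python
import jax
import jax.numpy as jnp

def f(x):
    # toy scalar function standing in for a model logprob
    return jnp.sum(x ** 2)

# Evaluates f(x) and its gradient together; built on reverse-mode vjp,
# hence the ~3x FLOP cost quoted above.
val, grad = jax.value_and_grad(f)(jnp.arange(3.0))
print(val, grad)
```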
Moreover, if the model contains deterministic variables, the model may have to be replayed once per sample to evaluate these variables from the sampled values. In NumPyro, samples are postprocessed, and the `replay_model` value is decided here. Therefore, for a given NUTS run, the total number of model evaluations is on the order of `3 * extra_fields['num_steps'].sum()` if the model does not contain deterministic variables, and `(3 * extra_fields['num_steps'] + 1).sum()` otherwise.
Some comments on the posterior samples: here, 1560 samples obtained by NUTS (`max_tree_depth=10`).
Comparison between the standardly and non-standardly parametrized models, both sampled with NUTS (`max_tree_depth=10`).
This is consistent with theory: the likelihood acts as a filter on the prior information. Standard parametrization makes the prior better conditioned, but depending on the likelihood, this can make the posterior either better or worse conditioned.
(example from code)
See Betancourt's blog for another point of view, though it does not show the difference between strongly aligned and strongly opposing likelihoods.
Context
Documenting field-level explicit likelihood inference from a differentiable cosmological model.
In code, we run joint inferences of the initial field ($64^3$ mesh), cosmological parameters ($\Omega_c$ and $\sigma_8$), and Lagrangian bias parameters.
For one-chain samplers, chains are initialized at the fiducial values.
In order to assess chain convergence, and thus to detect a potential bias in the sampling process, we should limit the other sources of bias. For instance, observations should be generated with no likelihood noise. This does not remove all bias, cf. a simple $\mathcal N(x \mid z^2, I)$ likelihood model. In any case, samplers will be compared against a reference posterior obtained by an Implicit Likelihood Inference method.
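As a minimal illustration of this caveat (toy numbers, not from the runs above): with prior $z \sim \mathcal N(0, 1)$ and likelihood $\mathcal N(x \mid z^2, 1)$, even a noiseless observation $x = z_0^2$ yields a posterior that is bimodal around $\pm z_0$, so its mean sits far from the true $z_0$:

```python
import numpy as np

z0 = 1.5             # hypothetical true value
x_obs = z0**2        # noiseless observation
z = np.linspace(-4.0, 4.0, 4001)
dz = z[1] - z[0]

# log posterior = log N(z | 0, 1) + log N(x_obs | z^2, 1), up to constants
log_post = -0.5 * z**2 - 0.5 * (x_obs - z**2) ** 2
post = np.exp(log_post - log_post.max())
post /= post.sum() * dz                    # normalize on the grid

post_mean = (z * post).sum() * dz
print(post_mean)  # near 0 by symmetry, far from z0
```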
Aims
Samplers to test: