reshape momentum inside the integrator?

There is a tension inside the HMC algorithm between the momentum generator, which generates 1D arrays, and the log-probability function, which takes tuples of arrays. Since the integrator step involves computations that mix both, we need to either work with flattened variables or reshape the momentum and use jax.flatten_util.ravel_pytree to obtain an unraveling function. Here are the tradeoffs:

Flattening variables

This can be easily achieved by composing the unraveling function with the logpdf. We can then treat the variables as a single concatenated 1D array.
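A minimal sketch of that composition, assuming a toy logpdf over a tuple of two arrays (the `logpdf`, `position` and `flat_logpdf` names are made up for illustration and are not taken from the mcx code):

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree


# Hypothetical logpdf that takes a tuple of arrays (standard normal on each entry).
def logpdf(position):
    mu, sigma = position
    return -0.5 * (jnp.sum(mu ** 2) + jnp.sum(sigma ** 2))


# Example position: a tuple of arrays with different shapes.
position = (jnp.ones(3), jnp.ones((2, 2)))

# ravel_pytree returns the concatenated 1D array and the unraveling function.
flat_position, unravel_fn = ravel_pytree(position)


def flat_logpdf(flat_x):
    # Unravel on every call, then evaluate the original logpdf.
    return logpdf(unravel_fn(flat_x))


# The inference code can now differentiate with respect to a single 1D array.
flat_grad = jax.grad(flat_logpdf)(flat_position)
```

With this approach the integrator only ever sees `flat_position` and `flat_grad`, at the cost of one call to `unravel_fn` per logpdf evaluation.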

This leads to the following extra operations:

Unraveling in the flattened logpdf during trajectory integration: N_samples * num_stages_integrator * num_integration_steps;
Unraveling in the flattened logpdf when computing the logprob before the acceptance step: N_samples;
Unraveling to re-create the trace: N_samples.

So approximately:

N_samples * num_stages_integrator * num_integration_steps
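To make the count concrete, here is a small sketch that adds up the three terms listed above. The function name and the numbers are illustrative assumptions, not measurements:

```python
def flattening_unravel_calls(n_samples, num_stages_integrator, num_integration_steps):
    """Count the unraveling calls listed above for the flattened approach."""
    trajectory = n_samples * num_stages_integrator * num_integration_steps
    acceptance = n_samples  # logprob before the acceptance step
    trace = n_samples  # re-creating the trace
    return trajectory + acceptance + trace


# Made-up example: 1000 samples, 1 stage per step, 10 integration steps.
total = flattening_unravel_calls(1000, 1, 10)
dominant = 1000 * 1 * 10
```

In this example the trajectory term accounts for 10000 of the 12000 calls, which is why the other two terms are dropped in the approximation.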

Pros

The inference code only deals with 1D Arrays. In a way it is cleaner.

Cons

It may make extensions such as discontinuous HMC and Metropolis-within-Gibbs more complex down the line as we would have to carry the shape of each variable around.
It adds extra steps for users who may want to use the inference engine with custom logpdfs; it makes the user interface complicated.
We need to pass an unraveling function to the integrator steps; it is harder to justify as it is far from the source of the issue.

Unraveling momentum during inference

This can be done at several stages: either in the momentum generator (like NumPyro) or in the integrator. It seems kind of awkward to do it in the momentum generator: it becomes disconnected from the place it is actually needed and will look confusing to someone first encountering the code.

It would lead to the following extra operations:

Unraveling in the momentum generation: N_iter
Raveling in the kinetic energy: N_samples * num_stages_integrator * num_integration_steps
Raveling/unraveling with jax.tree_multimap: same order of magnitude.
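A rough sketch of where these operations would sit, assuming an identity mass matrix and a position made of a tuple of two arrays (`generate_momentum` and `kinetic_energy` are hypothetical names, not the mcx API):

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

# Example position; only its structure and shapes matter here.
position = (jnp.zeros(3), jnp.zeros((2, 2)))
flat_position, unravel_fn = ravel_pytree(position)


def generate_momentum(rng_key):
    # The generator draws a 1D array, then unravels it to match the position.
    flat_momentum = jax.random.normal(rng_key, flat_position.shape)
    return unravel_fn(flat_momentum)


def kinetic_energy(momentum):
    # Ravel back to 1D to compute the quadratic form (identity mass matrix assumed).
    flat_momentum, _ = ravel_pytree(momentum)
    return 0.5 * jnp.dot(flat_momentum, flat_momentum)


momentum = generate_momentum(jax.random.PRNGKey(0))
energy = kinetic_energy(momentum)
```

Here the unraveling happens once per generated momentum, while the raveling happens every time the kinetic energy is evaluated.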

Pros

Transformation is done at a place where we see the actual data. No need to add an extra step to guess the shape at initialization.
Advanced users only have to worry about providing an unraveling function, which is easier to do and to understand.

Cons

One op to generate the momentum, one op to compute the kinetic energy
num_stages raveling / unraveling operations per integration step (cf. JAX's tree_multimap implementation)
We need to replace all operations in the integrators with tree_multimap ops. Slightly more verbose.
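As an illustration of what the integrator could look like with tree operations, here is a sketch of a single velocity Verlet (leapfrog) step. It uses `jax.tree_util.tree_map`, which accepts several trees and fills the role of `tree_multimap` in current JAX releases; the toy `logpdf` and the `leapfrog_step` name are assumptions made for this example, not the mcx integrator:

```python
import jax
import jax.numpy as jnp
from jax.tree_util import tree_map


# Hypothetical logpdf that takes a tuple of arrays (standard normal on each entry).
def logpdf(position):
    mu, sigma = position
    return -0.5 * (jnp.sum(mu ** 2) + jnp.sum(sigma ** 2))


def leapfrog_step(position, momentum, step_size):
    # Half step on the momentum, using the gradient of the logpdf.
    grad = jax.grad(logpdf)(position)
    momentum = tree_map(lambda p, g: p + 0.5 * step_size * g, momentum, grad)
    # Full step on the position (identity mass matrix assumed).
    position = tree_map(lambda q, p: q + step_size * p, position, momentum)
    # Second half step on the momentum at the new position.
    grad = jax.grad(logpdf)(position)
    momentum = tree_map(lambda p, g: p + 0.5 * step_size * g, momentum, grad)
    return position, momentum


position = (jnp.ones(3), jnp.ones((2, 2)))
momentum = tree_map(jnp.zeros_like, position)
new_position, new_momentum = leapfrog_step(position, momentum, 0.1)
```

Each arithmetic update becomes one tree operation over the leaves, so position and momentum keep the shapes the logpdf expects throughout the step.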

rlouf / mcx

reshape momentum inside the integrator? #7
