nimble-dev / nimbleHMC

HMC and other derivative-based MCMC sampling algorithms for the nimble package
GNU General Public License v3.0

Claim in JOSS paper of distinctness of ability to assign samplers to particular parameters needs clarifying #61

Closed · matt-graham closed this issue 3 months ago

matt-graham commented 3 months ago

Raising this as part of the JOSS review openjournals/joss-reviews/issues/6745

The claim in

https://github.com/nimble-dev/nimbleHMC/blob/8ca5c0a156d0f446d9dec328496c5c5d74638ce5/joss/paper/paper.md?plain=1#L74-L80

is not necessarily true as stated: to my knowledge, there are multiple other MCMC software packages which allow 'easy customization of sampler assignments from a high-level interface'.

For example, in PyMC the step argument to the pymc.sample function can be used to assign a list of sampler step methods, each of which targets a different subset of random variables in the associated model. The linked documentation gives the example

step=[pm.NUTS([freeRV1, freeRV2], target_accept=0.9),
      pm.BinaryGibbsMetropolis([freeRV3], transit_p=.7)]

for a model with two continuous random variables freeRV1 and freeRV2 and one discrete (binary) random variable freeRV3. Something else may be intended by the statement 'easy customization of sampler assignments from a high-level interface', but if so, the claim needs clarifying.
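
To make the fragment above concrete, here is a minimal self-contained sketch of the same mechanism on a toy model (the variables mu, z and y below are illustrative only, not taken from the nimbleHMC example):

import pymc as pm

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=1.0)   # continuous parameter
    z = pm.Bernoulli("z", p=0.5)              # discrete (binary) parameter
    pm.Normal("y", mu=mu + z, sigma=1.0, observed=[0.2, 1.1, 0.7])
    # NUTS for the continuous parameter, binary Gibbs for the discrete one
    idata = pm.sample(
        draws=1000,
        tune=1000,
        step=[pm.NUTS([mu], target_accept=0.9),
              pm.BinaryGibbsMetropolis([z])],
    )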

Similarly, the Turing.jl documentation has an example of combining HMC on the continuous parameters and particle Gibbs on the discrete parameters of a hidden Markov model.

A similar claim is also made in

https://github.com/nimble-dev/nimbleHMC/blob/8ca5c0a156d0f446d9dec328496c5c5d74638ce5/joss/paper/paper.md?plain=1#L103-L105

As mentioned above, both PyMC and Turing.jl allow MCMC-based inference in models with both discrete and continuous parameters / latent variables. There are also approaches for sampling from some such models in probabilistic programming frameworks such as Stan, which only directly support sampling from distributions on real-valued vector spaces, namely by marginalizing out the discrete latent parameters. As I think the model here is a set of hidden Markov models with shared (continuous) parameters, the marginalization could be performed efficiently using the forward algorithm.
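
To sketch what that marginalization computes, here is a rough numpy version of the forward recursion for a single capture history, under the same binary latent-state structure as the PyMC code below (the names y, first_occasion, phi_t and p are hypothetical):

import numpy as np

def log_marginal_likelihood(y, first_occasion, phi_t, p):
    # Forward algorithm over the binary latent state (0 or 1): the individual
    # is assumed to be in state 1 at first_occasion, stays in state 1 at
    # occasion t with probability phi_t[t], and is detected (y[t] = 1) with
    # probability p while in state 1.
    alpha = np.array([0.0, 1.0])  # point mass on state 1 at first_occasion
    for t in range(first_occasion + 1, len(y)):
        transition = np.array([[1.0, 0.0],                   # state 0 is absorbing
                               [1.0 - phi_t[t], phi_t[t]]])  # state 1 persists w.p. phi_t[t]
        alpha = alpha @ transition
        emission = np.array([0.0, p]) if y[t] == 1 else np.array([1.0, 1.0 - p])
        alpha = alpha * emission
    return np.log(alpha.sum())

In Stan one would accumulate this on the log scale for numerical stability, but the point is simply that the discrete states never need to be sampled.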

Concretely, I think the following PyMC code is roughly equivalent (beyond some ambiguity over the priors to use for the initial states in each hidden sequence, #59, and possibly some errors in translating between zero-based and one-based indexing) to the nimbleHMC example in the paper:

import numpy as np
import pymc as pm

# capture histories exported from the R example in the paper (see below)
y_observed = np.loadtxt("data.csv", delimiter=",")
N, T = y_observed.shape
# occasion-level indicator selecting which of the two phi values applies
f = np.array([0, 1, 1, 0, 0, 0, 0, 0])
# first occasion at which each individual was observed
first = y_observed.argmax(axis=1)

with pm.Model() as model:
    phi = pm.Uniform("phi", lower=0, upper=1, shape=(2,))
    p = pm.Uniform("p", lower=0, upper=1)
    for i in range(N):
        x_i, y_i = [], []
        for t in range(first[i] + 1, T):
            # latent binary state: certainly 1 at the first modelled occasion,
            # otherwise 1 with probability phi[f[t]] given the previous state
            x_i.append(pm.Bernoulli(
                f"x_{i}_{t}",
                p=(phi[f[t]] * x_i[-1]) if t != first[i] + 1 else 1,
            ))
            # observation: detected with probability p while the latent state is 1
            y_i.append(pm.Bernoulli(f"y_{i}_{t}", p=p * x_i[-1], observed=y_observed[i, t]))
    pm.sample(draws=20000, tune=10000, chains=1)

where data.csv is the output of

write.table(do.call("cbind", y), file="data.csv", col.names = FALSE, row.names = FALSE, sep=",")

with y the list of lists assigned in the example in the paper.

This runs with the automatic step method / sampler assignment

CompoundStep
>NUTS: [phi, p]
>BinaryGibbsMetropolis: [x_0_1, x_0_2, x_0_3, ..., x_252_6, x_253_6, x_254_6]

albeit very slowly due to the unrolled nested loops creating a very large computation graph.
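
For reference, the same assignment could also be made explicitly rather than relying on PyMC's defaults, along the lines of the following (an untested sketch reusing the model object defined above):

with model:
    pm.sample(
        draws=20000,
        tune=10000,
        chains=1,
        step=[
            # NUTS on the continuous parameters, binary Gibbs on the latent states
            pm.NUTS([phi, p], target_accept=0.9),
            pm.BinaryGibbsMetropolis(
                [rv for rv in model.free_RVs if rv.name.startswith("x_")]
            ),
        ],
    )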

danielturek commented 3 months ago

Addressed in #65

danielturek commented 3 months ago

@matt-graham Thanks for pointing this out; this was fascinating to read!

I updated the manuscript. I chose not to attempt a review of the capabilities of other packages, but rather to remove any claims of uniqueness (I thought this was safer). The manuscript now only describes the functionality of the package, rather than making claims of uniqueness among software. Could you take a look at the updated text in #65 and see whether it resolves the issue?

matt-graham commented 3 months ago

Thanks @danielturek - the rephrasing to remove the claim of uniqueness looks good to me.

danielturek commented 3 months ago

Thanks one more time, @matt-graham. I really appreciate your helpful comments.

#65 is now merged. Closing this issue.