nchopin / particles

Sequential Monte Carlo in python
MIT License

Defining conditional distributions #88

Closed: glandfried closed this issue 1 month ago

glandfried commented 1 month ago

Thank you very much, Nicolas Chopin, for the package. I have carefully read your book. SMC is exactly what I needed to approximate the model evidence.

Before starting to use the package, I compared the results of the particles package with my own implementation of the Bayesian linear (polynomial) regression model, which has analytical solutions for both the posterior and the evidence. They produced identical results. Amazing.

Since then, I have implemented several models. However, I encountered difficulties every time I tried to implement a conditional prior.

Example.

We want to compare two traditional models used for evaluating diagnostic tests without a gold standard: the Hui-Walter model, which assumes conditional independence between the tests, and the Dendukuri model, which includes the covariance between pairs of tests as an additional hypothesis.

Evaluating the performance of a diagnostic test is straightforward when we have gold-standard data. In such cases, the probability of a diagnosis given the true state, $P(Diagnosis|State)$, can be read directly from the contingency table (in the long run).

[Figure: contingency table of diagnoses versus true state]

However, when the true state of the patient is unknown, we must evaluate the alternative tests based solely on their imperfect diagnoses. The observable outcome of two tests can be represented by the vector $r = (n_0, n_1, n_2, n_3)$, where $n_0$, $n_1$, $n_2$, and $n_3$ are the numbers of joint diagnoses $00$, $01$, $10$, and $11$, respectively.

The Hui-Walter model assumes that the tests are conditionally independent given the true state, so each test is characterized by its sensitivity ($s$) and specificity ($x$) alone. Then, the probability of a joint diagnosis for an individual in a population with prevalence $p$ is a vector $q = (q_0, q_1, q_2, q_3)$, where the probability of two negative diagnoses is

$$q_0 = p\cdot(1-s_a)\cdot(1-s_b) + (1-p)\cdot x_a \cdot x_b$$

Probabilities $q_1$ to $q_3$ are analogous and can be seen in the next figure, where we specify the model using factor graph notation (they are also written out explicitly below).

[Figure: Hui-Walter model in factor graph notation]
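For completeness, the remaining components (these match the expressions used in the logpyt code below) are

$$q_1 = p\cdot(1-s_a)\cdot s_b + (1-p)\cdot x_a \cdot (1-x_b)$$

$$q_2 = p\cdot s_a \cdot (1-s_b) + (1-p)\cdot (1-x_a) \cdot x_b$$

$$q_3 = p\cdot s_a \cdot s_b + (1-p)\cdot (1-x_a) \cdot (1-x_b)$$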

Implementing this model was straightforward. Because I decided to feed the model one data point at a time, I replaced the multinomial distribution over the counts with a categorical distribution over individual diagnoses (a quick sanity check of this replacement is sketched after the run code below).

# Imports needed for this snippet
import numpy as np
import random

import particles
from particles import distributions as dists
from particles import smc_samplers as ssp

# Simulated data: counts of joint diagnoses (00, 01, 10, 11)
data = [503, 75, 187, 240]
data_expanded = np.concatenate(
    [[0]*data[0], [1]*data[1], [2]*data[2], [3]*data[3]])
random.shuffle(data_expanded)

prior = dists.StructDist({
    'p':  dists.Beta(a=450, b=550),  # prevalence
    's1': dists.Beta(a=2., b=1.),    # sensitivity test 1
    'x1': dists.Beta(a=2., b=1.),    # specificity test 1
    's2': dists.Beta(a=2., b=1.),    # sensitivity test 2
    'x2': dists.Beta(a=2., b=1.),    # specificity test 2
})

class HuiWalter(ssp.StaticModel):
    def logpyt(self, theta, t):  # density of Y_t given theta and Y_{0:t-1}
        n_particles = len(theta['p'])
        # q needs to be of the same length
        q = np.zeros((n_particles, 4))
        q[:, 0] = theta["p"] * (1 - theta['s1']) * (1 - theta['s2']) + (1 - theta["p"]) *      theta['x1']  *      theta['x2']
        q[:, 1] = theta["p"] * (1 - theta['s1']) *      theta['s2']  + (1 - theta["p"]) *      theta['x1']  * (1 - theta['x2'])
        q[:, 2] = theta["p"] *      theta['s1']  * (1 - theta['s2']) + (1 - theta["p"]) * (1 - theta['x1']) *      theta['x2']
        q[:, 3] = theta["p"] *      theta['s1']  *      theta['s2']  + (1 - theta["p"]) * (1 - theta['x1']) * (1 - theta['x2'])
        return dists.Categorical(q).logpdf(self.data[t])

K_chains = 20
N_particles = 2000

# Hui Walter
model = HuiWalter(data=data_expanded, prior=prior)
ibis = ssp.IBIS(model, len_chain=K_chains)
simulator = particles.SMC(fk=ibis, N=N_particles,
                store_history=True, verbose=True)
simulator.run()
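As a sanity check on the multinomial-to-categorical replacement mentioned above, one can verify outside the package that the multinomial log-likelihood of the counts differs from the sum of categorical log-likelihoods over the expanded data only by the multinomial coefficient, a constant that does not depend on $q$ and therefore only shifts the log evidence (the probability vector q_test below is arbitrary, just for illustration):

import numpy as np
from scipy import stats

counts = np.array([503, 75, 187, 240])
q_test = np.array([0.5, 0.1, 0.2, 0.2])  # arbitrary probability vector

log_multinomial = stats.multinomial.logpmf(counts, n=counts.sum(), p=q_test)
log_categorical = np.sum(counts * np.log(q_test))  # sum over the expanded data points
print(log_multinomial - log_categorical)  # log multinomial coefficient, independent of q_test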

In contrast, the Dendukuri model was proposed to evaluate sensitivity and specificity in the presence of conditional dependencies, by incorporating the covariance between tests as an additional hypothesis. The following figure specifies this model using factor graph notation.

[Figure: Dendukuri model in factor graph notation]
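For example, the probability of two negative diagnoses now includes the covariance terms (matching the logpyt code below):

$$q_0 = p\cdot\big((1-s_a)\cdot(1-s_b) + c_s\big) + (1-p)\cdot\big(x_a \cdot x_b + c_x\big)$$

where the sensitivity covariance is constrained to $\max(0, s_a+s_b-1) - s_a s_b \le c_s \le \min(s_a, s_b) - s_a s_b$, and analogously for $c_x$.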

The implementation of the Dendukuri model is similar to the Hui-Walter model, but we must add conditional distributions over the covariance variables.

from collections import OrderedDict

prior_dict_Dendukuri = OrderedDict()
prior_dict_Dendukuri['p'] = dists.Beta(a=450, b=550) # prevalence
prior_dict_Dendukuri['s1'] = dists.Beta(a=2., b=1.) # sensitivity test 1
prior_dict_Dendukuri['x1'] = dists.Beta(a=2., b=1.) # specificity test 1
prior_dict_Dendukuri['s2'] = dists.Beta(a=2., b=1.) # sensitivity test 2
prior_dict_Dendukuri['x2'] = dists.Beta(a=2., b=1.) # specificity test 2
prior_dict_Dendukuri['cs'] = dists.Cond(lambda theta: dists.Uniform(
    a= np.maximum(0, theta["s1"]+theta["s2"]-1 ) - theta["s1"]*theta["s2"],
    b= np.minimum(theta["s1"], theta["s2"]     ) - theta["s1"]*theta["s2"]))
prior_dict_Dendukuri['cx'] = dists.Cond(lambda theta: dists.Uniform(
    a= np.maximum(0, theta["x1"]+theta["x2"]-1 ) - theta["x1"]*theta["x2"],
    b= np.minimum(theta["x1"], theta["x2"]     ) - theta["x1"]*theta["x2"]))

priorDendukuri = dists.StructDist(prior_dict_Dendukuri)

class Dendukuri(ssp.StaticModel):
    def logpyt(self, theta, t):  # density of Y_t given theta and Y_{0:t-1}
        q = np.zeros((len(theta['p']), 4))
        q[:, 0] =      theta["p"]  * ((1 - theta['s1']) * (1 - theta['s2']) + theta["cs"]) \
                + (1 - theta["p"]) * (     theta['x1']  *      theta['x2']  + theta["cx"])
        q[:, 1] =      theta["p"]  * ((1 - theta['s1']) *      theta['s2']  - theta["cs"]) \
                + (1 - theta["p"]) * (     theta['x1']  * (1 - theta['x2']) - theta["cx"])
        q[:, 2] =      theta["p"]  * (     theta['s1']  * (1 - theta['s2']) - theta["cs"]) \
                + (1 - theta["p"]) * ((1 - theta['x1']) *      theta['x2']  - theta["cx"])
        q[:, 3] =      theta["p"]  * (     theta['s1']  *      theta['s2']  + theta["cs"]) \
                + (1 - theta["p"]) * ((1 - theta['x1']) * (1 - theta['x2']) + theta["cx"])
        return dists.Categorical(q).logpdf(self.data[t])

# Dendukuri
modelD = Dendukuri(data=data_expanded, prior=priorDendukuri)
ibisD = ssp.IBIS(modelD, len_chain=K_chains)
simulatorD = particles.SMC(fk=ibisD, N=N_particles ,
                store_history=True, verbose=True)
simulatorD.run()

This implementation runs, but it encounters a problem. If we add the following code to the logpyt method of the Dendukuri class,

for iq in range(len(q)):
    if any(q[iq,] < 0):
        print("Error, negative probability")
        print("q=",q[iq,])
        #
        print("p=",theta["p"][iq])
        print("s1=",theta["s1"][iq])
        print("s2=",theta["s2"][iq])
        print("cs=",theta["cs"][iq])
        #
        print("cs lower bound", np.maximum(0,theta["s1"][iq]+theta["s2"][iq]-1)-theta["s1"][iq]*theta["s2"][iq] )
        print("cs upper bound", np.minimum(theta["s1"][iq],theta["s2"][iq])-theta["s1"][iq]*theta["s2"][iq] )
        #
        print("x1=",theta["x1"][iq])
        print("x2=",theta["x2"][iq])
        print("cx=",theta["cx"][iq])
        #
        print("cx lower bound", np.maximum(0,theta["x1"][iq]+theta["x2"][iq]-1)-theta["x1"][iq]*theta["x2"][iq] )
        print("cx upper bound", np.minimum(theta["x1"][iq],theta["x2"][iq])-theta["x1"][iq]*theta["x2"][iq] )

we will see that there are negative probabilities.

q= [-0.02707828  0.35166766  0.47054394  0.20486668]
p= 0.432311682773959
s1= 0.48871788457064935
s2= 0.8883856128558985
cs= 0.045826255630601394
cs lower bound -0.05706643997138666
cs upper bound 0.05454794717271477
x1= 0.18241725774559292
x2= 0.69618036339598
cx= -0.2530503882090248
cx lower bound -0.12699531278702503
cx upper bound 0.05542194495856789

The problem is that the sampled covariances for the sensitivity (cs) and the specificity (cx) lie outside their lower and upper bounds. I had a similar problem with another model. I have not found enough examples in the documentation to understand how to resolve this error.

nchopin commented 1 month ago

Hi, that might be a bug. By 'sampled covariance', do you mean the empirical variance of the particle sample, which is used inside the (random walk) Metropolis step? This should (in principle) not be a problem: you may propose values outside the support of the distribution (i.e. outside the interval of a Uniform dist), but these proposed values will then be rejected because their prior density should be zero. So again, maybe there is a bug in either Uniform or Cond. If you want, you can submit a MWE (= minimum working example), so that I can try to debug it.
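For instance, an MWE could start from a quick check like this (untested sketch) that the prior density is indeed minus infinity outside the support of a Uniform:

import numpy as np
from particles import distributions as dists

u = dists.Uniform(a=0., b=0.5)
print(u.logpdf(np.array([0.25, 0.7])))  # expected: a finite value, then -inf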

glandfried commented 1 month ago

GitHub does not support sharing Python files. I see three options:

As you prefer.

nchopin commented 1 month ago

I think 1 is best for everyone (but 2 works for me too if you prefer).

glandfried commented 1 month ago

Here is the full code.

from matplotlib import pyplot as plt
from collections import OrderedDict
import numpy as np
import random
from scipy import stats

import particles
from particles import distributions as dists
from particles import smc_samplers as ssp
from particles.collectors import Moments

# True hidden states.
trueState = OrderedDict()
trueState["prevalence"] = 0.45
trueState["sensitivity"] = [0.9, 0.6]
trueState["specificity"] = [0.95, 0.9]

# ############################### #
# simulation procedure bypassed    #
# ############################### #

# Simulated data
data = [503, 75, 187, 240]
# Expanded data: one data point at each time step.
data_expanded = np.concatenate(
    [[0]*data[0], [1]*data[1], [2]*data[2], [3]*data[3]])
random.shuffle(data_expanded)

# Prior
prior = dists.StructDist({
    "p":  dists.Beta(a=450, b=550), # prevalence
    's1': dists.Beta(a=2., b=1.), # sensitivity test 1
    'x1': dists.Beta(a=2., b=1.), # specificity test 1
    's2': dists.Beta(a=2., b=1.), # sensitivity test 2
    'x2': dists.Beta(a=2., b=1.), # specificity test 2
})

class HuiWalter(ssp.StaticModel):
    def logpyt(self, theta, t):
        # Probability vector of joint diagnosis, q.
        q = np.zeros((len(theta['p']), 4))
        # 00
        q[:, 0] = theta["p"] * (1 - theta['s1']) * (1 - theta['s2']) + (1 - theta["p"]) *      theta['x1']  *      theta['x2']
        # 01
        q[:, 1] = theta["p"] * (1 - theta['s1']) *      theta['s2']  + (1 - theta["p"]) *      theta['x1']  * (1 - theta['x2'])
        # 10
        q[:, 2] = theta["p"] *      theta['s1']  * (1 - theta['s2']) + (1 - theta["p"]) * (1 - theta['x1']) *      theta['x2']
        # 11
        q[:, 3] = theta["p"] *      theta['s1']  *      theta['s2']  + (1 - theta["p"]) * (1 - theta['x1']) * (1 - theta['x2'])
        # log Evidence
        return dists.Categorical(q).logpdf(self.data[t])

prior_dict_Dendukuri = OrderedDict()
prior_dict_Dendukuri['p'] = dists.Beta(a=450, b=550) # prevalence
prior_dict_Dendukuri['s1'] = dists.Beta(a=2., b=1.) # sensitivity test 1
prior_dict_Dendukuri['x1'] = dists.Beta(a=2., b=1.) # specificity test 1
prior_dict_Dendukuri['s2'] = dists.Beta(a=2., b=1.) # sensitivity test 2
prior_dict_Dendukuri['x2'] = dists.Beta(a=2., b=1.) # specificity test 2
prior_dict_Dendukuri['cs'] = dists.Cond(lambda theta: dists.Uniform(
    a= np.maximum(0, theta["s1"]+theta["s2"]-1 ) - theta["s1"]*theta["s2"],
    b= np.minimum(theta["s1"], theta["s2"]     ) - theta["s1"]*theta["s2"]))
prior_dict_Dendukuri['cx'] = dists.Cond(lambda theta: dists.Uniform(
    a= np.maximum(0, theta["x1"]+theta["x2"]-1 ) - theta["x1"]*theta["x2"],
    b= np.minimum(theta["x1"], theta["x2"]     ) - theta["x1"]*theta["x2"]))

priorDendukuri = dists.StructDist(prior_dict_Dendukuri)

class Dendukuri(ssp.StaticModel):
    def logpyt(self, theta, t):
        # Probabilities of joint diagnosis, q.
        q = np.zeros((len(theta['p']), 4))
        # 00
        q[:, 0] =      theta["p"]  * ((1 - theta['s1']) * (1 - theta['s2']) + theta["cs"]) \
                + (1 - theta["p"]) * (     theta['x1']  *      theta['x2']  + theta["cx"])
        # 01
        q[:, 1] =      theta["p"]  * ((1 - theta['s1']) *      theta['s2']  - theta["cs"]) \
                + (1 - theta["p"]) * (     theta['x1']  * (1 - theta['x2']) - theta["cx"])
        # 10
        q[:, 2] =      theta["p"]  * (     theta['s1']  * (1 - theta['s2']) - theta["cs"]) \
                + (1 - theta["p"]) * ((1 - theta['x1']) *      theta['x2']  - theta["cx"])
        # 11
        q[:, 3] =      theta["p"]  * (     theta['s1']  *      theta['s2']  + theta["cs"]) \
                + (1 - theta["p"]) * ((1 - theta['x1']) * (1 - theta['x2']) + theta["cx"])
        # Debugging
        for iq in range(len(q)):
            if any(q[iq,] < 0):
                print("Error: we get particles outside the range.")
                print("q=",q[iq,])
                #
                print("Backward tracking the issue:")
                print("p=",theta["p"][iq])
                print("s1=",theta["s1"][iq])
                print("s2=",theta["s2"][iq])
                print("cs=",theta["cs"][iq])
                print("cs lower bound", np.maximum(0,theta["s1"][iq]+theta["s2"][iq]-1)-theta["s1"][iq]*theta["s2"][iq] )
                print("cs upper bound", np.minimum(theta["s1"][iq],theta["s2"][iq])-theta["s1"][iq]*theta["s2"][iq] )
                print("x1=",theta["x1"][iq])
                print("x2=",theta["x2"][iq])
                print("cx=",theta["cx"][iq])
                print("cx lower bound",  )
                print("cx upper bound", np.minimum(theta["x1"][iq],theta["x2"][iq])-theta["x1"][iq]*theta["x2"][iq] )
        # log evidence
        return dists.Categorical(q).logpdf(self.data[t])

K_chains = 20
N_particles = 2000

# Run Dendukuri
modelD = Dendukuri(data=data_expanded, prior=priorDendukuri)
ibisD = ssp.IBIS(modelD, len_chain=K_chains)
simulatorD = particles.SMC(fk=ibisD, N=N_particles ,
                store_history=True, verbose=True)
simulatorD.run()
# geometric mean of the one-step-ahead predictive probabilities (1005 = total number of observations)
print(np.exp(simulatorD.summaries.logLts[-1]/1005))

# Hui Walter
model = HuiWalter(data=data_expanded, prior=prior)
ibis = ssp.IBIS(model, len_chain=K_chains)
simulator = particles.SMC(fk=ibis, N=N_particles,
                store_history=True, verbose=True)
simulator.run()

# log Bayes factor: log evidence of Hui-Walter minus log evidence of Dendukuri
print(simulator.logLt - simulatorD.logLt)
nchopin commented 1 month ago

One thing to keep in mind: the SMC sampler you are using (IBIS) relies (by default) on the following type of MCMC kernel: random walk (RW) Metropolis, with the covariance matrix of the RW proposal set to a fraction of the empirical covariance of the particles. In your case, this default strategy is not so great, since RW Metropolis may propose points anywhere in R^d, whereas your parameters lie in a constrained domain (because of the conditional uniform distributions). This is still valid: points proposed outside the domain should simply be rejected, because their log-likelihood is minus infinity. But the acceptance rate may be quite low, which in turn may lead to a higher variance for the final output. My current impression is that your code does work (it does not throw any error); it is just that, by construction, it proposes parameter values that have prior probability 0, and those should simply get rejected. Ways to get around this problem:

  1. In your current implementation of logpyt, make sure you return -np.inf for each particle that is outside the prior domain (a sketch follows after this list).
  2. Re-parametrise the model to make the parameters "unconstrained". For instance, you could replace a parameter xi which is a priori Uniform(a, b) by the parameter eta = f(xi) = log[ (xi - a) / (b - xi) ], which takes values on the whole real line. The distribution of y_t will still depend on xi, which is now recovered as xi = f^{-1}(eta) = a + (b - a) / (1 + exp(-eta)). (In your case, this is a bit tricky, because a and b depend on the other parameters, but it's still doable.)
  3. Implement your own MCMC kernels to make sure that the parameters are proposed in the constrained space.

I think I would try 1, then maybe 2; 3 looks like overkill.
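To make option 1 concrete, here is an untested sketch based on the Dendukuri class above (the class name DendukuriConstrained and the explicit bound checks are just one possible way to write it). Particles whose (cs, cx) fall outside the support of their conditional uniform priors get log-likelihood -np.inf, so the Metropolis step rejects them:

import numpy as np

class DendukuriConstrained(Dendukuri):
    def logpyt(self, theta, t):
        # bounds of the conditional uniform priors, same formulas as in prior_dict_Dendukuri
        cs_lo = np.maximum(0., theta['s1'] + theta['s2'] - 1.) - theta['s1'] * theta['s2']
        cs_hi = np.minimum(theta['s1'], theta['s2']) - theta['s1'] * theta['s2']
        cx_lo = np.maximum(0., theta['x1'] + theta['x2'] - 1.) - theta['x1'] * theta['x2']
        cx_hi = np.minimum(theta['x1'], theta['x2']) - theta['x1'] * theta['x2']
        valid = ((theta['cs'] >= cs_lo) & (theta['cs'] <= cs_hi)
                 & (theta['cx'] >= cx_lo) & (theta['cx'] <= cx_hi))
        # invalid particles may yield negative q entries (and hence NaNs) in the parent logpyt;
        # those values are discarded and replaced by -inf below
        with np.errstate(invalid='ignore'):
            lp = super().logpyt(theta, t)
        return np.where(valid, lp, -np.inf)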

nchopin commented 1 month ago

Closing this, not a bug, hope my explanation was clear.