nutterb / HydeNet

Hybrid Decision Networks in R
Other
23 stars 3 forks source link

What is the correct `cpt` model specification #70

Closed nutterb closed 9 years ago

nutterb commented 9 years ago
 Net <- HydeNetwork(~ wells + 
                      pe | wells + 
                      d.dimer | pregnant*pe + 
                      angio | pe + 
                      treat | d.dimer*angio + 
                      death | pe*treat,
                      data = PE) 

If we use the above model, the variables angio and death get fit with cpt because they are both factor variables.

I get the following output from writeNetworkModel as currently written.

writeNetworkModel(Net, TRUE)

model{
   wells ~ dnorm(3.7942, 0.6305)
   pe ~ dbern(ilogit(-3.90355 + 0.5757 * wells))
   d.dimer ~ dnorm(210.24251 + 68.37938 * (pe == 2) + 29.29496 * (pregnant == 2), 0.03347)
   pi.pregnant[1] <- 0.9014; pi.pregnant[2] <- 0.0986
   pregnant ~ dcat(pi.pregnant)
   pi.angio <- cpt.angio[pe,]
   angio ~ dcat(pi.angio)
   treat ~ dbern(ilogit(-5.89316 + 1.73354 * (angio == 2) + 0.01994 * d.dimer))
   pi.death <- cpt.death[pe,treat,]
   death ~ dcat(pi.death)
}

This creates and error. in the deterministic defintion of pi.angio and pi.death, JAGS wants the values in cpt.death[pe, treat,] to be probability vectors. But those are both single element probability values returned by the logistic regression. So here's the question, should I write the probability vectors based on the observed values in the cpt object, or should I do something crazy like pi.pe[1] <- (1 - pe); pi.pe[2] <- pe, which would allow the input probabilities to be based on the posterior value from pe? It seems the latter is what I should do, but wanted to know what you think.

I feel like we're so so so close.

jarrod-dalton commented 9 years ago

Why are there 2 dimensions in cpt.angio[pe, ]? I would have thought this to be a uni-dimensional vector cpt.angio[pe].

Same thing for cpt.death[pe,treat,] (3 vs. 2 dimensions)...

jarrod-dalton commented 9 years ago

I think this is because the last dimension corresponds to the child node. Brain is foggy as I haven't looked at it in a week or so.

nutterb commented 9 years ago

Yup, that's why. It's okay to be foggy still.

jarrod-dalton commented 9 years ago

why is logistic regression used in pi.angio and pi.death?

If I am understanding correctly, the issue is that cpt will always give tables that represent all values of the child node, where this is not the case for situations in which logistic regression is used.

If I further understand correctly, JAGS is trying to call dcat with single-element probabilities, and doesn't know what to do with them. Would calling dbern instead be an option? That is, does dbern know what you're talking about if you only give it one probability value? If it is, we would maybe need to build logic in which detects situations where the child node has only two values.

nutterb commented 9 years ago

So you're suggesting I try

pi.angio[dbern(pe), ] ?

I'll give that a shot and see what happens.

jarrod-dalton commented 9 years ago

pi.angio <- cpt.angio[pe,] angio ~ dbern(pi.angio)

???

nutterb commented 9 years ago

It looks like I'm going to have to create the vector in the same way I created the vectors for multinomial logistic regression.

model{
   pi.gear[1] <- 1 - (exp(-0.32241 + 10.17042*(am == 2) + -0.52456*(cyl == 2) + -12.63344*(cyl == 3) + 0.04434*mpg) / (1 + exp(-0.32241 + 10.17042*(am == 2) + -0.52456*(cyl == 2) + -12.63344*(cyl == 3) + 0.04434*mpg) + exp(-9.41827 + 18.83984*(am == 2) + -0.29636*(cyl == 2) + -0.64508*(cyl == 3) + 0.02031*mpg)) + exp(-9.41827 + 18.83984*(am == 2) + -0.29636*(cyl == 2) + -0.64508*(cyl == 3) + 0.02031*mpg) / (1 + exp(-0.32241 + 10.17042*(am == 2) + -0.52456*(cyl == 2) + -12.63344*(cyl == 3) + 0.04434*mpg) + exp(-9.41827 + 18.83984*(am == 2) + -0.29636*(cyl == 2) + -0.64508*(cyl == 3) + 0.02031*mpg))); pi.gear[2] <- exp(-0.32241 + 10.17042*(am == 2) + -0.52456*(cyl == 2) + -12.63344*(cyl == 3) + 0.04434*mpg) / (1 + exp(-0.32241 + 10.17042*(am == 2) + -0.52456*(cyl == 2) + -12.63344*(cyl == 3) + 0.04434*mpg) + exp(-9.41827 + 18.83984*(am == 2) + -0.29636*(cyl == 2) + -0.64508*(cyl == 3) + 0.02031*mpg)); pi.gear[3] <- exp(-9.41827 + 18.83984*(am == 2) + -0.29636*(cyl == 2) + -0.64508*(cyl == 3) + 0.02031*mpg) / (1 + exp(-0.32241 + 10.17042*(am == 2) + -0.52456*(cyl == 2) + -12.63344*(cyl == 3) + 0.04434*mpg) + exp(-9.41827 + 18.83984*(am == 2) + -0.29636*(cyl == 2) + -0.64508*(cyl == 3) + 0.02031*mpg))
   gear ~ dcat(pi.gear)
   pi.am[1] <- 0.59375; pi.am[2] <- 0.40625
   am ~ dcat(pi.am)
   pi.cyl[1] <- 0.34375; pi.cyl[2] <- 0.21875; pi.cyl[3] <- 0.4375
   cyl ~ dcat(pi.cyl)
   mpg ~ dnorm(20.09062, 0.16592)
}
jarrod-dalton commented 9 years ago

I feel like that may work but is not the best solution. Somehow we should be able to feed the probability array to JAGS and use either dcat or dbern with that array.

When I simply run cpt(angio ~ pe, data=PE), I get all the numbers needed:

     angio
pe     Negative   Positive
  No  0.9025465 0.09745348
  Yes 0.2576419 0.74235808

I noticed that, in the above HydeNetwork(..., data=PE) call, when cpt is used to set distributions for pe and death, the network$nodeModel list element is NULL.

I think this may have repercussions in the portion of compileJagsModel() which works with cpt_arrays?

nutterb commented 9 years ago

using cpt(angio ~ pe, data = PE) certainly works for the case where pe has no parents. This gives a static probability model.

However, when we estimate pe from its own parents, a static probability model for angio doesn't seem appropriate. It seems we would want to change the probability inputs into angio based on the estimate of pe. Am I correct?