Factor levels for nodes not learned from data

Maybe we have this, but we need a mechanism for specifying factor levels for nodes manually constructed.

In the below example, I create a network for coronary artery disease that doesn't seem to be updating probabilities of disease correctly in light of a sequence of positive diagnostic tests:

require(HydeNet)

g <- HydeNetwork(
  ~ cad  | age*female*chestPain
  + echo | cad
  + stressECG | cad
  + cta  | cad
)

  g <- setNode(g, female,            "dbern", p = 0.40)
  g <- setNode(g, age,     "dnorm", mu =  60, tau = 1/(10^2))

  # prior probability of chest pain etiologies
  #1 = non-specific, 2 = atypical, 3 = typical
  g <- setNode(g, chestPain,
               nodeType = "dcat",
               pi = "pi.chestPain[1] <- 0.44; pi.chestPain[2] <- 0.40; pi.chestPain[3] <- 0.16;")

  # CAD -- Genders et al. (2011) European Heart Journal 32, 1316–1330
  equation <- "-4.37 + 0.64*(chestPain == 2) + 1.91*(chestPain == 3) + 1.34*(female == 0) + 0.04*age"
  g <- setNode(g, cad, nodeType="dbern",
                 p=paste("ilogit(", equation, ")"), 
                 validate=FALSE)

  ecgdata <- cbind(expand.grid(list(stressECG = c("Negative", "Positive"),
                                    cad = c("No","Yes"))),
                   data.frame(pr=c(0.77,0.23,0.32,0.68)))
  ecg.cpt <- cpt(stressECG ~ cad, data=ecgdata, wt=ecgdata$pr)
  g <- setNodeModels(g, ecg.cpt)

  echodata <- cbind(expand.grid(list(echo = c("Negative", "Positive"),
                                    cad = c("No","Yes"))),
                   data.frame(pr=c(0.83,0.17,0.12,0.88)))
  echo.cpt <- cpt(echo ~ cad, data=echodata, wt=echodata$pr)
  g <- setNodeModels(g, echo.cpt)

  ctadata <- cbind(expand.grid(list(cta = c("Negative", "Positive"),
                                     cad = c("No","Yes"))),
                    data.frame(pr=c(0.86,0.14,0.05,0.95)))
  cta.cpt <- cpt(cta ~ cad, data=ctadata, wt=ctadata$pr)
  g <- setNodeModels(g, cta.cpt)

ff <- function(data){
  data$female    <- factor(data$female,    0:1, c("No","Yes"))
  data$chestPain <- factor(data$chestPain, 1:3, c("Non-specific","Atypical","Typical"))
  return(data)
}
pCAD <- function(data) sum(data$cad=="Yes") / nrow(data)

writeNetworkModel(g, pretty=TRUE)

#plot(g)

monitored <- c("age","female","chestPain","cad", "echo", "stressECG", "cta")
nsamp <- 100000

# cg00 <- compileJagsModel(g)
# post00 <- ff(bindPosterior(HydePosterior(cg, variable.names = monitored, n.iter = nsamp)))

cg10   <- compileJagsModel(g, list(age=35, female=1, chestPain=1))
post10 <- ff(bindPosterior(HydePosterior(cg10, variable.names=monitored, nchain = 4, n.iter=nsamp)))

cg11   <- compileJagsModel(g, list(age=35, female=1, chestPain=1, echo="Positive"))
post11 <- ff(bindPosterior(HydePosterior(cg11, variable.names=monitored, n.iter=nsamp)))

cg12   <- compileJagsModel(g, list(age=35, female=1, chestPain=1, echo="Positive", stressECG="Positive"))
post12 <- ff(bindPosterior(HydePosterior(cg12, variable.names=monitored, n.iter=nsamp)))

cg13   <- compileJagsModel(g, list(age=35, female=1, chestPain=1, echo="Positive", stressECG="Positive", cta="Positive"))
post13 <- ff(bindPosterior(HydePosterior(cg13, variable.names=monitored, n.iter=nsamp)))

lapply(list(post10, post11, post12, post13), pCAD)

[[1]]
[1] 0.04869

[[2]]
[1] 0.04872

[[3]]
[1] 0.0496

[[4]]
[1] 0.04825

It turns out that the issue is that HydeNet had no idea of what I was talking about with cad = c("No","Yes"). But when I tried to use numeric values for nodecad`, it yelled at me:

  ecgdata <- cbind(expand.grid(list(stressECG = c("Negative", "Positive"),
                                    cad = 0:1)),
                   data.frame(pr=c(0.77,0.23,0.32,0.68)))
  ecg.cpt <- cpt(stressECG ~ cad, data=ecgdata, wt=ecgdata$pr)

Error in cpt_workhorse(variables, dependentVar, independentVars, data,  : 

1: All variables must be of class 'factor'

How can I set the factor levels for a node (cad) which is defined manually, in this case using a logistic regression equation?

I'm having trouble following. It looks to me like HydeNet is recognizing "Yes", "No", 0, and 1 as all valid values by which to interpret cad (try adding cad=1 or cad="Yes" to any of those compileJagsNetwork calls).

If I understand correctly, though, the real issue is that a 35 year old woman with non-specific chest pain and positive echo, stressECG and ecg should have a higher probability of cad than 0.04825? And should it really? The age, gender, and type of chest pain are all low risk descriptors. (though I'm surprised it is that low with that many positive tests).

Not entirely related, but using factor(0:1) makes this call work, and then HydeNet would recognize the values 0, 1, "0", and "1" as valid identifiers of cad.

> ecgdata <- cbind(expand.grid(list(stressECG = c("Negative", "Positive"),
+                                     cad = factor(0:1))),
+                    data.frame(pr=c(0.77,0.23,0.32,0.68)))
>   ecg.cpt <- cpt(stressECG ~ cad, data=ecgdata, wt=ecgdata$pr)
> ecg.cpt
   stressECG
cad Negative Positive
  0     0.77     0.23
  1     0.32     0.68

Below are the "right" answers. I tried using factor(0:1) too, but in this case the cad node still is numeric and doesn't map to the values "0" and "1".

require(HydeNet)

g <- HydeNetwork(
  ~ cad  | age*female*chestPain
  + echo | cad
  + stressECG | cad
  + cta  | cad
)

  g <- setNode(g, female, "dbern", p = 0.40)
  g <- setNode(g, age,    "dnorm", mu =  60, tau = 1/(10^2))

  # prior probability of chest pain etiologies
  # 1 = non-specific, 2 = atypical, 3 = typical
  g <- setNode(g, chestPain,
               nodeType = "dcat",
               pi = "pi.chestPain[1] <- 0.44; pi.chestPain[2] <- 0.40; pi.chestPain[3] <- 0.16;")

  # CAD -- Genders et al. (2011) European Heart Journal 32, 1316–1330
  equation <- "-4.37 + 0.64*(chestPain == 2) + 1.91*(chestPain == 3) + 1.34*(female == 0) + 0.04*age"
  g <- setNode(g, cad, nodeType="dbern",p=paste("ilogit(", equation, ")"), validate=FALSE)

  g <- setNode(g, stressECG, "dbern", p="ifelse(cad==1,0.68,1-0.77)")
  g <- setNode(g, echo,      "dbern", p="ifelse(cad==1,0.88,1-0.83)")
  g <- setNode(g, cta,       "dbern", p="ifelse(cad==1,0.95,1-0.86)")

writeNetworkModel(g, pretty=TRUE)

plot(g)

ff <- function(data){
  data$female    <- factor(data$female,    0:1, c("No","Yes"))
  data$chestPain <- factor(data$chestPain, 1:3, c("Non-specific","Atypical","Typical"))
  return(data)
}
fp <- function(obj){ff(bindPosterior(HydePosterior(obj, variable.names=monitored, n.iter=nsamp)))}

monitored <- c("age","female","chestPain","cad", "echo", "stressECG", "cta")
nsamp <- 100000

# cg00 <- compileJagsModel(g)
# post00 <- ff(bindPosterior(HydePosterior(cg, variable.names = monitored, n.iter = nsamp)))

post10 <- fp(compileJagsModel(g, list(age=35, female=1, chestPain=1)))
post11 <- fp(compileJagsModel(g, list(age=35, female=1, chestPain=1, echo=1)))
post12 <- fp(compileJagsModel(g, list(age=35, female=1, chestPain=1, echo=1, stressECG=1)))
post13 <- fp(compileJagsModel(g, list(age=35, female=1, chestPain=1, echo=1, stressECG=1, cta=1)))
unlist(lapply(list(post10, post11, post12, post13), function(data) mean(data$cad)))

[1] 0.04853 0.20880 0.44018 0.84346

Well, I'm pretty sure I know what's happening. but I'm really not sure what to do about it.

> model <- "
+ model{
+    cad ~ dbern(ilogit( -4.37 + 0.64*(chestPain == 2) + 1.91*(chestPain == 3) + 1.34*(female == 0) + 0.04*age ))
+ age ~ dnorm(60, 0.01)
+ female ~ dbern(0.4)
+ pi.chestPain[1] <- 0.44; pi.chestPain[2] <- 0.40; pi.chestPain[3] <- 0.16;
+ chestPain ~ dcat(pi.chestPain)
+ }"
> 
> x <- jags.model(textConnection(model), data = list(age=35, female=1, chestPain=1))
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 29

Initializing model

> y <- coda.samples(x, variable.names = c("cad"), n.iter = 2000)
  |**************************************************| 100%
> prop.table(table(y[[1]][, 1]))

    0     1 
0.947 0.053 
> 
> 
> x <- jags.model(textConnection(model), data = list(age=35, female=1, chestPain=1))
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 29

Initializing model

> y <- coda.samples(x, variable.names = c("cad"), n.iter = 100000)
  |**************************************************| 100%
> prop.table(table(y[[1]][, 1]))

      0       1 
0.95045 0.04955 
> 
> x <- jags.model(textConnection(model), data = list(age=35, female=1, chestPain=1))
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 29

Initializing model

> y <- coda.samples(x, variable.names = c("cad"), n.iter = 100000)
  |**************************************************| 100%
> prop.table(table(y[[1]][, 1]))

      0       1 
0.94993 0.05007

As far as I can understand, what's happening is cad is you're always getting the probability of cad as calculated by cad's parents. It seems like cad is being calculated prior to the conditional probability tables being referenced for calculating cad after observing one of its children. This isn't an issue with the ifelse code for some reason. And I'm not really sure why. I"m going to have to do a bunch of research into this.

Let's do a phone call to talk about this

On Fri, Jul 31, 2015 at 12:44 PM, Benjamin notifications@github.com wrote:

Well, I'm pretty sure I know what's happening. but I'm really not sure what to do about it.

model <- "

model{

cad ~ dbern(ilogit( -4.37 + 0.64(chestPain == 2) + 1.91(chestPain == 3) + 1.34_(female == 0) + 0.04_age ))

age ~ dnorm(60, 0.01)

female ~ dbern(0.4)

pi.chestPain[1] <- 0.44; pi.chestPain[2] <- 0.40; pi.chestPain[3] <- 0.16;

chestPain ~ dcat(pi.chestPain)

}"

x <- jags.model(textConnection(model), data = list(age=35, female=1, chestPain=1)) Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 29

Initializing model

y <- coda.samples(x, variable.names = c("cad"), n.iter = 2000) |**| 100% prop.table(table(y[[1]][, 1]))
0     1
0.947 0.053

x <- jags.model(textConnection(model), data = list(age=35, female=1, chestPain=1)) Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 29

Initializing model

y <- coda.samples(x, variable.names = c("cad"), n.iter = 100000) |**| 100% prop.table(table(y[[1]][, 1]))
  0       1
0.95045 0.04955

x <- jags.model(textConnection(model), data = list(age=35, female=1, chestPain=1)) Compiling model graph Resolving undeclared variables Allocating nodes Graph Size: 29

Initializing model

y <- coda.samples(x, variable.names = c("cad"), n.iter = 100000) |**| 100% prop.table(table(y[[1]][, 1]))
  0       1
0.94993 0.05007

As far as I can understand, what's happening is cad is you're always getting the probability of cad as calculated by cad's parents. It seems like cad is being calculated prior to the conditional probability tables being referenced for calculating cad after observing one of its children. This isn't an issue with the ifelse code for some reason. And I'm not really sure why. I"m going to have to do a bunch of research into this.

— Reply to this email directly or view it on GitHub https://github.com/nutterb/HydeNet/issues/81#issuecomment-126748882.

I think I have it. It looks as simple as stripping the dimnames attribute off of the cpt objects before passing them to jags.model.

Okay, it was something completely different. It seems I was recalculating the cpt object from the data, but not taking into account the weights. Thus, all of the CPTs were showing equal probability of any classification (or a non-informative prior).

I've changed the code to only recalculate if there is no cpt object available, and if it does, it takes into consideration any weights given in the nodeFitterArgs. Try installing from the devel branch and let me know if that works.

I'm still not sure this issue has been resolved. Let me restate it, hopefully more clearly. I want to specify a network without training data, i.e., completely from scratch. I have one continuous variable (Age) and three factor variables:

variable	levels
Male	No, Yes
ChestPainType	Typical, Atypical, Non-Specific
CAD	No, Yes

Below is some code to set up the network. How do I tell HydeNet what the factor levels are (preferably up front)?

> library(HydeNet)
> 
> g <- HydeNetwork(~ CAD | Age*Male*ChestPainType)
> 
> 
> #Priors on age, sex, and type of chest pain
> 
> g <- setNode(g, node = Age, nodeType = "dnorm",
+              mu = 60, tau = 1/(10^2))
> 
> g <- setNode(g, node = Male, nodeType = "dbern", p = 0.65)
> 
> g <- setNode(g, node = ChestPainType, nodeType = "dcat",
+              pi = vectorProbs(p = c(0.30, 0.25, 0.45), ChestPainType),
+              validate = FALSE)
> 
> #Genders et al. pre-test risk equation for CAD:
> RiskEqn =
+     paste('-4.37 + 0.04*Age + 1.34*Male',
+           '+ 1.91*equals(ChestPainType, "Typical")',
+           '+ 0.65*equals(ChestPainType, "Atypical")')
> 
> g <- setNode(g, node = CAD, nodeType = "dbern",
+              p=paste("ilogit(", RiskEqn, ")"), 
+              validate=FALSE)
> 
> 
> writeNetworkModel(g, pretty=TRUE)

model{
   CAD ~ dbern(ilogit( -4.37 + 0.04*Age + 1.34*Male + 1.91*equals(ChestPainType, "Typical") + 0.65*equals(ChestPainType, "Atypical") ))
   Age ~ dnorm(60, 0.01)
   Male ~ dbern(0.65)
   pi.ChestPainType[1] <- 0.3; pi.ChestPainType[2] <- 0.25; pi.ChestPainType[3] <- 0.45
ChestPainType ~ dcat(pi.ChestPainType)

I think that all it may require is a little bit of added functionality surrounding the creation of the factorRef list element within a compiledHydeNetwork object.

(Aside: I noticed that several of our help pages lack detail on the value returned by the functions.)

I see that compileJagsModel() creates the factorRef list element by searching the network for "factor" objects (I didn't look in detail as to how that is distinguished), and calling makeFactorRef via dataframeFactors.

So a little function of the flavor nodeFactorLevels(network, node, codes, levels) would seem to do the trick. Or, more elegantly, creative use of the base function factor().

Or, to avoid having another function, it might be best to create another argument called factorLevels (with appropriate logic to ensure that nodeType is either dbern or dcat) within setNode. This argument would be a list containing two elements (of equal length), values and labels.

What do you think?

To go along with the above, some thought on defaults and a helper function:

1) it would be helpful to feed simply a vector like c('Typical', 'Atypical', 'Non-Specific') to the factorLevels argument, under the assumption that the underlying codes fed to JAGS (which can actually be invisible to the user at this point) follow JAGS convention: 1, 2, 3, etc.

2) some sort of utility function to reduce typing for simple c("No","Yes") situations. Maybe this is not a workable idea, but I'll throw it out there: I would love to simply type something like: setNode(net, node = Male, nodeType = binaryNode(p = 0.57)) and have it do the work for me.

So it turns out there's a pretty simple hack to trick HydeNet into recognizing factor levels, so if you need this immediately, you can pull this off.

The following code is fully functional. Notice that I create a data frame ChestPainLevels that has a single vector of length three specifying the levels I want in that factor. I then tuck that data frame into the nodeData argument and HydeNet purrs happily.

All that said, this is a wretched long term solution, so see the next comment for what I intend to do about it.

library(HydeNet)

g <- HydeNetwork(~ CAD | Age*Male*ChestPainType)

#Priors on age, sex, and type of chest pain

g <- setNode(g, node = Age, nodeType = "dnorm",
             mu = 60, tau = 1/(10^2))
g <- setNode(g, node = Male, nodeType = "dbern", p = 0.65)

#* Create a trivial data set just to get the levels of the factor
ChestPainLevels <- 
  data.frame(ChestPainType = factor(c("Typical", "Atypical", "Non-Specific"),
                                    c("Typical", "Atypical", "Non-Specific")))
g <- setNode(g, node = ChestPainType, nodeType = "dcat",
              pi = vectorProbs(p = c(0.30, 0.25, 0.45), ChestPainType),
              validate = FALSE,
             nodeData = ChestPainLevels)

#Genders et al. pre-test risk equation for CAD:
RiskEqn =
     paste('-4.37 + 0.04*Age + 1.34*Male',
           '+ 1.91*equals(ChestPainType, 1)',
           '+ 0.65*equals(ChestPainType, 2)')

g <- setNode(g, node = CAD, nodeType = "dbern",
            p=paste("ilogit(", RiskEqn, ")"), 
            validate=FALSE)

writeNetworkModel(g, pretty=TRUE)

cg <- compileJagsModel(g)

cg$factorRef

For a more robust, and, coincidentally, less computationally intensive approach, I will do this:

Include in setNode a factorLevels argument. This argument will only take effect under the following conditions:

The nodeType is dbern or dcat
The nodeFitter is not cpt (I'm still contemplating the downstream consequences of this, but if the nodeFitter is cpt, then the factor levels should be extracted from the cpt object).
The nodeData object is NULL
The data element does not contain a variable named for the node.

That's a long way of saying that we give preference to the level factors existing in the data before we will accept it from the user. I think this is important because if we allow differing levels from the data, the pieces of the puzzle won't fit any longer.

Once we've passed all those conditions, we'll use the user supplied factor levels in the factorLevels attribute of the network object. If factorLevels can be determined from the data for any reason 2-4 above, then we grab the factor levels and plop them into factorLevels. That way the attribute is always populated for factor nodes.

Then, when we get to compileJagsModel and compileDecisionModel, instead of all of the hoops I have to jump through to create factorRef, I can just grab all of those non-NULL attributes.

Lastly, factorLevels will need to be a data frame, with one column for the levels and one for the labels, exactly as the appear in factorRef now. Should be a fairly seamless transition.

If you're good with that, I'll write this in on Friday morning.

This all sounds good to me. One thing to consider, though, is whether or not it me that be possible to simply give a character vector of labels instead of a data frame.

Jarrod

On Sep 23, 2015, at 9:01 PM, Benjamin notifications@github.com wrote:

For a more robust, and, coincidentally, less computationally intensive approach, I will do this:

Include in setNode a factorLevels argument. This argument will only take effect under the following conditions:

The nodeType is dbern or dcat The nodeFitter is not cpt (I'm still contemplating the downstream consequences of this, but if the nodeFitter is cpt, then the factor levels should be extracted from the cpt object). The nodeData object is NULL The data element does not contain a variable named for the node. That's a long way of saying that we give preference to the level factors existing in the data before we will accept it from the user. I think this is important because if we allow differing levels from the data, the pieces of the puzzle won't fit any longer.

Once we've passed all those conditions, we'll use the user supplied factor levels in the factorLevels attribute of the network object. If factorLevels can be determined from the data for any reason 2-4 above, then we grab the factor levels and plop them into factorLevels. That way the attribute is always populated for factor nodes.

Then, when we get to compileJagsModel and compileDecisionModel, instead of all of the hoops I have to jump through to create factorRef, I can just grab all of those non-NULL attributes.

Lastly, factorLevels will need to be a data frame, with one column for the levels and one for the labels, exactly as the appear in factorRef now. Should be a fairly seamless transition.

If you're good with that, I'll write this in on Friday morning.

— Reply to this email directly or view it on GitHub.

That's what I had in mind. Sorry to have obscured that. The numeric levels will be assigned as 1:n where n is the length of the vector given and the order is the same as given in the vector.

Seems we're getting there. I noticed, though, that we still need to specify numerical codes when we're doing something like trying to specify a logistic regression equation. See the below code, specifically where I define the logistic regression equation for node CAD. rjags barks at me for trying to use factor levels. (Which makes sense because the equation is not being 'validated' in setNode()).

> setwd("c:/Users/daltonj/Google Drive/BayesNetLab/Decision 1 - CAD")
> 
> library(HydeNet)
> 
> g <- HydeNetwork(~ CAD | Age * Sex * ChestPainType
+                  + stressTestResult | do.stressTest * CAD
+                  + ekgResult        | do.ekg * CAD
+                  + echoResult       | do.echo * CAD
+                  + ctaResult        | do.cta * CAD
+                  + testCosts | do.stressTest * do.ekg
+                                * do.echo * do.cta)
> 
> g <- setDecisionNodes(g, "do.stressTest", "do.ekg", "do.echo", "do.cta")
> g <- setUtilityNodes(g,  "testCosts")
> 
> plot(g)
> 
> #Priors on age, sex, and type of chest pain
> 
> g <- setNode(g, node = Age, nodeType = "dnorm", mu = 60, tau = 1/(10^2))
> 
> g <- setNode(g, node = Sex, nodeType = "dbern",
+              p = 0.65, factorLevels = c("Female","Male"))
> 
> g <- setNode(g, node = ChestPainType, nodeType = "dcat",
+              pi = vectorProbs(p = c(0.30, 0.25, 0.45),
+                               ChestPainType),
+              factorLevels = c("Typical","Atypical","Nonspecific"),
+              validate = FALSE)
> 
> #Priors on decision nodes
> #  -- These will be set for all simulations so the probability values
> #     are arbitrary. However, the distribution of observed decisions
> #     among some population of actors can be modeled.
> 
> g <- setNode(g, node=do.stressTest, "dbern", p = 0.50,
+              factorLevels = c("No","Yes"))
> 
> g <- setNode(g, node=do.ekg, "dbern", p = 0.50,
+              factorLevels = c("No","Yes"))
> 
> g <- setNode(g, node=do.echo, "dbern", p = 0.50,
+              factorLevels = c("No","Yes"))
> 
> g <- setNode(g, node=do.cta, "dbern", p = 0.50,
+              factorLevels = c("No","Yes"))
> 
> #Genders et al. pre-test risk equation for CAD:
> RiskEqn = paste("-4.37 + 0.04*Age + 1.34 * (Sex == 'Male')",
+                 "+ 1.91*(ChestPainType == 'Typical')",
+                 "+ 0.64*(ChestPainType == 'Atypical')")
> 
> g <- setNode(g, node = CAD, nodeType = "dbern",
+              p=paste("ilogit(", RiskEqn, ")"))

setNode(g, node = CAD, nodeType = "dbern", p = paste("ilogit(", 
    RiskEqn, ")"))
1: Validation has been ignored for parameters defined with character strings
> 
> #Input conditional probability table (CPT) for test result nodes.
> #The process is demonstrated below for node 'ekgResult'.
> #
> # ekgResult.cpt <- 
> #   inputCPT(ekgResult ~ do.ekg*CAD,
> #            factorLevels=list(ekgResult = c("Not Done","Negative",
> #                                            "Positive"),
> #                              do.ekg    = c("No","Yes"),
> #                              CAD       = c("No","Yes")))
> # -------------------------------------------------------------------
> # NOTE: parameter 'reduce' is set to TRUE in inputCPT().
> # Conditional probabilities Pr(ekgResult=Not Done | do.ekg, CAD) will  
> # be calculated as the complement of the inputted probabilities
> # Pr(ekgResult != Not Done | do.ekg, CAD).
> # -------------------------------------------------------------------
> # Enter the following conditional probabilities:
> # Use '<q>' to halt execution.
> # To go back one step and re-enter, enter '<b>'.
> # -------------------------------------------------------------------
> # Pr(ekgResult=Negative | do.ekg=No , CAD=No ):   0
> # Pr(ekgResult=Positive | do.ekg=No , CAD=No ):   0
> # Pr(ekgResult=Negative | do.ekg=Yes, CAD=No ):   .66
> # Pr(ekgResult=Positive | do.ekg=Yes, CAD=No ):   .34
> # Pr(ekgResult=Negative | do.ekg=No , CAD=Yes):   0
> # Pr(ekgResult=Positive | do.ekg=No , CAD=Yes):   0
> # Pr(ekgResult=Negative | do.ekg=Yes, CAD=Yes):   .48
> # Pr(ekgResult=Positive | do.ekg=Yes, CAD=Yes):   .52
> #
> #save(ekgResult.cpt, file="ekgResult.cpt.RData")
> load(file="ekgResult.cpt.RData")
> g <- setNodeModels(g, ekgResult.cpt)
> 
> #do the same for the other test result nodes
> load(file="echoResult.cpt.RData")
> load(file="ctaResult.cpt.RData")
> load(file="stressTestResult.cpt.RData")
> 
> g <- setNodeModels(g, echoResult.cpt)
> g <- setNodeModels(g, ctaResult.cpt)
> g <- setNodeModels(g, stressTestResult.cpt)
> 
> #populate deterministic utility node 'testCosts'
> g <- setNode(g, node = testCosts, nodeType = "determ",
+              define=fromFormula(),
+              nodeFormula = testCosts ~  200*(do.stressTest == 2)
+                                      +   50*(do.ekg == 2)
+                                      + 1800*(do.echo == 2)
+                                      +  900*(do.cta == 2))
> 
> writeNetworkModel(g, pretty=TRUE)
model{
   CAD ~ dbern(ilogit( -4.37 + 0.04*Age + 1.34 * (Sex == 'Male') + 1.91*(ChestPainType == 'Typical') + 0.64*(ChestPainType == 'Atypical') ))
   Age ~ dnorm(60, 0.01)
   Sex ~ dbern(0.65)
   pi.ChestPainType[1] <- 0.3; pi.ChestPainType[2] <- 0.25; pi.ChestPainType[3] <- 0.45
ChestPainType ~ dcat(pi.ChestPainType)
   pi.stressTestResult <- cpt.stressTestResult[(do.stressTest+1), (CAD+1), ]
   stressTestResult ~ dcat(pi.stressTestResult)
   do.stressTest ~ dbern(0.5)
   pi.ekgResult <- cpt.ekgResult[(do.ekg+1), (CAD+1), ]
   ekgResult ~ dcat(pi.ekgResult)
   do.ekg ~ dbern(0.5)
   pi.echoResult <- cpt.echoResult[(do.echo+1), (CAD+1), ]
   echoResult ~ dcat(pi.echoResult)
   do.echo ~ dbern(0.5)
   pi.ctaResult <- cpt.ctaResult[(do.cta+1), (CAD+1), ]
   ctaResult ~ dcat(pi.ctaResult)
   do.cta ~ dbern(0.5)
   testCosts <- 200 * (do.stressTest == 2) + 50 * (do.ekg == 2) + 1800 * (do.echo == 2) + 900 * (do.cta == 2)
}
> 
> cg <- compileDecisionModel(g)

Error in rjags::jags.model(textConnection(writeNetworkModel(network)),  : 

Error parsing model file:
syntax error on line 2 near "'"

I tried an alternative formulation that did utilize the formula validation procedure, and it gave the same problem:

... (etc....) ...

g <- setNode(g, node = CAD, nodeType = "dbern", p = fromFormula(),
             nodeFormula = CAD ~ ilogit(-4.37 + 0.04*Age + 1.34 * (Sex == 'Male')
                                        + 1.91*(ChestPainType == 'Typical')
                                        + 0.64*(ChestPainType == 'Atypical')
                                        ))

... (etc....) ...

> writeNetworkModel(g, pretty=TRUE)

model{
   CAD ~ dbern(ilogit(-4.37 + 0.04 * Age + 1.34 * (Sex == "Male") + 1.91 * (ChestPainType == "Typical") + 0.64 * (ChestPainType == "Atypical")))
   Age ~ dnorm(60, 0.01)
   Sex ~ dbern(0.65)
   pi.ChestPainType[1] <- 0.3; pi.ChestPainType[2] <- 0.25; pi.ChestPainType[3] <- 0.45
ChestPainType ~ dcat(pi.ChestPainType)
   pi.stressTestResult <- cpt.stressTestResult[(do.stressTest+1), (CAD+1), ]
   stressTestResult ~ dcat(pi.stressTestResult)
   do.stressTest ~ dbern(0.5)
   pi.ekgResult <- cpt.ekgResult[(do.ekg+1), (CAD+1), ]
   ekgResult ~ dcat(pi.ekgResult)
   do.ekg ~ dbern(0.5)
   pi.echoResult <- cpt.echoResult[(do.echo+1), (CAD+1), ]
   echoResult ~ dcat(pi.echoResult)
   do.echo ~ dbern(0.5)
   pi.ctaResult <- cpt.ctaResult[(do.cta+1), (CAD+1), ]
   ctaResult ~ dcat(pi.ctaResult)
   do.cta ~ dbern(0.5)
   testCosts <- 200 * (do.stressTest == 2) + 50 * (do.ekg == 2) + 1800 * (do.echo == 2) + 900 * (do.cta == 2)
}

> 
> cg <- compileDecisionModel(g)

Error in rjags::jags.model(textConnection(writeNetworkModel(network)),  : 

Error parsing model file:
syntax error on line 2 near """

Replacing the codes with numbers works as expected, though:

... (etc....) ...

g <- setNode(g, node = CAD, nodeType = "dbern", p = fromFormula(),
             nodeFormula = CAD ~ ilogit(-4.37 + 0.04*Age + 1.34 * (Sex == 2)
                                        + 1.91*(ChestPainType == 1)
                                        + 0.64*(ChestPainType == 2)
                                        ))

... (etc....) ...

And another issue re: factor levels (with the same network as in the above):

patient <- list(Age=71, Sex="Male", ChestPainType="Atypical")
cg0 <- compileJagsModel(g, data = patient)  #this works fine

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 123

Initializing model

patient <- c(patient, list(do.ekg="Yes", ekgResult="Positive"))
cg1 <- compileJagsModel(g, data = patient)  #not so much...

Error in convertLabelToValue(data, factorRef) : 
Values observed in 'ekgResult' must be one of 1, 2, 1, 2

The first of the previous two comments should be made into a separate issue as a feature request. I'll say up front that I'm at a loss of how to approach it. The problem is that those character strings get passed straight to JAGS, so they have to be JAGS-ready code. This is stated in the documentation for setNode (see the Validation section). Not saying it isn't an idea worth pursuing, just that there's no simple way to translate those concepts back and forth from a simple character string.

The second issue is serious and is one I will get on right away.

Repeat after me:

stringsAsFactors = FALSE
stringsAsFactors = FALSE
stringsAsFactors = FALSE
stringsAsFactors = FALSE
stringsAsFactors = FALSE
stringsAsFactors = FALSE

I see that in your reconstruction of the makeFactorRef() utility function, you hard-code an adjustment for the dbern nodes so that their codes are 0:1 while multilevel factors have codes of 1:k. The alternative would be to uniformly code 1:k, even for binary variables.

Which is better? I think that in either case we need to clearly document the coding convention within the setNode() documentation and within the vignettes. Therefore, I suppose it doesn't matter.

Given that HydePosterior() conveniently produces posterior matrices with factor labels, the advantage of the 0:1 approach - namely, being able to take the mean of a binary variable column to obtain its incidence - is out the window. So I'd be inclined to be consistent and use codes of 1:k in all cases. However, this is strictly a style issue and like I said, as long as we document well, it doesn't matter.

It's actually hard coded at 0:1, because JAGS requires it. dbern takes a parameter, p, on the interval (0, 1), and returns a value of either 0 or 1. If you want to observe a bernoulli random variable, it must be observed as either 0 or 1 (ie, data = list(bern.var = 0)). dcat, on the other hand, returns an integer from 1 to k. But I can work on making this documentation more clear.

nutterb / HydeNet

Factor levels for nodes not learned from data #81