nutterb / HydeNet

Hybrid Decision Networks in R

Deterministic nodes with factor parents #65

Open jarrod-dalton opened 9 years ago

jarrod-dalton commented 9 years ago

This can be difficult, since we have to write formulas which reference factor levels. Currently, we have to do so by referencing the integer index of the factor level we want.

Rather than fumble around trying to give you a self-contained R example of what I'm talking about, I invite you to go into the Decision Networks vignette and attempt to define the payoff utility node using setNode() or, if that's not actually possible, to modify the network structure (e.g., by adding nodes) so that the payoff can be calculated.

In the meantime, I'll switch to working on the Setting Nodes and Getting Started vignettes.

nutterb commented 9 years ago

This is another one I'll spend some time thinking about. I'll come up with something.

nutterb commented 8 years ago

Consider this example:

net <- setNode(net, payoff, "determ", define=fromFormula(),
         nodeFormula = payoff ~
                         ifelse(playerFinalPoints > 21, -1,
                           ifelse(playerFinalPoints == 21,
                             ifelse(dealerOutcome == 1, 0,
                               ifelse(dealerOutcome == 7, 0, 1)),
                             ifelse(dealerOutcome == 2,
                               ifelse(playerFinalPoints < 22, 1, -1),
                               ifelse(dealerOutcome == 3,
                                 ifelse(playerFinalPoints == 17, 0,
                                 ifelse(playerFinalPoints > 17, 1, -1)),
                                 ifelse(dealerOutcome == 4,
                                   ifelse(playerFinalPoints == 18, 0,
                                     ifelse(playerFinalPoints > 18, 1, -1)),
                                   ifelse(dealerOutcome == 5,
                                     ifelse(playerFinalPoints == 19, 0,
                                       ifelse(playerFinalPoints > 19, 1, -1)),
                                     ifelse(dealerOutcome == 6,
                                       ifelse(playerFinalPoints == 20, 0,
                                         ifelse(playerFinalPoints > 20, 1, -1)),
                                       ifelse(playerFinalPoints == 21, 0, -1)))))))))

Given the current structure, the only thing I can think of that would make it feasible to give the factor level would be to use a utility function here. So if I wanted the equivalent of dealerOutcome == 2, I could use a utility such as

dealerOutcome == numericLevel("Bust", BJDealer$dealerOutcome)

The numericLevel function would then return the number 2.
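
A minimal sketch of such a lookup in base R follows. It is not the function that ended up in HydeNet: the name level_code and the stand-in factor are hypothetical, and the level order is inferred from the integer codes in the formula above ("Bust" second, point totals 17 to 21 in positions 3 to 7). The label of the first level is a placeholder guess.

```r
# Hypothetical helper: return the integer code of one factor level.
level_code <- function(level, x) {
  code <- match(level, levels(x))
  if (anyNA(code)) {
    stop("'", level, "' is not a level of the supplied factor")
  }
  code
}

# Stand-in for BJDealer$dealerOutcome; the first label is an assumption.
dealer_outcome <- factor(
  "Bust",
  levels = c("Blackjack", "Bust", "17", "18", "19", "20", "21")
)

bust_code <- level_code("Bust", dealer_outcome)
print(bust_code)
```

Because the lookup goes through levels(), it keeps working if the data are re-sorted, but it silently depends on the level order of whichever factor is passed in.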

The upside is that you don't have to remember all of the variable codings. The downside is that it has the potential to be much more typing. But the only other place this gets processed is in writing the JAGS code, and there's no good way to tie the numeric coding to a factor variable at that point.

I'll write up the function. You can tell me if you want to use it at all. :)

jarrod-dalton commented 8 years ago

Is there maybe an escape character that we could use instead of quotation marks, like

Bust

which would tell HydeNet to call such a function?

nutterb commented 8 years ago

It's possible we could use something like dealerOutcome == "#Bust,BJDealer$dealerOutcome#", but I don't think that saves much typing. The major issue is that the rToJags function deals with converting R code into JAGS and only takes a single argument--a formula object. The variable has to be passed with the variable level.

However, as I think about it, we could create our own handy dandy little intermediary function with a weird syntax. For example:

jagsFunc(formula, ...)

where the ... argument takes named arguments, each supplying a factor variable referenced in the formula.

jagsFunc(payoff ~ dealerOutcome == "#Bust:dealerOutcome#",
     dealerOutcome = BJDealer$dealerOutcome)

returns a formula object payoff ~ dealerOutcome == 2.

Alternatively, we might have jagsFunc take a character argument, which would allow jagsFunc("payoff ~ #dealerOutcome == 'Bust'#"). I'm a little nervous about this one, however, because I think it will likely fail if someone tries to use it in any way other than the == sense. I can't think of why anyone would do something like dealerOutcome * "Bust" or what that would mean. Perhaps I'm being paranoid?

I'm rambling. That might work, actually.
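
The translation step being discussed can be sketched in base R by walking the formula's expression tree. This is an illustration only, not HydeNet's implementation: levels_to_codes is a hypothetical name, it uses plain quoted level names rather than the "#...#" syntax, it only handles the variable == "Level" form, and it codes every factor from 1, which may not match how HydeNet codes two-level Bernoulli nodes.

```r
# Hypothetical helper: rewrite `variable == "Level"` as `variable == <code>`.
# factor_levels is a named list mapping variable names to their level labels.
levels_to_codes <- function(expr, factor_levels) {
  if (!is.call(expr)) return(expr)
  is_level_test <- identical(expr[[1]], as.name("==")) &&
    is.name(expr[[2]]) && is.character(expr[[3]])
  if (is_level_test) {
    variable <- as.character(expr[[2]])
    known <- factor_levels[[variable]]
    if (is.null(known)) {
      stop("No factor levels defined for '", variable, "'")
    }
    code <- match(expr[[3]], known)
    if (is.na(code)) {
      stop("'", expr[[3]], "' is not a level of '", variable, "'")
    }
    expr[[3]] <- as.numeric(code)
    return(expr)
  }
  # Not a level comparison: recurse into every element of the call.
  as.call(lapply(expr, levels_to_codes, factor_levels = factor_levels))
}

# Illustrative formula with a three-level factor (levels are assumptions).
node_formula <- pe ~ ilogit(-2.94 + 1.56 * (wells == "Medium") +
                              3.14 * (wells == "High"))
known_levels <- list(wells = c("Low", "Medium", "High"))

converted <- node_formula
converted[[3]] <- levels_to_codes(node_formula[[3]], known_levels)
print(converted)
```

Restricting the rewrite to == comparisons sidesteps the worry about expressions such as dealerOutcome * "Bust": anything that is not an equality test against a string is left untouched.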

nutterb commented 8 years ago

This is now implemented in the current-devel branch. The final function name is factorFormula, and I even used it in the Decision Networks vignette if you'd like to see it in action. If you feel like you can get behind this, let me know.

jarrod-dalton commented 8 years ago

Beautiful. Now on to the beggar->chooser transition: is it possible to build logic into the formula evaluation such that if it sees any quoted elements it knows to pass it through factorFormula() without the user explicitly calling it?

nutterb commented 8 years ago

Truthfully, yes. It just means passing every formula through factorFormula within setNode and not exporting factorFormula. (Well, we could still export it, we just wouldn't have to, and I would probably opt not to, since there isn't much need for it otherwise.) Would you like to beg and choose that option?

jarrod-dalton commented 8 years ago

I like that. All under the hood. Thanks!

jarrod-dalton commented 8 years ago

I seem to be unable to pass node formulas through factorFormula() when the node is not deterministic. Below, I attempt to manually write a logistic regression equation for pe given wells, where wells is treated as a three-level categorical variable.

# Set up some stuff...
net <- HydeNetwork(~ wells
                   + pe | wells
                   + d.dimer | pregnant*pe
                   + angio | pe
                   + treat | d.dimer*angio
                   + death | pe*treat)

net <- setNode(net, wells,
               nodeType = "dcat",
               pi = vectorProbs(p = c(37, 164, 49), wells),
               factorLevels = c("Low","Medium","High"))
# These two attempts do not work...
net <- setNode(net, "pe", nodeType = "dbern", 
               define = fromFormula(),
               nodeFormula = pe ~ ilogit(-2.94
                                         + 1.56*(wells == "Medium")
                                         + 3.14*(wells == "High")))  

net <- setNode(net, "pe", nodeType = "dbern", 
               p = plogis(-2.94 + 1.56*(wells == "Medium")
                          + 3.14*(wells == "High")))

jarrod-dalton commented 8 years ago

I think I got it...

net <- setNode(net, "pe", nodeType = "dbern", 
                p = fromFormula(),
                nodeFormula = pe ~ ilogit(-2.94
                                          + 1.56*(wells == "Medium")
                                          + 3.14*(wells == "High")))  
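
As a sanity check on the coefficients, the probabilities implied by this formula can be computed outside HydeNet. This is a sketch using only base R: plogis() is R's inverse logit, the same transformation JAGS exposes as ilogit(). The variable names below are illustrative.

```r
# Coefficients copied from the node formula above; "Low" is the reference level.
wells_levels <- c("Low", "Medium", "High")
log_odds <- -2.94 +
  1.56 * (wells_levels == "Medium") +
  3.14 * (wells_levels == "High")

# Inverse logit turns log-odds into P(pe = 1 | wells).
p_pe <- setNames(plogis(log_odds), wells_levels)
print(round(p_pe, 3))
```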

jarrod-dalton commented 8 years ago

Do we want to alert the user to unconverted factor levels? In the example below, we try to use a factor level for node pe in the regression equation for d.dimer before we've used setNode() to define node pe (and told it that the factorLevels are c("No","Yes")).

It is generally a good idea to proceed through the network in topological order (basically starting from the root nodes and populating children only when all parent nodes have been populated). Doing so will avoid issues like this.

Do we want to go so far as disallowing setNode() from working if all parents' models have not yet been specified? This wouldn't catch all possible ways to screw up inputting node distributions via setNode() (as I seem to be adept at demonstrating), but on the other hand I can't seem to think of a good reason not to work under this restriction.
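
For reference, a topological order can be derived from the parent lists alone. The sketch below is plain base R and does not use any HydeNet internals: topo_order is a hypothetical helper, and the parents list is transcribed by hand from the network formula used in this example.

```r
# Hypothetical helper: order nodes so every node comes after all its parents.
# parents is a named list: node name -> character vector of parent names.
topo_order <- function(parents) {
  remaining <- names(parents)
  ordered <- character(0)
  while (length(remaining) > 0) {
    # A node is ready once all of its parents have already been placed.
    is_ready <- vapply(
      remaining,
      function(node) all(parents[[node]] %in% ordered),
      logical(1)
    )
    if (!any(is_ready)) {
      stop("Network contains a cycle or an undeclared parent")
    }
    ordered <- c(ordered, remaining[is_ready])
    remaining <- remaining[!is_ready]
  }
  ordered
}

# Hand-transcribed from ~ wells + pe | wells + d.dimer | pregnant*pe
parents <- list(
  wells    = character(0),
  pe       = "wells",
  d.dimer  = c("pregnant", "pe"),
  pregnant = character(0)
)

node_order <- topo_order(parents)
print(node_order)
```

Calling setNode() in this order means every parent's factorLevels are already known by the time a child formula refers to them.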

net <- HydeNetwork(~ wells
                   + pe | wells
                   + d.dimer | pregnant*pe)

net <- setNode(network = net, node = pregnant,
               nodeType = "dbern", p=.4,
               factorLevels = c("No","Yes"))

wells.p <- paste("pi.wells[1] <- 0.148",
                 "pi.wells[2] <- 0.656",
                 "pi.wells[3] <- 0.196",
                 sep = "; ")
net <- setNode(net, wells, nodeType = "dcat", pi = wells.p)

# Not run, but it should be...
#
#net <- setNode(net, "pe", nodeType = "dbern", 
#                p = fromFormula(),
#                nodeFormula = pe ~ ilogit(-2.94
#                                          + 1.56*(wells == "Medium")
#                                          + 3.14*(wells == "High")))

net <- setNode(net, d.dimer, nodeType="dnorm",
               mu=fromFormula(), tau=1/30,  #sigma^2 = 30
               nodeFormula = d.dimer ~ 210 + 29*(pregnant=="Yes") + 68*(pe=="Yes"))

net$nodeFormula$d.dimer

d.dimer ~ 210 + 29 * (pregnant == 1) + 68 * (pe == character(0))
<environment: 0x10615f5c0>

nutterb commented 8 years ago

I added an error in circumstances where there is no accompanying factorLevels entry for the variable. I think it's important to make this a hard error--the downstream consequences are catastrophic. Let me know if you think the error message is sufficient or if it needs more information.