nutterb / HydeNet

Hybrid Decision Networks in R

Error when data is tibble #105

Closed. derekpowell closed this issue 6 years ago

derekpowell commented 6 years ago

Really excited about HydeNet and hoping to use it for an upcoming project. Just caught what appears to be a bug in handling data passed in as a tibble (from the tidyverse).

Here's a (relatively) minimal reproducible example:

library(HydeNet)
library(dplyr)
library(tibble)

set.seed(105)  # make the simulated data reproducible

# Simulate continuous (normally distributed) variables
sim_df <- data.frame(A = rnorm(1000, 0, 3)) %>%
  mutate(B = rnorm(1000, 0, 2)) %>%
  mutate(C = .5 * A + .25 * B + rnorm(1000, 0, 1)) %>%
  mutate(D = .8 * B + -.35 * C + rnorm(1000, 0, 2)) %>%
  mutate(E = .33 * B + rnorm(1000, 0, 3))

# Same data, stored as a tibble (as_tibble replaces the deprecated as.tibble)
sim_tibble <- as_tibble(sim_df)

hyde_model <- "~ C|A*B + D|C*B + E|D"

# Works: data passed as a data.frame
net.Hyde1 <- HydeNetwork(as.formula(hyde_model),
                        data = sim_df)
compNet1 <- compileJagsModel(net.Hyde1)

# Fails: identical data passed as a tibble
net.Hyde2 <- HydeNetwork(as.formula(hyde_model),
                        data = sim_tibble)
compNet2 <- compileJagsModel(net.Hyde2)

On my system that code fails on the last line with the error:

Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
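For context, that message text comes from base R's binomial() family, whose initialization step rejects any response outside [0, 1]. The sketch below reproduces it with plain stats::glm and no HydeNet at all; the toy vector y is made up for illustration. It only shows where the wording originates and suggests (but does not prove) that a binomial model ended up being fit to one of the continuous columns.

```r
# Reproduce the error text with base R alone: a binomial GLM
# fit to a continuous response that falls outside [0, 1].
y <- c(-1.2, 0.5, 2.3, 0.8)  # toy continuous values (made up)

msg <- tryCatch(
  glm(y ~ 1, family = binomial()),
  error = function(e) conditionMessage(e)
)
print(msg)
#> [1] "y values must be 0 <= y <= 1"
```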

Looking at the net.Hyde2 object, the root nodes A and B are being treated as categorical (dcat) when they should be treated as normal (dnorm):

A Probabilistic Graphical Network
Has data attached: Yes

C | A * B
dnorm(mu = fromData, tau = fromData)
glm: C ~ A + B

A
dcat(pi = fromData)
glm: ~A

B
dcat(pi = fromData)
glm: ~B

D | C * B
dnorm(mu = fromData, tau = fromData)
glm: D ~ C + B

E | D
dnorm(mu = fromData, tau = fromData)
glm: E ~ D

My system version info:

nutterb commented 6 years ago

I hope to address this this coming Friday. My next CRAN release will likely be in July, but you'll have a patch from this repository before then.

derekpowell commented 6 years ago

great, thanks!

nutterb commented 6 years ago

Fixed. Reinstall with devtools::install_github("nutterb/HydeNet", ref = "current-devel")

Gory details:

There were parts of the code that used data[, col_name]. For the data.frame class, this returns a vector when col_name has length 1, because [ drops the unused dimension by default (drop = TRUE). The same call on a tibble, however, returns a one-column tibble. I replaced all of these instances with data[[col_name]], which returns the column itself for both classes.
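The difference can be seen with base R alone. A tibble indexed with [ behaves like a data.frame indexed with drop = FALSE, so the sketch below uses that as a stand-in and only touches the tibble package if it is installed. The helper guess_node_type() is hypothetical: a simplified illustration of how a type check fed by [ could misclassify a numeric column. It is not HydeNet's actual code.

```r
# Single-column extraction: `[` vs `[[` on a data.frame (and a tibble).
df <- data.frame(A = c(1.5, -0.3, 2.2), B = c(0.1, 0.4, -1.0))
col_name <- "A"

class(df[, col_name])                # "numeric": `[` drops to a vector
class(df[, col_name, drop = FALSE])  # "data.frame": what a tibble returns
class(df[[col_name]])                # "numeric": `[[` returns the column

# Hypothetical node-type check (illustration only, not HydeNet's code):
# numeric vectors become normal nodes, anything else categorical.
guess_node_type <- function(x) if (is.numeric(x)) "dnorm" else "dcat"

guess_node_type(df[, col_name])                # "dnorm"
guess_node_type(df[, col_name, drop = FALSE])  # "dcat" (a data frame is not numeric)
guess_node_type(df[[col_name]])                # "dnorm"

# The same comparison on a real tibble, if the package is available
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(guess_node_type(tb[, col_name]))   # "dcat"
  print(guess_node_type(tb[[col_name]]))   # "dnorm"
}
```

Because the result of [[ does not depend on whether the caller passes a data.frame or a tibble, it is the safer choice inside package code that accepts either.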