Closed · ktroule closed this issue 5 years ago
Hi Kevin,
I am not able to reproduce the error with some example data.
a <- matrix(rep(0:19/10,4), nrow=4)
d <- BNDataset(data=t(a), variables=c("a", "b", "c", "d"), discreteness = rep('c', 4), node.sizes = rep(2,4), starts.from=0)
net <- learn.network(d)
The learning is successful. The for loop with the cut.default works as in your case.
Do you have the same error without the starts.from parameter?
Because the results can differ:
> for(i in 1:4){ q <- cut.default(t(a)[,i], 2, labels = FALSE, include.lowest = TRUE); print(q) }
[1] 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2
[1] 1 1 2 2 2 1 1 2 2 2 1 1 2 2 2 1 1 2 2 2
[1] 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2
[1] 1 1 2 2 2 1 1 2 2 2 1 1 2 2 2 1 1 2 2 2
> for(i in 1:4){ q <- cut.default(t(a+1)[,i], 2, labels = FALSE, include.lowest = TRUE); print(q) }
[1] 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2
[1] 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2
[1] 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2
[1] 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2
The second case is equivalent to using starts.from. There are some differences in the output compared to the first case, so it might be that the distribution of your data suffers from the transformation (e.g. one variable has values only below 1).
Yes, indeed, several of my variables have values lower than 1. I've added 1 to each variable and removed starts.from, but I still get the same error. Just in case it matters: my variables do not follow a normal distribution, and the dataset is quite sparse; about 17% of the values are 0.
Sorry, maybe I formulated it unclearly :)
Adding 1 to each variable is equivalent to using starts.from=0, which is intended for already discrete data. The for loop you tried in your first post applied cut.default to the original data, while the learning was done on the transformed data, so they were effectively two different datasets. Hence, it is possible that in one case (without the additional 1 or starts.from) you get 1 2 for every variable, and in the other one (the one you were using to learn the network) you get an error. So you should first try learning without starts.from and without the additional 1:
BN <- BNDataset(data=t(scores.tumor.brca), discreteness = rep('c', 67),
variables = row.names(scores.tumor.brca), node.sizes = rep(2,67))
net <- learn.network(BN)
If it fails again, you can try to discretize the data manually and learn from these now discrete distributions, e.g.
newdata <- scores.tumor.brca
cutpoint <- 1
newdata[newdata <= cutpoint] <- 1
newdata[newdata > cutpoint] <- 2
BN <- BNDataset(data=t(newdata), discreteness = rep('d', 67), variables = row.names(newdata), node.sizes = rep(2,67))
net <- learn.network(BN)
or any other suitable value for cutpoint, depending on your data (indeed, the distribution of the values is the most likely issue here).
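If a single fixed cutpoint is hard to justify for skewed, non-normal data, a per-variable median split (the 50% quantile) is a common alternative to the fixed cutpoint above. A minimal sketch, where scores is a hypothetical stand-in for the real data (here with variables in columns for simplicity):

```r
# Minimal sketch: median-based binary discretization, one cutpoint
# per variable. `scores` is a hypothetical stand-in matrix.
set.seed(1)
scores <- matrix(rexp(20 * 67), nrow = 20, ncol = 67)

newdata <- scores
for (j in seq_len(ncol(scores))) {
  cutpoint <- median(scores[, j])                 # per-variable median
  newdata[, j] <- ifelse(scores[, j] <= cutpoint, 1, 2)
}
# every value is now 1 or 2, so node.sizes = rep(2, 67) still applies
```

The median split guarantees both levels occur for every continuous variable, regardless of how skewed its distribution is.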
Sorry, I explained myself wrongly. I did try leaving starts.from at its default and not adding the 1, but it keeps failing.
Regarding the discretization, I'm not sure about it: since the values do not follow a normal distribution, I don't know how discretizing them without taking this into account would affect the final network.
Thanks
Another question.
Regarding the section where you talk about Dynamic Bayesian Networks: the layering option applies to the variables (nodes), but is there any way to tell the function which samples belong to which time point? For instance, having samples at time 0 and samples at time 1.
Thanks
If you mean a situation like
V1 V2 V3
1 1 2 (time 1)
2 2 2 (time 2)
2 2 1 (time 1)
2 3 3 (time 2)
where you have a layering, e.g. 1 1 2 for the first instant, you have to put the observations together by concatenating the different instants:
V1t1 V2t1 V3t1 V1t2 V2t2 V3t2
1 1 2 2 2 2
2 2 1 2 3 3
and tell bnstruct that you have 2 instants using num.time.steps=2. This way the layering will be propagated throughout the other instants and become 1 1 2 3 3 4.
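In code, the concatenation above can be sketched like this (toy matrices; the BNDataset() call is commented out and only indicative, since the exact argument lengths depend on your actual data):

```r
# Sketch: one row per subject, columns are the variables at t1
# followed by the same variables at t2.
obs.t1 <- matrix(c(1, 1, 2,
                   2, 2, 1), nrow = 2, byrow = TRUE)   # time 1
obs.t2 <- matrix(c(2, 2, 2,
                   2, 3, 3), nrow = 2, byrow = TRUE)   # time 2
dbn.data <- cbind(obs.t1, obs.t2)   # V1t1 V2t1 V3t1 V1t2 V2t2 V3t2
# d <- BNDataset(dbn.data, discreteness = rep('d', 3),
#                variables = c("V1", "V2", "V3"),
#                node.sizes = rep(3, 3), num.time.steps = 2)
```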
If you have instead a situation like
V1 V2 V3
not observed (time 1)
2 2 2 (time 2)
2 2 1 (time 1)
not observed (time 2)
since bnstruct assumes coherence between observations you'll have to revert to missing data, e.g.
V1t1 V2t1 V3t1 V1t2 V2t2 V3t2
? ? ? 2 2 2
2 2 1 ? ? ?
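The missing instants above translate directly to NA values in R; a minimal sketch:

```r
# Sketch: unobserved instants become NA (bnstruct's '?') in the
# concatenated matrix.
dbn.data <- rbind(c(NA, NA, NA, 2, 2, 2),   # subject observed only at t2
                  c(2, 2, 1, NA, NA, NA))   # subject observed only at t1
```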
Thanks for the fast response.
Two silly questions.
First, can the header of the matrix be anything I want, or must it be VarName + t0, t1, ..., tn? Second, is it mandatory to give the function the layering? In my case I have no information about which nodes should be parents and which children (which, as far as I've read in the manual, is what layering specifies).
Once more, thanks.
In the meantime, I tried:
bnset.brca <- BNDataset(scores.mix.discret, discreteness = rep('d', 68), variables = colnames(scores.mix.discret),
node.sizes = rep(3,68), starts.from=1, num.time.steps=2)
bnset.brca.boots <- bootstrap(bnset.brca, 100)
bnset.brca.boots.learn <- learn.dynamic.network(bnset.brca, bootstrap = TRUE, scoring.func = 'BDeu')
I get this when trying to run the last line:
Error in .local(x, ...) :
promise already under evaluation: recursive default argument reference or earlier problems?
1) That was just a way to make the example clearer. If you want to give a name to all the variables in all the instants, you can name them as you want. The easier way, however, is to specify only the variable names for the "real" variables (i.e. in one single instant: after all, in the other instants the variables are the same, just observed at a different time), and BNDataset() will add a _t1 suffix for the first instant, _t2 for the second one, and so on. (I realize now that in the previous example I made the time start at 0; instead it starts at 1, as per R indexing. I'll fix it above.)
2) No, the layering can be omitted. In this case, the layering considered will be the natural one given by the time sequence (a node cannot have a parent in the future).
3) I have honestly never seen that. You have to use
bnset.brca.boots.learn <- learn.dynamic.network(bnset.brca.boots, bootstrap = TRUE, scoring.func = 'BDeu')
since in R, when you apply a method, you still have to assign the output to something; here bnset.brca does not have bootstrap samples. You should get an error message from bnstruct anyway... I'll take a look when I can.
Hi all,
I encountered the same issue. The problem can be fixed by modifying line 114 of utils.R from
quantiles <- quantile( data[,i], probs = (0:levels[i])/levels[i], na.rm = TRUE )
to
quantiles <- unique(quantile( data[,i], probs = (0:levels[i])/levels[i], na.rm = TRUE ))
This won't change any behaviour on datasets where the algorithm already runs, but it will prevent this error in the future.
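To illustrate why the unique() matters: with sparse data (such as the ~17% zeros mentioned earlier in this thread), consecutive quantiles can coincide, and cut() refuses duplicated breaks. A small sketch of the failure mode and the fix:

```r
# Sketch: a sparse variable where the 0% and 50% quantiles coincide.
x <- c(rep(0, 8), 0.5, 1.9)                       # 80% zeros
q <- quantile(x, probs = (0:2) / 2, na.rm = TRUE) # 0%, 50%, 100%
# q is c(0, 0, 1.9): cut(x, breaks = q) would fail with
# "'breaks' are not unique"
breaks <- unique(q)                               # the one-line fix
bins <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
# all values now land in a single bin: the variable effectively
# has fewer levels than requested
```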
I've submitted a pull request implementing only this change.
Cheers,
Amy
Hi Amy,
thanks a lot! I merged the change.
Alberto
Dear Amy,
I experience the same problem. How do I modify line 114? Within the learn.network function, or in the bnstruct package source? Is a script available for this purpose?
Thanks a lot.
Best,
Apparently Alberto updated the CRAN package with Amy's fix last Friday (31.08.2018). If you installed the package prior to that date, reinstall it, or install it from GitHub via the devtools package in R. That fixed my problem.
Hi, I'm playing with bnstruct, and I'm facing a problem that I haven't been able to solve.
I'm using a matrix of continuous data with values from 0 to ~1.9. The problem is related to the discretization of the values.
Output
I've checked whether the function is actually able to discretize into the 2 classes, and all 67 variables can be discretized into two classes. The output for all 67 variables is always 1 2. What am I missing here?
Thanks for your time.