sambofra / bnstruct

R package for Bayesian Network Structure Learning
GNU General Public License v3.0

Network learning #28

Closed: svenhofman closed this issue 2 years ago

svenhofman commented 2 years ago

Hello,

Thank you for the package, it is really helpful. Pardon me if the following question is not appropriate for this repo, but I have a question about obtaining the optimal DAG corresponding to a data set with learn.network(). When I generate data from a small network like the following:

library(Rlab)
m = 2000000
X1 = rbern(m, prob = 0.5)
X2 = X1 + rbern(m, prob = 0.5)
X3 = X2 + rbern(m, prob = 0.5)
X4 = X3 + rbern(m, prob = 0.5)

which corresponds to the network

X1 -> X2 -> X3 -> X4

and I want to find the network structure of the data with the highest BDeu score, I do the following:

D = cbind(X1, X2, X3, X4)  # collect the samples into a matrix
D = D + 1                  # shift values to the range 1..node.size
Dbn = BNDataset(D, discreteness = rep('d',4), variables = c("1", "2", "3", "4"), node.sizes = c(2,3,4,5))
net = learn.network(Dbn, scoring.func = "BDeu", algo = "sm", ess = 10)
print(net)

This gives me the following adjacency matrix, which is not equal to the 'true' network:

Adjacency matrix:
  1 2 3 4
1 0 0 0 0
2 1 0 1 0
3 1 0 0 0
4 0 1 1 0

The BDeu score corresponding to this network is indeed lower than the BDeu score of the true network, so I would guess that this is why learn.network() returns this DAG. However, I don't understand why, even with 2000000 data samples, the best network does not correspond to the true network. If I instead generate data for the same network structure as follows:

library(Rlab)
m = 2000000
X1 = X2 = X3 = X4 = numeric(m)  # pre-allocate the vectors
for (i in 1:m) {
  X1[i] = rbern(1, prob = 0.5)

  if (X1[i] == 1) p = 0.7 else p = 0.3
  X2[i] = rbern(1, prob = p)

  if (X2[i] == 1) p = 0.7 else p = 0.3
  X3[i] = rbern(1, prob = p)

  if (X3[i] == 1) p = 0.7 else p = 0.3
  X4[i] = rbern(1, prob = p)
}
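As an aside, the loop above touches each observation one at a time, which is slow for m = 2000000. A vectorized sketch drawing from the same conditional distributions, using base R's rbinom (rbinom(n, 1, p) is equivalent to Rlab's rbern(n, p)):

```r
# Vectorized generation of the same chain X1 -> X2 -> X3 -> X4:
# each variable is drawn in a single call, with a per-row success
# probability chosen from its parent's value.
m  <- 2000000
X1 <- rbinom(m, 1, 0.5)
X2 <- rbinom(m, 1, ifelse(X1 == 1, 0.7, 0.3))
X3 <- rbinom(m, 1, ifelse(X2 == 1, 0.7, 0.3))
X4 <- rbinom(m, 1, ifelse(X3 == 1, 0.7, 0.3))
```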

So we generate data using the same network structure, but now all variables are binary. Then the following call:

D = cbind(X1, X2, X3, X4)  # collect the samples into a matrix
D = D + 1                  # shift values to the range 1..node.size
Dbn = BNDataset(D, discreteness = rep('d',4), variables = c("1", "2", "3", "4"), node.sizes = c(2,2,2,2))
net = learn.network(Dbn, scoring.func = "BDeu", algo = "sm", ess = 10)
print(net)

does indeed return the true DAG as the learned network. Do you have any idea why, even with so many observations, the learned network is still not equal to the true network when the data is generated with the first approach?

Thank you.

albertofranzin commented 2 years ago

Hi,

it's a very good question. I don't have a definitive answer, but it is related to how the scoring function works. If you try AIC or BIC instead of BDeu, you'll see a different network.

Also, if you use an equivalent sample size of 1 you'll get the same network as in the second case. Unfortunately, I don't have the theoretical background to elaborate much further, nor do I have the time to delve deeper into the literature, but this may be a starting point for you to look into.
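One way to see how the equivalent sample size enters the score: BDeu gives each node a Dirichlet prior with pseudo-counts alpha_ijk = ess / (r_i * q_i), so the ESS directly controls how strongly the prior smooths the observed counts. A small self-contained sketch of the local BDeu score (an illustrative re-implementation, not bnstruct's internal code), showing that the log-score gain from adding an arc changes with the ESS:

```r
# Local BDeu log-score of one node, given a q x r table of counts
# (rows = parent configurations, columns = node values).
# Illustrative re-implementation, not bnstruct's code.
bdeu.local <- function(counts, ess) {
  q  <- nrow(counts)
  r  <- ncol(counts)
  ak <- ess / (q * r)   # alpha_ijk
  aj <- ess / q         # alpha_ij = sum_k alpha_ijk
  sum(lgamma(aj) - lgamma(aj + rowSums(counts))) +
    sum(lgamma(ak + counts) - lgamma(ak))
}

set.seed(1)
n  <- 2000
X1 <- rbinom(n, 1, 0.5)
X2 <- rbinom(n, 1, ifelse(X1 == 1, 0.55, 0.45))  # weak dependence

counts.dep   <- table(X1, X2)                # X1 as parent of X2
counts.indep <- matrix(table(X2), nrow = 1)  # X2 with no parents

for (ess in c(1, 10, 100)) {
  gain <- bdeu.local(counts.dep, ess) - bdeu.local(counts.indep, ess)
  cat("ess =", ess, "-> log-score gain of the arc X1 -> X2:", gain, "\n")
}
```

The loop prints a different gain for each ESS value, which is exactly why learn.network() can return different structures for different ess arguments.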

It has to be noted that the network you get is not the correct one, but the correct network with its arcs reversed. This is the best you can get with a score-based method: the reversed network encodes the same conditional independence relationships as the true one (they belong to the same equivalence class), and networks in the same equivalence class cannot be disambiguated by a scoring function.

Hope this helps at least a bit.

svenhofman commented 2 years ago

That's interesting. Thanks to your comment, I found a paper that discusses exactly this effect of the ESS on the optimal network structure. This is what I was looking for, many thanks!