Open rdiaz02 opened 2 years ago
It seems like both graphs (the given one and the one without edges from D to B and C to A) have the same likelihood. And both graphs assign the same probability to each genotype since C itself requires B.
It seems like the algorithm picks the more complicated graph when there is a tie in the likelihood. I agree this makes the result harder to interpret. I think this should be a relatively easy fix, and I will update when I can get on a computer tomorrow.
Actually the two models have slightly different likelihoods... Under the graph printed above we have
P(B = 0 | C = 1, D = 0) = 1 - epsilon
because the parents C and D are not both equal to 1. If we remove the edge from D to B, this probability becomes
P(B = 0 | C = 1, D = 0) = 1 - theta_b
because now C is the only parent of B.
In the data, there are at least 10 observations with (B = 0, C = 1, D = 0) (see `d1[59,]`). Because epsilon is small, 1 - epsilon will tend to be larger than 1 - theta_b, which favors the model that includes an edge from D to B.
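To make the comparison above concrete, here is a minimal sketch of how the (B = 0, C = 1, D = 0) rows contribute to the log-likelihood difference. The values of `epsilon`, `theta_b`, and `n_obs` are hypothetical and chosen only for illustration. The sketch holds the parameters fixed and ignores every other genotype, whereas an actual fit would re-estimate the parameters for each graph.

```python
import math

# Hypothetical values, for illustration only (not estimated from d1.txt).
epsilon = 0.01   # assumed small spontaneous-activation probability
theta_b = 0.20   # assumed P(B = 1 | all parents of B equal 1)
n_obs = 10       # assumed count of rows with (B = 0, C = 1, D = 0)

# Per-row log probability of B = 0 given C = 1, D = 0 under each graph.
logp_with_edge = math.log(1 - epsilon)     # D -> B kept: parents are not all 1
logp_without_edge = math.log(1 - theta_b)  # D -> B removed: the only parent C is 1

# Contribution of these rows to the log-likelihood difference between graphs.
delta = n_obs * (logp_with_edge - logp_without_edge)
print(f"log-likelihood gain from keeping D -> B: {delta:.3f}")
```

A positive `delta` means these rows favor the graph that keeps the edge from D to B.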
However, when the edge from D to B is removed, the log likelihood decreases by only 2. It might make sense to include an AIC/BIC-type penalty on the number of edges to avoid situations like this.
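A rough sketch of what such a penalty could look like follows. It is not part of OncoBN: `penalized_score` is a hypothetical helper, each edge is assumed to count as one parameter, and the log likelihoods, edge counts, and sample size are made-up numbers that only reproduce a gap of 2 between the two graphs.

```python
import math

def penalized_score(log_lik, n_edges, penalty_per_edge):
    """Log likelihood minus a fixed cost per edge (higher is better)."""
    return log_lik - penalty_per_edge * n_edges

# Hypothetical inputs: (log likelihood, number of edges) for each graph.
n_samples = 100
models = {
    "with D -> B": (-500.0, 4),
    "without D -> B": (-502.0, 3),
}

# AIC = 2k - 2 logL and BIC = k ln(n) - 2 logL, rescaled by -1/2 so that
# the penalty is expressed per edge on the log-likelihood scale.
penalties = {"AIC-like": 1.0, "BIC-like": 0.5 * math.log(n_samples)}

best = {}
for name, penalty in penalties.items():
    scores = {m: penalized_score(ll, k, penalty) for m, (ll, k) in models.items()}
    best[name] = max(scores, key=scores.get)
    print(name, scores, "->", best[name])
```

Under these assumptions, an AIC-like penalty of 1 per edge does not overturn a log-likelihood gap of 2, while a BIC-like penalty does once 0.5 * ln(n) exceeds 2, that is, for n above roughly 55.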
Thanks for the detailed analysis! If I understand correctly, there are two issues: a) model choice itself, which might resolve things in favor of the smaller, transitively reduced models when using AIC/BIC; b) interpretation. Regarding b), the difference with respect to CBN (Gerstung et al., 2009, for instance) arises because of epsilon in OncoBN. This makes a lot of sense now.
With the attached data I ran OncoBN as follows:
and was surprised to find a DAG that is not transitively reduced. This is easy to see here:
(attached too)
I would have expected not to see:
I am not sure how to interpret the output. And I think CBN itself (Gerstung et al., 2009, for example) does not return DAGs that are not transitively reduced.
What am I missing? d1.txt
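For readers who want to check whether a fitted DAG is transitively reduced, a minimal sketch in plain Python is shown here. The helper `redundant_edges` and the example edge list are hypothetical; the edge list is not the graph from this issue. An edge is reported as redundant when its endpoint can still be reached through some other directed path.

```python
def redundant_edges(edges):
    """Return edges (u, v) of a DAG for which another path from u to v exists."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)

    def reachable(start, target, skip_edge):
        # Depth-first search that ignores the direct edge being tested.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in children.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return sorted(e for e in edges if reachable(e[0], e[1], e))

# Hypothetical DAG: X -> Z is implied by X -> Y -> Z.
edges = [("Root", "X"), ("X", "Y"), ("Y", "Z"), ("X", "Z")]
print(redundant_edges(edges))
```

An empty result would mean the DAG is already transitively reduced.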