Open rdiaz02 opened 2 years ago
Sorry for the late reply, but I had a chance to look at `d333.txt`. For the CBN model, we see that the estimated theta for mutation C is 1. Because A and B are the parents of C, this means that p(C | A and B) (the probability of mutation C given both A and B) should be 1. Indeed, I see that `sum(d333$A & d333$B)` and `sum(d333$A & d333$B & d333$C)` are both equal to 1443.
For mutation E, the parent is C, so a theta equal to 0 should mean that p(E | C) = 0. Indeed, we have `sum(d333$C & d333$E) = 0`. Any time that E = 1, the model considers it a "spontaneous activation" (which occurs with probability `epsilon`). When I set `epsilon = 0` (zero tolerance for mutations that do not follow the DAG rule), I see that E is a child of the wild type (which is expected). I am going to think a little bit more about whether there may be a better way to choose `epsilon` automatically.
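The two checks above (C given its parents A and B, and E given its parent C) are empirical conditional frequencies. A minimal Python sketch of that computation follows; the rows are hypothetical toy data standing in for `d333.txt`, and `conditional_frequency` is an illustrative helper, not a function from the package.

```python
# Hypothetical toy rows standing in for d333.txt (1 = mutation observed).
rows = [
    {"A": 1, "B": 1, "C": 1, "E": 0},
    {"A": 1, "B": 1, "C": 1, "E": 0},
    {"A": 1, "B": 0, "C": 0, "E": 0},
    {"A": 0, "B": 0, "C": 0, "E": 1},
    {"A": 0, "B": 1, "C": 0, "E": 0},
]


def conditional_frequency(rows, child, parents):
    """Empirical P(child = 1 | all parents = 1); None if the parents never co-occur."""
    with_parents = [r for r in rows if all(r[p] == 1 for p in parents)]
    if not with_parents:
        return None
    return sum(r[child] for r in with_parents) / len(with_parents)


# Analogue of comparing sum(A & B) with sum(A & B & C).
p_c_given_ab = conditional_frequency(rows, "C", ["A", "B"])
# Analogue of checking sum(C & E) against sum(C).
p_e_given_c = conditional_frequency(rows, "E", ["C"])
# Observations of E without its parent C: what the reply calls spontaneous activation.
e_without_c = sum(1 for r in rows if r["E"] == 1 and r["C"] == 0)

print(p_c_given_ab, p_e_given_c, e_without_c)
```

In this toy data, C always accompanies A and B, and E never co-occurs with C, which is the same pattern reported for `d333.txt`.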
As you noticed, I corrected a bug in the latest update, so that would be the reason for the observed changes. I still need to look at `d222.txt`, so let's keep this issue open and I will get back to you shortly. Thanks!
For the data set attached (d333.txt), this is what I get with commit aa039d3:
If I run this with commit 3276ad0, in addition to changes in the graph, some thetas are identical to 1.0 or 0.0.
Note that there are observations for the gene with theta = 0:
I can compute with thetas of 1 and 0, but I do not understand a theta of 0: what is the model really saying? Moreover, thetas of 1 and 0 lead to predicted genotype frequencies of exactly 0. For example, for model `fit3_C`, above we have: `theta_A * theta_B * (1 - theta_C) = 0` and `theta_A * theta_B * theta_C * theta_E = 0`, as `theta_E = 0`.
In addition, these patterns can affect genes that are not terminal leaves but internal nodes, such as E in the next example:
And we get some genotypes with predicted probabilities of exactly 0. For example: BF (because `theta_C = 1`), BCF (because `theta_A = 1`), BCFE, BCDEF (because `theta_C = 0`), etc.
Granted, these are synthetic data and these could be corner cases, but I wonder if I am missing something.
d222.txt
d333.txt
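To make the zero predicted frequencies discussed in this thread concrete, here is a minimal Python sketch of a genotype probability computed as a product over genes. It rests on assumptions: the DAG (A and B as roots, C requiring both, E requiring C) follows the description of the `d333.txt` fit, `theta_A` and `theta_B` are made-up values, and the way `epsilon` enters (a gene whose parents are not all present activates with probability `epsilon`) is a guess based on the "spontaneous activation" wording, not the package's actual likelihood.

```python
# Hypothetical DAG following the thread: A and B are roots, C needs A and B, E needs C.
parents = {"A": [], "B": [], "C": ["A", "B"], "E": ["C"]}
# theta_C = 1 and theta_E = 0 come from the discussion; theta_A and theta_B are made up.
theta = {"A": 0.5, "B": 0.6, "C": 1.0, "E": 0.0}


def genotype_probability(genotype, parents, theta, epsilon=0.0):
    """Probability of a genotype (a set of mutated genes) under the assumed model.

    Assumption: a gene with all parents present mutates with probability theta;
    otherwise it mutates with probability epsilon (spontaneous activation).
    """
    prob = 1.0
    for gene, pars in parents.items():
        if all(p in genotype for p in pars):
            p_on = theta[gene]
        else:
            p_on = epsilon
        prob *= p_on if gene in genotype else 1.0 - p_on
    return prob


# theta_A * theta_B * (1 - theta_C): zero because theta_C = 1.
p_ab = genotype_probability({"A", "B"}, parents, theta)
# theta_A * theta_B * theta_C * theta_E: zero because theta_E = 0.
p_abce = genotype_probability({"A", "B", "C", "E"}, parents, theta)
# With epsilon > 0, E without C is no longer impossible under this sketch.
p_e_alone = genotype_probability({"E"}, parents, theta, epsilon=0.05)

print(p_ab, p_abce, p_e_alone)
```

Under these assumptions, any genotype that contains A and B but not C, or that contains E together with C, gets probability exactly 0, while a positive `epsilon` only rescues genotypes that violate the DAG order.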