exclusionPower could hang

thoree commented 5 years ago

This is probably obvious, but the documentation could note that exclusionPower will hang if n is increased as indicated below:

library(forrel, quietly = TRUE)
n = 3 # hangs for large n, say n = 40
claim = nuclearPed(3, children = c("B1", "B2", "POI"))
x1 = nuclearPed(3, children = c("B1", "B2", "MP"))
x2 = singleton("POI")
true = list(x1,x2)
als = 1:n
p = rep(1/n, n)
exclusionPower(claim, true, ids = c("B1", "B2", "POI"), alleles = als, 
               afreq = p, plot = FALSE)
#> [1] 0.05144033

^{Created on 2018-10-22 by the reprex package (v0.2.1)}

magnusdv commented 5 years ago

Thanks, I agree that the docs could mention this.

Passing thought: Is it possible to do allele lumping in some exclusionPower applications?

thoree commented 5 years ago

I agree: lumping should be possible. It is probably not relevant to do exclusion calculations when mutations are modelled. If all entries of the mutation matrix are positive, as they will be for most models, the exclusion probability vanishes.

thoree commented 5 years ago

Here's a slightly simpler pedigree for which exclusionPower can be made to hang for large n. Whether this can be speeded up or if it is possible to detect whether exclusion is impossible, as below, I don't know.

library(forrel, quietly = TRUE)
claim = nuclearPed(2, children = c("B1","B2"))
n = 3
als = 1:n
p = rep(1/n, n)
m = marker(claim, alleles = als, afreq = p)
claim = setMarkers(claim, m)
true1 = nuclearPed(1, children = c("B1"))
m1 = marker(true1, alleles = als, afreq = p)
true1 = setMarkers(true1, m1)
true2 = singleton("B2")
m2 = marker(true2, alleles = als, afreq = p)
true2 = setMarkers(true2, m2)
true = list(true1, true2)
exclusionPower(claim, true,ids = c("B1", "B2"), markerindex = 1)

#> [1] 0

^{Created on 2018-11-03 by the reprex package (v0.2.1)}

thoree commented 5 years ago

Exclusion becomes possible with a completely inbred founder in the previous example:

library(forrel, quietly = TRUE)
claim = nuclearPed(2, father = "fa", mother = "mo",  children = c("B1","B2"))
founder_inbreeding(claim, c("fa")) = 1
n = 2
als = 1:n
p = rep(1/n, n)
m = marker(claim, alleles = als, afreq = p)
claim = setMarkers(claim, m)
true1 = nuclearPed(1, children = c("B1"))
m1 = marker(true1, alleles = als, afreq = p)
true1 = setMarkers(true1, m1)
true2 = singleton("B2")
m2 = marker(true2, alleles = als, afreq = p)
true2 = setMarkers(true2, m2)
true = list(true1, true2)
exclusionPower(claim, true,ids = c("B1", "B2"), markerindex = 1, plot = FALSE)
#> [1] 0.125

^{Created on 2018-11-03 by the reprex package (v0.2.1)}

magnusdv commented 5 years ago

That's a clever example - nice use of founder inbreeding!

This is also a good example of a situation where its unclear to me how to lump alleles (if n is large) in order to speed up the computation. The exclusion power boils down to the probability of IBS = 0 for an unrelated pair, which is a 4th degree polynomial in the allele frequencies, a polynomial which I'm pretty certain is not invariant under any standard lumping procedure.

So for many alleles (say n=100) this is interesting/frustrating: the PE is trivially computable from the theoretical formula, while I don't see any way I could trick forrel::exclusionPower() into reaching the same answer without hanging.

For the special case (including your example) of a pairwise PE, with no previously typed pedigree members, one could perhaps compute the PE directly from IBD/IBS probabilities. But I don't know how important that would be.

For the record, here's a slightly simpler way of coding your example (note in particular the easier way of obtaining the true pedigree):

library(forrel, quietly = T)

# Number of alleles
n = 2

# Claim: brothers
claim = nuclearPed(father = "fa", mother = "mo", children = c("B1","B2"))
founder_inbreeding(claim, "fa") = 1
claim = setMarkers(claim, marker(claim, alleles = 1:n))

# True: unrelated
true = list(removeIndividuals(claim, "B2"), subset(claim, "B2"))
#> Removing B2 (no descendants)
#> Warning: Non-zero founder inbreeding lost. (Individuals: fa)

# Exclusion power
exclusionPower(claim, true, ids = c("B1", "B2"), markerindex = 1, plot = F)
#> [1] 0.125

thoree commented 5 years ago

I agree, lumping or other speed-ups seem beyoynd reach. exclusionPower solves cases of practical interest. For extreme cases one could always resort to simulation which appears feasible with markerSim or profileSim combined with transferMarkersand LR.

magnusdv commented 5 years ago

I'm closing this for now, but feel free to reopen if new ideas emerge.

magnusdv / forrel

exclusionPower could hang #8