Open thoree opened 4 years ago
Thanks, that is an interesting idea, which shouldn't be too difficult to implement. I'm not sure exactly how you envision the output to look like, though. You say a "list of profiles", but could you give an explicit example?
Below is a quick attempt using your example, resulting in a sorted data frame. Is this in the right direction?
library(pedprobr)
#> Loading required package: pedtools
x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2)
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2)
p1 = oneMarkerDistribution(x, "FA", partialmarker = m1, verbose = F)
p2 = oneMarkerDistribution(x, "FA", partialmarker = m2, verbose = F)
# Posterior probabilities
pp = p1 %o% p2
# Indices of positive entries
idx = which(pp > 0, arr.ind = T)
# Result as data.frame
res = data.frame(
M1 = rownames(pp)[idx[, 'row']],
M2 = colnames(pp)[idx[, 'col']],
Prob = pp[pp > 0],
stringsAsFactors = F)
# Sort
res[order(res$Prob, decreasing = T), , drop = F]
#> M1 M2 Prob
#> 2 1/2 1/2 0.6666667
#> 1 1/1 1/2 0.3333333
Created on 2019-11-11 by the reprex package (v0.3.0)
Very nice! I tried this on an MP example, F5. Then one could restrict attention to markers for which EP>0. However, in this example, and I fear in most MP examples, the list will be too long and no candidate profiles will stand out as as clearly more likely. In some cases, like the below one from your Argentina talk I would think the profile of MP could be shortlisted
,
Ok, here is a sketch of a function, which (I think) does more or less what you want. I don't know how it will react when the dimensions blow up, but I included a parameter maxPedMarker
which limits the number of genotypes considered for each marker.
library(pedprobr)
#> Loading required package: pedtools
rankProfiles = function(x, id, markers = NULL, maxPerMarker = Inf, verbose = TRUE) {
if(!hasMarkers(x))
stop("No markers attached to pedigree")
if(is.null(markers))
markers = seq_len(nMarkers(x))
# Extract wanted markers
x = selectMarkers(x, markers)
nmark = nMarkers(x)
# Marker names
mnames = name(x, 1:nmark)
mnames[is.na(mnames)] = as.character((1:nmark)[is.na(mnames)])
# Posterior distributions for each marker
omdList = lapply(1:nmark, function(i) {
omd = oneMarkerDistribution(x, ids = id, partialmarker = i, verbose = FALSE)
a = omd[omd > 0, drop = F]
if(maxPerMarker < length(a))
a = sort(a, decreasing = T)[1:maxPerMarker]
# Trick to ensure proper naming in `expand.grid` below
names(dimnames(a)) = mnames[i]
if(verbose) {
print(a)
cat("\n")
}
a
})
# Array with total posterior probs
pp = Reduce(`%o%`, omdList)
# Transform dimnames into data frame with possible profiles
profiles = expand.grid(dimnames(pp))
# Add probabilities
prob = as.numeric(pp)
profiles$Posterior = prob
# Sort
profiles = profiles[order(prob, decreasing = T), , drop = F]
row.names(profiles) = NULL
profiles
}
And here's an expanded version of your original example:
x = nuclearPed(2, father = "FA")
m1 = marker(x, "3" = 1, "4" = 1:2, alleles = 1:2, name = "m1")
m2 = marker(x, "3" = 1, "4" = 2, alleles = 1:2, name = "m2")
m3 = marker(x, "3" = 2, "4" = 2, alleles = 1:2, name = "m3")
x = setMarkers(x, list(m1, m2, m3))
rankProfiles(x, "FA")
#> m1
#> 1/1 1/2
#> 0.3333333 0.6666667
#>
#> m2
#> 1/2
#> 1
#>
#> m3
#> 1/2 2/2
#> 0.3333333 0.6666667
#>
#> m1 m2 m3 Posterior
#> 1 1/2 1/2 2/2 0.4444444
#> 2 1/2 1/2 1/2 0.2222222
#> 3 1/1 1/2 2/2 0.2222222
#> 4 1/1 1/2 1/2 0.1111111
Created on 2019-11-11 by the reprex package (v0.3.0)
Thanks, I will try some examples and get back to you!
Created on 2019-11-09 by the reprex package (v0.3.0)