magnusdv / forrel

Forensic pedigree analysis and relatedness inference
GNU General Public License v2.0
11 stars 0 forks source link

Error in missingPersonLR or profileSim in forrel v1.5.0? #45

Closed MarsicoFL closed 1 year ago

MarsicoFL commented 1 year ago

Dear Magnus,

I was working with mispitools package, simLRgen function, and detected a bug when calculating LRs distributions considering an unrelated (poi1, lr1 in the function) and a related (poi2, lr2) poi. As you will be able to see, poi2 always gives low LR values, not consistent with what is expected (for example grandfather with 20 markers).

When I install a previous forrel version v1.4.1 everything works. I have alerted about this issue in mispitools documentation.

Below I paste simLRgen function and a tiny example where you can run and detect the inconsistent result.

Bests, Franco

simLRgen

simLRgen = function(reference, missing, numsims, seed) {
  st = base::Sys.time()

  if(pedtools::is.pedList(reference) && base::length(reference) == 1)
    reference = reference[[1]]

  if(!pedtools::is.ped(reference))
    base::stop("Expecting a connected pedigree as H1")

set.seed(seed)

poi1 = pedtools::singleton("poi1")
poi1 = pedtools::transferMarkers(from = reference, to = poi1)
poi1 = forrel::profileSim(poi1, numsims)

lr1 <- as.list(rep(NA, numsims))
for(i in 1:numsims) {
       lr1[[i]] = forrel::missingPersonLR(reference, missing, poi = poi1[[i]])
    }

poi2ped = forrel::profileSim(reference, numsims, ids = missing)

poi2 <- base::as.list(base::rep(NA, numsims))
for(i in 1:numsims) {
  poi2[[i]] = base::subset(poi2ped[[i]], missing)
}

base::rm(poi2ped)

lr2 <- base::as.list(rep(NA, numsims))

for(i in 1:numsims) {
  lr2[[i]] = forrel::missingPersonLR(reference, missing, poi = poi2[[i]])
}

LRsimulated <- base::cbind(base::sapply(lr1, function(x) {x[["LRtotal"]][["H1:H2"]]}),
   base::sapply(lr2, function(x) {x[["LRtotal"]][["H1:H2"]]}))
   base::colnames(LRsimulated) <- c("Unrelated", "Related")
   base::structure(base::as.data.frame(LRsimulated))
}

example

library(forrel)
x = linearPed(2)
plot(x)
x = setMarkers(x, locusAttributes = NorwegianFrequencies[1:5])
x = profileSim(x, N = 1, ids = 2)
datasim = simLRgen(x, missing = 5, 100, 123)
MarsicoFL commented 1 year ago

I forget to tell that this error has ocurred after mispitools updating, having your proposed safeguard for upcoming changes in forrel. So maybe something else still missing.

Bests

magnusdv commented 1 year ago

Hi, The code runs cleanly here. Can you give a minimal reproducible example including the output, and explain what you think is wrong, i.e., what you expected as output? Thanks!

(Tip for making debugging simpler for everyone: use the reprex package)

MarsicoFL commented 1 year ago

Hi, thanks for the rapid response. No problem, reprex is cool. First of all, we should remove forrel, pedtools and mispitools packages in order to be in the same page about versions:

remove.packages("forrel")
remove.packages("mispitools")
remove.packages("pedtools")
.rs.restartR()

It is important to restart R session with .rs.restartR(), otherwise some changes about packages will not be executed.

First example, where the output does not give what is expected

In the first case we install some packages and run a simple example. As you will (should) see the code works fine and you obtain an output, but the values concern me.

install.packages("forrel")
#> Installing package into '/home/franco/R/x86_64-pc-linux-gnu-library/4.1'
#> (as 'lib' is unspecified)
install.packages("pedtools")
#> Installing package into '/home/franco/R/x86_64-pc-linux-gnu-library/4.1'
#> (as 'lib' is unspecified)
install.packages("mispitools")
#> Installing package into '/home/franco/R/x86_64-pc-linux-gnu-library/4.1'
#> (as 'lib' is unspecified)
library(mispitools)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
library(forrel)
#> Loading required package: pedtools
library(pedtools)
x = linearPed(2)
x = setMarkers(x, locusAttributes = NorwegianFrequencies[1:5])
x = profileSim(x, N = 1, ids = 2)
datasim = simLRgen(x, missing = 5, 10, 123)
#> 
#> Forming H1 from reference:
#>   * Renaming `5` to `poi1`
#>   * Transferring genotypes
# Comment Franco: these last lines appears several times.

datasim
#>     Unrelated      Related
#> 1  0.25512474 6.485134e-08
#> 2  0.95971889 1.428986e-07
#> 3  0.21874275 1.003223e-08
#> 4  0.06626652 3.165753e-08
#> 5  1.53063554 7.578325e-07
#> 6  0.33425729 1.656594e-07
#> 7  0.03125000 1.476222e-08
#> 8  0.21251279 7.801781e-08
#> 9  2.30391444 1.000732e-06
#> 10 1.53063554 1.756159e-07

In the previous comment you have the simLRgen function. You are seeing here two lists of LRs values obtained considering POI as unrelated (left) or POI as related (rigth) with the pedigree members. Very low LRs are obtained for those Related. At first I thought that it was an error in asigning names to the columns. But it was not.

Second example, where the output gives what is expected

Now lets try to test the forrel older version, 1.4.1. We must first remove forrel last version installed. Importantly, after it we must restart R sessions (otherwise we will get the same result from the previous example).

remove.packages("forrel")
.rs.restartR()

Now, executing the following code, other result is obtained:

install.packages("devtools")
#> Installing package into '/home/franco/R/x86_64-pc-linux-gnu-library/4.1'
#> (as 'lib' is unspecified)
library(devtools)
#> Loading required package: usethis
install_version("forrel", version = "1.4.1", repos = "http://cran.us.r-project.org") #this will install previous forrel version
#> Downloading package from url: http://cran.us.r-project.org/src/contrib/Archive/forrel/forrel_1.4.1.tar.gz
#> Installing package into '/home/franco/R/x86_64-pc-linux-gnu-library/4.1'
#> (as 'lib' is unspecified)
.rs.restartR()
#> Error in .rs.restartR(): no se pudo encontrar la función ".rs.restartR"

library(mispitools)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
library(forrel)
#> Loading required package: pedtools
library(pedtools)
x = linearPed(2)
x = setMarkers(x, locusAttributes = NorwegianFrequencies[1:5])
x = profileSim(x, N = 1, ids = 2)
datasim = simLRgen(x, missing = 5, 10, 123)
datasim
#>     Unrelated    Related
#> 1  2.29957033  0.6628718
#> 2  2.14840539 25.9583478
#> 3  3.07373977  3.1552958
#> 4  0.32823312  4.5487078
#> 5  0.18908412  0.2146807
#> 6  0.09751327  2.1313705
#> 7  1.08479280  2.8696056
#> 8  0.13527305  1.3256747
#> 9  0.39259766 19.2611877
#> 10 0.27754416  2.3015476

Now LR values are those expected and higher in the Related case (in most cases, consider that we are in a underpowered case). Other thing that took my attention was that the following message didint appear anymore:

#> Forming H1 from reference:
#>   * Renaming `5` to `poi1`
#>   * Transferring genotypes

Maybe it is just a comment.

Let me know please if you can reproduce it, and if I can help in anything. I will be looking for differences in the forrel functions used in simLRgen to try to understand what is happening.

Bests

magnusdv commented 1 year ago

Many thanks, this should be fixed now. The recent updates of forrel didn't actually introduce any new bugs, but rather revealed one that was already there. Here is your example with the latest version:

library(forrel, quietly = T)

x = linearPed(2) |> 
  profileSim(ids = 2, markers = NorwegianFrequencies[1:5], seed = 321)

simLRgen(x, missing = "5", numsims = 5, seed = 123)
#>    Unrelated   Related
#> 1 0.13908101 7.8166512
#> 2 5.87133535 6.1733557
#> 3 0.09033533 9.2710399
#> 4 3.22291182 9.1870352
#> 5 0.45986001 0.4919038

Created on 2023-03-24 with reprex v2.0.2

Note that your example was not reproducible, since profileSim() was run without a seed. I also simplifed the code a bit.

Another note: You may want to check out the function forrel::missingPersonIP() whose purpose is basically to compute the "Related" column of your simLRgen, as well as various downstream statistics.

MarsicoFL commented 1 year ago

Great, it works here too! Thank you Magnus!

Bests