ExomeDepth R package for the detection of copy number variants in exomes and gene panels using high throughput DNA sequencing data.
R crashes with very specific input #35

kendrickkoo commented 3 years ago

When looping through a large number of samples, I found that ExomeDepth would induce a crash on very specific files. This manifests as a termination of R with no error message. I have dug down into the function code and have attached data and code to reproduce the error.

The error only happens with the specific order of inputs (i.e. column 2 in "fail" as the test and column 1 as the reference). When flipping the two, the code executes without any issue. I have attempted to trace the error deeper but when running the ExomeDepth constructor piecemeal, a crash cannot be induced.

It is perplexing that specific data would cause such a crash. There is nothing unique about the sample associated with the test column even on manual inspection of the data.


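library(ExomeDepth)

# failure_file.txt is the attached data: column 1 is the reference, column 2 the test
fail = read.delim("failure_file.txt", sep = "\t", stringsAsFactors = FALSE)

formula = "cbind(test, reference) ~ 1"
phi.bins = 1
data = NULL

# Crashes R with test = column 2 and reference = column 1; swapping the two runs fine
my.mod <- new("ExomeDepth", test = fail[,2],
            reference = fail[,1], formula = formula, data = data,
            phi.bins = phi.bins, verbose = TRUE)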


kendrickkoo commented 3 years ago

More error details when running the same code on a Linux system (error is not platform dependent):

Now fitting the beta-binomial model on a data frame with 54836 rows : this step can take a few minutes.
Now computing the likelihood for the different copy number states
ERROR beta.c 
 *** caught segfault ***
address 0x20, cause 'memory not mapped'

 1: .Call("get_loglike_matrix", phi = .Object@phi, expected = .Object@expected,     total = as.integer(.Object@reference + .Object@test), observed = as.integer(.Object@test),     mixture = prop.tumor)
 2: .local(.Object, ...)
 3: initialize(value, ...)
 4: initialize(value, ...)
 5: new("ExomeDepth", test = fail[, 2], reference = fail[, 1], formula = formula,     data = data, phi.bins = phi.bins, verbose = TRUE)

halessi commented 3 years ago

@kendrickkoo Hello, I just got this same error after attempting to run ~140 samples at once. Were you able to fix it?

kendrickkoo commented 3 years ago

@halessi I was unable to fix this error, but I believe I have traced the problem. R was crashing on the same few samples, and it turns out that they had very low read depths. Removing them from the experiment solved the problem. I had also run a targeted sequencing experiment, and I noticed that when I computed read counts over the whole exome (resulting in a lot of zeros), more samples started to fail. Hopefully these suggestions help with your situation. This issue should stay flagged so that someone can implement a graceful failure instead of a hard crash.
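
As a rough illustration of the workaround described above, low-depth samples can be screened out before they reach the ExomeDepth constructor. This is only a sketch: `screen_counts`, its thresholds, and the toy matrix are hypothetical and not part of the package, and it assumes the read counts are held in a bins-by-samples matrix.

```r
# Hypothetical helper: summarise per-sample depth in a bins-by-samples count matrix.
# The thresholds are illustrative assumptions, not values recommended by ExomeDepth.
screen_counts <- function(counts, min_median = 10, max_zero_frac = 0.5) {
  counts <- as.matrix(counts)
  med <- unname(apply(counts, 2, median))   # median read count per sample
  zero_frac <- unname(colMeans(counts == 0)) # fraction of bins with no reads
  data.frame(
    sample = colnames(counts),
    median_count = med,
    zero_fraction = zero_frac,
    keep = med >= min_median & zero_frac <= max_zero_frac,
    stringsAsFactors = FALSE
  )
}

# Toy data (made up): two well-covered samples and one with mostly empty bins.
toy <- cbind(
  good1 = c(120, 95, 130, 110, 101),
  good2 = c(100, 90, 140, 105, 99),
  low   = c(0, 0, 3, 0, 1)
)
qc <- screen_counts(toy)
print(qc)

# Keep only the samples that pass before choosing the test and reference columns.
kept <- toy[, qc$keep, drop = FALSE]
```

Samples flagged with `keep = FALSE` can be inspected or dropped before the fit. Whether these two summaries are enough to predict the crash is not established in this thread.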

irocote commented 2 years ago

Hi, has anyone found a solution to this, or at least an explanation? I've just encountered the exact same error on a Linux platform:

Now fitting the beta-binomial model on a data frame with 185130 rows : this step can take a few minutes.
Now computing the likelihood for the different copy number states
ERROR beta.c 
 *** caught segfault ***
address 0x20, cause 'memory not mapped'

 1: .local(.Object, ...)
 2: initialize(value, ...)
 3: initialize(value, ...)
 4: new("ExomeDepth", test = my.test, reference = my.reference.selected, formula = "cbind(test,reference)~ExomeCount.dafr$GC")
An irrecoverable exception occurred. R is aborting now ...

The sample giving this error has a mean target coverage of 657X, so I don't think the problem here is due to low read depths, unless it is caused by low read depths in specific regions...


irocote commented 2 years ago

Just in case someone is having the same issue: we found that extracting the read counts with the BED file of our capture design, instead of the hg19 exon positions provided by the package, solved this problem.
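
A minimal sketch of that approach, for anyone who wants to try it. `read_design_bed` is a hypothetical helper, and it assumes a plain tab-separated BED file with no track header lines. The column names mirror the package's `exons.hg19` table (chromosome, start, end, name). BED start coordinates are 0-based, so check whether your pipeline needs an adjustment.

```r
# Hypothetical helper: read a capture-design BED file into a four-column
# data frame (chromosome, start, end, name).
read_design_bed <- function(path) {
  bed <- read.delim(path, header = FALSE, comment.char = "#",
                    stringsAsFactors = FALSE)
  stopifnot(ncol(bed) >= 3)
  out <- data.frame(
    chromosome = as.character(bed[[1]]),
    start = as.integer(bed[[2]]),
    end = as.integer(bed[[3]]),
    stringsAsFactors = FALSE
  )
  if (ncol(bed) >= 4) {
    out$name <- as.character(bed[[4]])
  } else {
    # No name column in the BED file: build one from the coordinates.
    out$name <- paste0(out$chromosome, ":", out$start, "-", out$end)
  }
  out
}

# Demo with a tiny temporary BED file (coordinates are made up).
tmp <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200\ttargetA", "chr1\t500\t650\ttargetB"), tmp)
design <- read_design_bed(tmp)
print(design)

# With real data, the counts would then be extracted over the design targets.
# Not run here: needs ExomeDepth and BAM files; `my.bams` is a placeholder.
# library(ExomeDepth)
# my.counts <- getBamCounts(bed.frame = design, bam.files = my.bams)
```

Restricting the counts to the captured targets avoids the long runs of zero-count bins mentioned earlier in this thread, which may be why it helps.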