rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0
73 stars 19 forks source link

Segmentation fault MRE #82

Closed saulpierotti closed 1 year ago

saulpierotti commented 1 year ago

I have been experiencing segmentation fault errors in STITCH on an intermittent basis. This always happens when loading the bamfiles, not during EM. I managed to narrow the issue down to a single bam file and a single chromosome (though also other bam files cause the same issue). The issue happens only when nCores > 1. I was able to reproduce the issue in several different machines. I am using the STITCH installation from conda.

Here the code that I run:

library("STITCH")

STITCH(
  tempdir = ".",
  chr = "12",
  posfile = "poslist.tsv",
  bamlist = "samples.csv",
  nGen = 1,
  K = 1,
  outputdir = ".",
  nCores = 2,
  generateInputOnly = TRUE
)

This is my output of sessionInfo():

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS/LAPACK: /home/saul/micromamba/envs/stitch/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.2.2

This is the segfault error:

Loading required package: parallel
Loading required package: rrbgen
[2023-03-18 16:12:10] Running STITCH(chr = 12, nGen = 1, posfile = ./poslist.tsv, K = 1, S = 1, outputdir = ., nStarts = , tempdir = ., bamlist = samples.csv, cramlist = , sampleNames_file = , reference = , genfile = , method = diploid, output_format = bgvcf, B_bit_prob = 16, outputInputInVCFFormat = FALSE, downsampleToCov = 50, downsampleFraction = 1, readAware = TRUE, chrStart = NA, chrEnd = NA, regionStart = NA, regionEnd = NA, buffer = NA, maxDifferenceBetweenReads = 1000, maxEmissionMatrixDifference = 1e+10, alphaMatThreshold = 1e-04, emissionThreshold = 1e-04, iSizeUpperLimit = 600, bqFilter = 17, niterations = 40, shuffleHaplotypeIterations = c(4, 8, 12, 16), splitReadIterations = 25, nCores = 2, expRate = 0.5, maxRate = 100, minRate = 0.1, Jmax = 1000, regenerateInput = TRUE, originalRegionName = NA, keepInterimFiles = FALSE, keepTempDir = FALSE, outputHaplotypeProbabilities = FALSE, switchModelIteration = NA, generateInputOnly = TRUE, restartIterations = NA, refillIterations = c(6, 10, 14, 18), downsampleSamples = 1, downsampleSamplesKeepList = NA, subsetSNPsfile = NA, useSoftClippedBases = FALSE, outputBlockSize = 1000, outputSNPBlockSize = 10000, inputBundleBlockSize = NA, genetic_map_file = , reference_haplotype_file = , reference_legend_file = , reference_sample_file = , reference_populations = NA, reference_phred = 20, reference_iterations = 40, reference_shuffleHaplotypeIterations = c(4, 8, 12, 16), output_filename = NULL, initial_min_hapProb = 0.2, initial_max_hapProb = 0.8, regenerateInputWithDefaultValues = FALSE, plotHapSumDuringIterations = FALSE, plot_shuffle_haplotype_attempts = FALSE, plotAfterImputation = TRUE, save_sampleReadsInfo = FALSE, gridWindowSize = NA, shuffle_bin_nSNPs = NULL, shuffle_bin_radius = 5000, keepSampleReadsInRAM = FALSE, useTempdirWhileWriting = FALSE, output_haplotype_dosages = FALSE, use_bx_tag = TRUE, bxTagUpperLimit = 50000)
[2023-03-18 16:12:10] Program start
[2023-03-18 16:12:10] Get and validate pos and gen
[2023-03-18 16:12:11] Done get and validate pos and gen
[2023-03-18 16:12:11] Get BAM sample names
[2023-03-18 16:12:11] Done getting BAM sample names
[2023-03-18 16:12:11] Generate inputs

 *** caught segfault ***
address 0x55a06e7e6004, cause 'memory not mapped'

Traceback:
 1: cpp_read_reassign(ord = ord, qnameInteger_ord = qnameInteger_ord,     bxtagInteger_ord = bxtagInteger_ord, bxtag_bad_ord = bxtag_bad_ord,     qname = qname, bxtag = bxtag, strand = strand, sampleReadsRaw = sampleReadsRaw,     readStart_ord = readStart_ord, readEnd_ord = readEnd_ord,     readStart = readStart, readEnd = readEnd, iSizeUpperLimit = iSizeUpperLimit,     bxTagUpperLimit = bxTagUpperLimit, use_bx_tag = use_bx_tag,     save_sampleReadsInfo = save_sampleReadsInfo, maxnSNPInRead = maxnSNPInRead)
 2: merge_reads_from_sampleReadsRaw(sampleReadsRaw = sampleReadsRaw,     qname = qname, bxtag = bxtag, strand = strand, readStart = readStart,     readEnd = readEnd, iSizeUpperLimit = iSizeUpperLimit, use_bx_tag = use_bx_tag,     bxTagUpperLimit = bxTagUpperLimit, save_sampleReadsInfo = save_sampleReadsInfo,     qname_all = qname_all, readStart_all = readStart_all, readEnd_all = readEnd_all)
 3: loadBamAndConvert(iBam = iBam, L = L, pos = pos, nSNPs = nSNPs,     bam_files = bam_files, cram_files = cram_files, reference = reference,     iSizeUpperLimit = iSizeUpperLimit, bqFilter = bqFilter, chr = chr,     N = N, downsampleToCov = downsampleToCov, sampleNames = sampleNames,     inputdir = inputdir, useSoftClippedBases = useSoftClippedBases,     regionName = regionName, tempdir = tempdir, chrStart = chrStart,     chrEnd = chrEnd, chrLength = chrLength, save_sampleReadsInfo = save_sampleReadsInfo,     use_bx_tag = use_bx_tag, bxTagUpperLimit = bxTagUpperLimit)
 4: FUN(X[[i]], ...)
 5: lapply(X = X, FUN = FUN, ...)
 6: mclapply(1:length(sampleRanges), mc.cores = nCores, FUN = loadBamAndConvert_across_a_range,     sampleRanges = sampleRanges, bundling_info = bundling_info,     L = L, pos = pos, nSNPs = nSNPs, bam_files = bam_files, cram_files = cram_files,     reference = reference, iSizeUpperLimit = iSizeUpperLimit,     bqFilter = bqFilter, chr = chr, N = N, downsampleToCov = downsampleToCov,     sampleNames = sampleNames, inputdir = inputdir, useSoftClippedBases = useSoftClippedBases,     regionName = regionName, tempdir = tempdir, chrStart = chrStart,     chrEnd = chrEnd, chrLength = chrLength, save_sampleReadsInfo = save_sampleReadsInfo,     use_bx_tag = use_bx_tag, bxTagUpperLimit = bxTagUpperLimit)
 7: generate_input(bundling_info = bundling_info, L = L, pos = pos,     nSNPs = nSNPs, bam_files = bam_files, cram_files = cram_files,     reference = reference, iSizeUpperLimit = iSizeUpperLimit,     bqFilter = bqFilter, chr = chr, N = N, downsampleToCov = downsampleToCov,     sampleNames = sampleNames, inputdir = inputdir, useSoftClippedBases = useSoftClippedBases,     regionName = regionName, tempdir = tempdir, chrStart = chrStart,     chrEnd = chrEnd, nCores = nCores, save_sampleReadsInfo = save_sampleReadsInfo,     use_bx_tag = use_bx_tag, bxTagUpperLimit = bxTagUpperLimit)
 8: generate_or_refactor_input(regenerateInput = regenerateInput,     bundling_info = bundling_info, L = L, pos = pos, nSNPs = nSNPs,     bam_files = bam_files, cram_files = cram_files, reference = reference,     iSizeUpperLimit = iSizeUpperLimit, bqFilter = bqFilter, chr = chr,     outputdir = outputdir, N = N, downsampleToCov = downsampleToCov,     sampleNames = sampleNames, inputdir = inputdir, useSoftClippedBases = useSoftClippedBases,     regionName = regionName, tempdir = tempdir, chrStart = chrStart,     chrEnd = chrEnd, generateInputOnly = generateInputOnly, nCores = nCores,     save_sampleReadsInfo = save_sampleReadsInfo, use_bx_tag = use_bx_tag,     bxTagUpperLimit = bxTagUpperLimit)
 9: STITCH(tempdir = ".", chr = "12", posfile = "./poslist.tsv",     bamlist = "samples.csv", nGen = 1, K = 1, outputdir = ".",     nCores = 2, generateInputOnly = TRUE)
An irrecoverable exception occurred. R is aborting now ...
[1]    267350 segmentation fault (core dumped)  ./run.R

I am willing to share the bamfile and posfile if needed.

saulpierotti commented 1 year ago

I wanted also to add that when failing like this in generating the inputs, STITCH does not return an error code but just a warning. I think that this is not desirable because when used in a pipeline (nextflow in my case) it is not possible to catch the error. Then, when running EM from the generated inputs, some of the ./inputs files are missing and STITCH exits with code 1.

saulpierotti commented 1 year ago

I think I tracked down the error. This line tries to access the vector bxtagInteger_ord with an index corresponding to the read number.

https://github.com/rwdavies/STITCH/blob/5a5ba98442a105548587ed8cafbf00aece212c94/STITCH/src/functions.cpp#L469)

However bxtagInteger_ord is of size 0 when use_bx_tag is FALSE, leading to undefined behaviour. bxtagInteger_ord is passed as an argument to to the function cpp_read_reassign, and it is defined in the R code here:

https://github.com/rwdavies/STITCH/blob/5a5ba98442a105548587ed8cafbf00aece212c94/STITCH/R/functions.R#L1933-L1941

In my case I was not manually setting use_bx_tag to FALSE, but it is set to FALSE automatically here in case no bx tags are present in the bam file:

https://github.com/rwdavies/STITCH/blob/5a5ba98442a105548587ed8cafbf00aece212c94/STITCH/R/functions.R#L2180-L2185

I tested putting the incriminating line inside a if (use_bx_tag) statement and the error disappears in my test case.

rwdavies commented 1 year ago

Thanks for the great post! That made for a very simple fix, which I've now pushed.

The fact that R is giving a non-zero exit code with the above behaviour is very weird. When I throw a stop() in R it does give a normal error code. I'm not sure this is something I can easily sort out at this time. This feels more like an Rcpp/mclapply/R problem to me. A simple hack in your script is to check for the desired output and throw an error if it doesn't exist would sort you out here, though I acknowledge it's inelegant.

saulpierotti commented 1 year ago

Thanks for the very quick fix!