weizhouUMICH / SAIGE

Memory error during step2_SPAtests.R #369

Closed Xophmeister closed 2 years ago

Xophmeister commented 2 years ago

We are running the SAIGE 0.44.5 Docker container in a Cromwell pipeline. With VCF input, step2_SPAtests.R crashes with a segmentation fault; with BGEN input, it did not fail. Our researchers did not see this problem with SAIGE 0.41, so we have installed that container instead and it seems to be working (it is still running at the time of writing).

If it's useful, here is the stderr of one of the failed tasks:

Garbage collection 14 = 8+2+4 (level 2) ... 
81.9 Mbytes of cons cells used (61%)
47.5 Mbytes of vectors used (61%)

 *** caught segfault ***
address 0x30, cause 'memory not mapped'

Traceback:
 1: Score_Test_Sparse(obj.noK, y, X, G0, mu.a, mu2.a, varRatio, IsOutputlogPforSingle = IsOutputlogPforSingle)
 2: doTryCatch(return(expr), name, parentenv, handler)
 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 4: tryCatchList(expr, classes, parentenv, handlers)
 5: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call)[1L]        prefix <- paste("Error in", dcall, ": ")        LONG <- 75L        sm <- strsplit(conditionMessage(e), "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if (is.na(w))             w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],                 type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditionMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && isTRUE(getOption("show.error.messages"))) {        cat(msg, file = outFile)        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
 6: try(Score_Test_Sparse(obj.noK, y, X, G0, mu.a, mu2.a, varRatio,     IsOutputlogPforSingle = IsOutputlogPforSingle), silent = TRUE)
 7: scoreTest_SAIGE_binaryTrait_cond_sparseSigma(G0, AC, AF, MAF,     IsSparse, obj.model$obj.noK, obj.model$mu, obj.model$mu2,     y, X, varRatio, Cutoff, rowHeader, sparseSigma = sparseSigma,     isCondition = isCondition, OUT_cond = OUT_cond, G1tilde_P_G2tilde = G1tilde_P_G2tilde,     G2tilde_P_G2tilde_inv = G2tilde_P_G2tilde_inv, IsOutputlogPforSingle = IsOutputlogPforSingle)
 8: SPAGMMATtest(vcfFile = opt$vcfFile, vcfFileIndex = opt$vcfFileIndex,     vcfField = opt$vcfField, bgenFile = opt$bgenFile, bgenFileIndex = opt$bgenFileIndex,     savFile = opt$savFile, savFileIndex = opt$savFileIndex, idstoExcludeFile = opt$idstoExcludeFile,     idstoIncludeFile = opt$idstoIncludeFile, rangestoExcludeFile = opt$rangestoExcludeFile,     rangestoIncludeFile = opt$rangestoIncludeFile, chrom = opt$chrom,     start = opt$start, end = opt$end, IsDropMissingDosages = opt$IsDropMissingDosages,     sampleFile = opt$sampleFile, GMMATmodelFile = opt$GMMATmodelFile,     varianceRatioFile = opt$varianceRatioFile, SAIGEOutputFile = opt$SAIGEOutputFile,     minMAF = opt$minMAF, minMAC = opt$minMAC, numLinesOutput = opt$numLinesOutput,     IsOutputAFinCaseCtrl = opt$IsOutputAFinCaseCtrl, IsOutputNinCaseCtrl = opt$IsOutputNinCaseCtrl,     condition = opt$condition, maxMAFforGroupTest = opt$maxMAFforGroupTest,     groupFile = opt$groupFile, sparseSigmaFile = opt$sparseSigmaFile,     minInfo = opt$minInfo, cateVarRatioMinMACVecExclude = cateVarRatioMinMACVecExclude,     cateVarRatioMaxMACVecInclude = cateVarRatioMaxMACVecInclude,     IsSingleVarinGroupTest = opt$IsSingleVarinGroupTest, dosageZerodCutoff = opt$dosageZerodCutoff,     IsOutputPvalueNAinGroupTestforBinary = opt$IsOutputPvalueNAinGroupTestforBinary,     IsAccountforCasecontrolImbalanceinGroupTest = opt$IsAccountforCasecontrolImbalanceinGroupTest,     method = opt$method, kernel = opt$kernel, weights.beta.rare = weights.beta.rare,     weights.beta.common = weights.beta.common, weightMAFcutoff = opt$weightMAFcutoff,     r.corr = opt$r.corr, weightsIncludeinGroupFile = opt$weightsIncludeinGroupFile,     weights_for_G2_cond = weights_for_G2_cond, IsOutputBETASEinBurdenTest = opt$IsOutputBETASEinBurdenTest,     SPAcutoff = opt$SPAcutoff, IsOutputHetHomCountsinCaseCtrl = opt$IsOutputHetHomCountsinCaseCtrl,     LOCO = opt$LOCO, sampleFile_male = opt$sampleFile_male, is_rewrite_XnonPAR_forMales = opt$is_rewrite_XnonPAR_forMales,     X_PARregion = opt$X_PARregion, method_to_CollapseUltraRare = opt$method_to_CollapseUltraRare,     MACCutoff_to_CollapseUltraRare = opt$MACCutoff_to_CollapseUltraRare,     DosageCutoff_for_UltraRarePresence = opt$DosageCutoff_for_UltraRarePresence,     IsOutputMAFinCaseCtrlinGroupTest = opt$IsOutputMAFinCaseCtrlinGroupTest)
An irrecoverable exception occurred. R is aborting now ...
/cromwell_root/script: line 36:    17 Segmentation fault      (core dumped) step2_SPAtests.R --vcfFile=/cromwell_root/qmul-sandbox-production-library-red/genesandhealth/GSAv3EAMD/Jul2021_44k_TOPMED-r2_Imputation_b38/topmed-r2_merged_version02/chr22.dose.merged_INFO0.3_MAF0.00001.vcf.gz --vcfFileIndex=/cromwell_root/qmul-sandbox-production-library-red/genesandhealth/GSAv3EAMD/Jul2021_44k_TOPMED-r2_Imputation_b38/topmed-r2_merged_version02/chr22.dose.merged_INFO0.3_MAF0.00001.vcf.gz.csi --vcfField=DS --chrom=chr22 --minMAC=5 --GMMATmodelFile=/cromwell_root/fg-qmul-production-sandbox-1_red/cromwell/workflows/SAIGE/d66a766f-e7ab-4272-a60a-55e68918160c/call-FitNullGLMM/shard-2/step1-Depression2.rda --varianceRatioFile=/cromwell_root/fg-qmul-production-sandbox-1_red/cromwell/workflows/SAIGE/d66a766f-e7ab-4272-a60a-55e68918160c/call-FitNullGLMM/shard-2/step1-Depression2.varianceRatio.txt --numLinesOutput=1000 --IsOutputNinCaseCtrl=TRUE --IsOutputHetHomCountsinCaseCtrl=TRUE --IsOutputAFinCaseCtrl=TRUE --LOCO=FALSE --SAIGEOutputFile=step2-Depression2-chr22.gwas
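
When the same failure shows up across many Cromwell shards, it can help to summarise each stderr file rather than read every traceback by hand. The following is a minimal sketch, not part of SAIGE or Cromwell: the function name `summarize_r_segfault` and the returned keys are hypothetical, and it assumes only the stderr layout shown above (a `*** caught segfault ***` banner, an `address ..., cause '...'` line, and a numbered `Traceback:` whose frame 1 is the innermost call).

```python
import re


def summarize_r_segfault(stderr_text):
    """Return the address, cause and innermost call of an R segfault, or None.

    Hypothetical helper: it only pattern-matches the text layout R prints
    when it catches a segfault; it does not inspect core dumps.
    """
    if "*** caught segfault ***" not in stderr_text:
        return None

    # e.g. "address 0x30, cause 'memory not mapped'"
    detail = re.search(r"address ([^,]+), cause '([^']*)'", stderr_text)

    # Frame 1 of the R traceback is the innermost call, e.g. " 1: Score_Test_Sparse(...)"
    frame = re.search(r"^\s*1:\s*([A-Za-z_.][\w.]*)\(", stderr_text, re.MULTILINE)

    return {
        "address": detail.group(1) if detail else None,
        "cause": detail.group(2) if detail else None,
        "innermost_call": frame.group(1) if frame else None,
    }


# Abbreviated copy of the stderr above, used only to exercise the helper.
sample = """ *** caught segfault ***
address 0x30, cause 'memory not mapped'

Traceback:
 1: Score_Test_Sparse(obj.noK, y, X, G0, mu.a, mu2.a, varRatio)
 2: doTryCatch(return(expr), name, parentenv, handler)
"""

print(summarize_r_segfault(sample))
```

Grouping shards by `innermost_call` would show whether every failing task dies in `Score_Test_Sparse` or whether several code paths are affected.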

Xophmeister commented 2 years ago

(I note that this is very similar to issue #346)

weizhouUMICH commented 2 years ago

Sorry for my late reply! We have just released a new version, 1.0.0. It has substantial computational efficiency improvements in both Step 1 and Step 2, for single-variant and set-based tests. We have created a new GitHub page for the program at https://github.com/saigegit/SAIGE, with documentation at https://saigegit.github.io/SAIGE-doc/. The program will be maintained there by multiple SAIGE developers. The Docker image has been updated. Please feel free to try version 1.0.0 and report any issues.

Thanks! Wei