zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)
http://www.bioconductor.org/packages/SNPRelate

caught segfault 'memory not mapped' with snpgdsPCA #49

Open evigorito opened 5 years ago

evigorito commented 5 years ago

I am trying to run snpgdsPCA for 1000G phase 3 project but I get a caught segfault 'memory not mapped' error. I am using snakemake as follows:

    rule RP_PCA:
        """ Compute PCA for reference panel, the matrix of loadings to apply to PEAC samples and applies the matrix of loadings to samples"""
        input:
            RP=config['output_dir'] + "/DNA/RP_PCA.gds"
        output:
            PC=config['output_dir'] + "/DNA/RP_pcs.rds",
            Loads=config['output_dir'] + "/DNA/RP_loads.rds"
        params:
            ld=0.2,
            maf=0.05,
            method="corr"
        threads: 12
        script:
            "Rscripts/PCA.R"

with PCA.R being:

    library(gdsfmt)
    library(SNPRelate)

    #' Computes PCA from reference panel after pruning variants

    # LD pruning
    RPgenofile <- snpgdsOpen(snakemake@input[["RP"]])
    snpset <- snpgdsLDpruning(RPgenofile,
                              ld.threshold = as.numeric(snakemake@params[["ld"]]),
                              maf = as.numeric(snakemake@params[["maf"]]),
                              method = snakemake@params[["method"]],
                              verbose = TRUE)

    # PCs
    pc <- snpgdsPCA(RPgenofile, snp.id = unlist(snpset), num.thread = snakemake@threads)
    saveRDS(pc, file = snakemake@output[["PC"]])

    # Matrix of loadings (rotation matrix)
    snpL <- snpgdsPCASNPLoading(pc, RPgenofile, num.thread = snakemake@threads, verbose = TRUE)
    saveRDS(snpL, file = snakemake@output[["Loads"]])

    snpgdsClose(RPgenofile)

The output I get in the log file:

    SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
    SNP pruning based on LD:
    Excluding 0 SNP on non-autosomes
    Excluding 24,520,510 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: NaN)
    Working space: 2,504 samples, 5,416,399 SNPs
    using 1 (CPU) core
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 0.2
    method: correlation
    Chromosome 10: 0.37%, 5,642/1,511,473
    Chromosome 11: 0.34%, 5,159/1,512,594
    Chromosome 12: 0.37%, 5,393/1,449,574
    Chromosome 13: 0.38%, 4,045/1,078,256
    Chromosome 14: 0.37%, 3,617/984,477
    Chromosome 15: 0.39%, 3,418/874,176
    Chromosome 16: 0.42%, 3,831/908,914
    Chromosome 17: 0.43%, 3,398/785,496
    Chromosome 18: 0.41%, 3,401/821,430
    Chromosome 19: 0.45%, 2,776/614,134
    Chromosome 1: 0.36%, 8,575/2,380,227
    Chromosome 20: 0.45%, 2,923/649,909
    Chromosome 21: 0.43%, 1,666/383,559
    Chromosome 22: 0.48%, 1,738/365,490
    Chromosome 2: 0.32%, 8,501/2,649,142
    Chromosome 3: 0.34%, 7,416/2,190,874
    Chromosome 4: 0.33%, 7,049/2,156,548
    Chromosome 5: 0.34%, 6,586/1,964,228
    Chromosome 6: 0.34%, 6,515/1,929,815
    Chromosome 7: 0.35%, 6,064/1,722,841
    Chromosome 8: 0.32%, 5,541/1,718,600
    Chromosome 9: 0.39%, 4,952/1,285,152
    108,206 markers are selected in total.
    Principal Component Analysis (PCA) on genotypes:
    Excluding 29,828,703 SNPs (non-autosomes or non-selection)
    Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    Working space: 2,504 samples, 108,206 SNPs
    using 12 (CPU) cores
    PCA: the sum of all selected genotypes (0,1,2) = 409184978
    [..................................................] 0%, ETC: ---
    [==================================================] 100%, completed in 21s

    caught segfault
    address 0x7efe3ce179d0, cause 'memory not mapped'

    Traceback:
     1: .Call(gnrPCA, eigen.cnt, algorithm, ws$num.thread, param, verbose)
     2: snpgdsPCA(RPgenofile, snp.id = unlist(snpset), num.thread = snakemake@threads)
    An irrecoverable exception occurred. R is aborting now ...
    /usr/bin/bash: line 1: 66959 Segmentation fault Rscript /home/ev250/emedlab/snake-pipe/Rscripts/.snakemake.5abh71y_.PCA.R

Any help would be much appreciated!!

zhengxwen commented 5 years ago

It might be caused by insufficient memory.
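One rough way to sanity-check the memory hypothesis is a back-of-envelope estimate from the dimensions in the PCA log (2,504 samples, 108,206 SNPs). The sketch below is written in Python, since the Snakemake workflow is Python-based. It assumes dense double-precision storage, which is only an illustrative assumption: the thread does not describe how SNPRelate actually lays out its working memory, and per-thread buffers are not modelled. The function names are hypothetical helpers, not part of any library.

```python
# Back-of-envelope memory estimate. This is NOT SNPRelate's real allocation
# logic; it assumes dense matrices of 8-byte doubles purely for illustration.

BYTES_PER_DOUBLE = 8  # size of an IEEE 754 double-precision value


def dense_matrix_bytes(n_rows: int, n_cols: int,
                       bytes_per_value: int = BYTES_PER_DOUBLE) -> int:
    """Bytes needed to hold a dense n_rows x n_cols matrix (hypothetical helper)."""
    return n_rows * n_cols * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Dimensions taken from the "Working space" line of the PCA step in the log.
N_SAMPLES = 2_504
N_SNPS = 108_206

genotypes = dense_matrix_bytes(N_SAMPLES, N_SNPS)      # samples x SNPs
covariance = dense_matrix_bytes(N_SAMPLES, N_SAMPLES)  # samples x samples

if __name__ == "__main__":
    print(f"genotype matrix as doubles: {to_gib(genotypes):.2f} GiB")
    print(f"sample covariance matrix:   {to_gib(covariance):.3f} GiB")
```

Under these assumptions the genotype matrix alone is roughly 2 GiB, so comparing that figure (plus whatever else the job holds) against the memory actually available to the Snakemake job is a reasonable first check.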