Open jjfarrell opened 3 years ago
On this run, we got a segfault memory error.
Working with 52877 samples Identifying relatives for each sample using kinship threshold 0.0220970869120796 Identifying pairs of divergent samples using divergence threshold -0.0220970869120796 Partitioning samples into unrelated and related sets... ...1000 samples added to related.set... ...2000 samples added to related.set... ...3000 samples added to related.set... [1] 3721 [1] 49156 Principal Component Analysis (PCA) on genotypes: Calculating allele counts/frequencies ... [==================================================] 100%, completed, 2s (process 1)
# of samples: 49,156
# of SNVs: 87,468
using 28 threads
# of principal components: 3
CPU capabilities: Double-Precision SSE2 Thu Apr 29 13:05:20 2021 (internal increment: 64) [==================================================] 100%, completed, 42.5m Thu Apr 29 13:47:57 2021 Begin (eigenvalues and eigenvectors)
caught segfault address 0x18d8cb0, cause 'memory not mapped'
Traceback: 1: snpgdsPCA(gds_obj, sample.id =sampleset$unrels, num.thread = coreNum, eigen.cnt = pcNum) An irrecoverable exception occurred. R is aborting now ... /var/spool/sge/scc-gh1/job_scripts/6266830: line 7: 36415 Segmentation fault Rscript --vanilla $1
Did you check the memory usage? Ran out of memory?
We are running with 196GB of memory with 28 cores. What is a recommended amount of memory per core be?
We were able to run the PCs on 45k unrelated and then project it on the remaining 8k samples successfully.
I encountered similar issues as jjfarrell mentioned but haven’t found a solution yet. I was running 76K samples for 33,765 SNPs, using PLINK bed/bim/fam files. I was able to get the PCs for 45K samples before by using R 3.6.3/GENESIS_2.16.1. When the sample size goes up to 76K, I got “segfault” error. After reading this post, I upgraded both R and GENESIS to the latest versions. I did two runs:
R version 4.1.0 (2021-05-18) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)
Matrix products: default BLAS/LAPACK: /spack/apps/linux-centos7-x86_64/gcc-8.3.0/openblas-0.3.8-2no6mfziiclwxb7lstxoos335gnhjpes/lib/libopenblasp-r0.3.8.so
locale: [1] C
attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base
other attached packages: [1] GWASTools_1.38.0 Biobase_2.52.0 BiocGenerics_0.38.0 [4] SNPRelate_1.26.0 gdsfmt_1.28.0 GENESIS_2.22.1 [7] dplyr_1.0.6 data.table_1.14.0
################################## Run 1: using R 4.1.0/GENESIS_2.22.1, 40 cores, 1T mem, the PCair gave NAs for principal components except for the 1st sample ################################## ====== start run_PCair_PCrelate_1st_iter_V1_largemem.R ======
Attaching package: 'dplyr' The following objects are masked from 'package:data.table': between, first, last The following objects are masked from 'package:stats': filter, lag The following objects are masked from 'package:base': intersect, setdiff, setequal, union
Loading required package: gdsfmt SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2) Loading required package: Biobase Loading required package: BiocGenerics Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:dplyr':
combine, intersect, setdiff, union
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, basename, cbind, colnames, dirname, do.call,
duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
[1] "workDir = ./" [1] "PLINKFilePrefix = test_76K_num" [1] "kinMatrixFileName = test_76K_num.kc.matrix.txt" [1] "iteration = 1" [1] "KINGmat: 76000 x 76000" Start file conversion from PLINK BED to SNP GDS ... BED file: 'test_76K_num.bed' SNP-major mode (Sample X SNP), 611.8M FAM file: 'test_76K_num.fam' BIM file: 'test_76K_num.bim' Tue Jun 8 19:28:07 2021 (store sample id, snp id, position, and chromosome) start writing: 76000 samples, 33765 SNPs ...
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 36s
Tue Jun 8 19:28:43 2021 Done.
Optimize the access efficiency ...
Clean up the fragments of GDS file:
open the file 'test_76K_num.gds' (612.1M)
save to 'test_76K_num.gds.tmp'
rename 'test_76K_num.gds.tmp' (612.1M, reduced: 252B)
# of fragments: 18
[1] "start PCair 1" Using kinobj and divobj to partition samples into unrelated and related sets Working with 76000 samples Identifying relatives for each sample using kinship threshold 0.0220970869120796 Identifying pairs of divergent samples using divergence threshold -0.0220970869120796 Partitioning samples into unrelated and related sets... ...1000 samples added to related.set... ...2000 samples added to related.set... ...3000 samples added to related.set... ...4000 samples added to related.set... ...5000 samples added to related.set... ...6000 samples added to related.set... ...7000 samples added to related.set... ...8000 samples added to related.set... ...9000 samples added to related.set... ...10000 samples added to related.set... Unrelated Set: 65135 Samples Related Set: 10865 Samples Performing PCA on the Unrelated Set... Principal Component Analysis (PCA) on genotypes: Excluding 0 SNP on non-autosomes Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
# of SNPs: 33,765
using 40 threads
# of principal components: 32
PCA: the sum of all selected genotypes (0,1,2) = 1275566990 CPU capabilities: Double-Precision SSE2 Tue Jun 8 20:03:23 2021 (internal increment: 64) [..................................................] 0%, ETC: --- [=>................................................] 1%, ETC: 1.7h [=>................................................] 2%, ETC: 1.5h [==>...............................................] 3%, ETC: 1.4h [==>...............................................] 4%, ETC: 1.3h [===>..............................................] 5%, ETC: 1.2h [===>..............................................] 6%, ETC: 1.2h [====>.............................................] 7%, ETC: 1.1h [====>.............................................] 8%, ETC: 1.1h [=====>............................................] 9%, ETC: 1.1h [=====>............................................] 10%, ETC: 1.0h [======>...........................................] 11%, ETC: 1.0h [======>...........................................] 12%, ETC: 59.9m [=======>..........................................] 13%, ETC: 58.7m [=======>..........................................] 14%, ETC: 57.8m [========>.........................................] 15%, ETC: 56.7m [========>.........................................] 16%, ETC: 55.5m [=========>........................................] 17%, ETC: 54.5m [=========>........................................] 18%, ETC: 53.8m [==========>.......................................] 19%, ETC: 52.9m [==========>.......................................] 20%, ETC: 52.2m [===========>......................................] 21%, ETC: 49.7m [===========>......................................] 22%, ETC: 48.2m [============>.....................................] 23%, ETC: 47.1m [============>.....................................] 24%, ETC: 46.4m [=============>....................................] 25%, ETC: 46.1m [=============>....................................] 26%, ETC: 45.5m [==============>...................................] 27%, ETC: 44.7m [==============>...................................] 28%, ETC: 44.2m [===============>..................................] 29%, ETC: 43.3m [===============>..................................] 30%, ETC: 42.8m [================>.................................] 31%, ETC: 42.1m [================>.................................] 32%, ETC: 41.3m [=================>................................] 33%, ETC: 40.9m [=================>................................] 34%, ETC: 40.4m [==================>...............................] 35%, ETC: 39.6m [==================>...............................] 36%, ETC: 39.4m [===================>..............................] 37%, ETC: 38.7m [===================>..............................] 38%, ETC: 37.9m [====================>.............................] 39%, ETC: 37.3m [====================>.............................] 40%, ETC: 36.5m [=====================>............................] 41%, ETC: 36.0m [=====================>............................] 42%, ETC: 35.6m [======================>...........................] 43%, ETC: 35.1m [======================>...........................] 44%, ETC: 34.2m [=======================>..........................] 45%, ETC: 33.4m [=======================>..........................] 46%, ETC: 32.8m [========================>.........................] 47%, ETC: 32.4m [========================>.........................] 48%, ETC: 31.8m [=========================>........................] 49%, ETC: 31.3m [=========================>........................] 50%, ETC: 30.8m [==========================>.......................] 51%, ETC: 30.2m [==========================>.......................] 52%, ETC: 29.6m [===========================>......................] 53%, ETC: 28.7m [===========================>......................] 54%, ETC: 28.2m [============================>.....................] 55%, ETC: 27.6m [============================>.....................] 56%, ETC: 26.9m [=============================>....................] 57%, ETC: 26.4m [=============================>....................] 58%, ETC: 26.0m [==============================>...................] 59%, ETC: 25.2m [==============================>...................] 60%, ETC: 24.6m [===============================>..................] 61%, ETC: 24.0m [===============================>..................] 62%, ETC: 23.3m [================================>.................] 63%, ETC: 22.7m [================================>.................] 64%, ETC: 22.1m [=================================>................] 65%, ETC: 21.5m [=================================>................] 66%, ETC: 20.8m [==================================>...............] 67%, ETC: 20.2m [==================================>...............] 68%, ETC: 19.6m [===================================>..............] 69%, ETC: 18.9m [===================================>..............] 70%, ETC: 18.4m [====================================>.............] 71%, ETC: 17.8m [====================================>.............] 72%, ETC: 17.2m [=====================================>............] 73%, ETC: 16.5m [=====================================>............] 74%, ETC: 15.9m [======================================>...........] 75%, ETC: 15.3m [======================================>...........] 76%, ETC: 14.7m [=======================================>..........] 77%, ETC: 14.0m [=======================================>..........] 78%, ETC: 13.3m [========================================>.........] 79%, ETC: 12.7m [========================================>.........] 80%, ETC: 12.1m [=========================================>........] 81%, ETC: 11.5m [=========================================>........] 82%, ETC: 11.0m [==========================================>.......] 83%, ETC: 10.4m [==========================================>.......] 84%, ETC: 9.7m [===========================================>......] 85%, ETC: 9.2m [===========================================>......] 86%, ETC: 8.6m [============================================>.....] 87%, ETC: 8.0m [============================================>.....] 88%, ETC: 7.3m [=============================================>....] 89%, ETC: 6.7m [=============================================>....] 90%, ETC: 6.1m [==============================================>...] 91%, ETC: 5.4m [==============================================>...] 92%, ETC: 4.8m [===============================================>..] 93%, ETC: 4.3m [===============================================>..] 94%, ETC: 3.7m [================================================>.] 95%, ETC: 3.0m [================================================>.] 96%, ETC: 2.4m [=================================================>] 97%, ETC: 1.8m [=================================================>] 98%, ETC: 1.1m [==================================================] 99%, ETC: 32s [==================================================] 100%, completed, 1.0h Tue Jun 8 21:06:07 2021 Begin (eigenvalues and eigenvectors) Fri Jun 11 01:05:21 2021 Done. Predicting PC Values for the Related Set... SNP Loading:
# of SNPs: 33,765
using 40 threads
using the top 32 eigenvectors
SNP Loading: the sum of all selected genotypes (0,1,2) = 1275566990 Fri Jun 11 01:05:32 2021 (internal increment: 380)
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 10s
Fri Jun 11 01:05:42 2021 Done.
Sample Loading:
# of SNPs: 33,765
using 40 threads
using the top 32 eigenvectors
Sample Loading: the sum of all selected genotypes (0,1,2) = 212141863 Fri Jun 11 01:05:49 2021 (internal increment: 2288)
############################################################ run 2: using R 4.1.0/GENESIS_2.22.1, 12 cores, 1T mem, added "export OMP_NUM_THREADS=1", the PCair gave "segfault" ############################################################
====== start run_PCair_PCrelate_1st_iter_V1_largemem.R ======
Attaching package: 'dplyr' The following objects are masked from 'package:data.table': between, first, last The following objects are masked from 'package:stats': filter, lag The following objects are masked from 'package:base': intersect, setdiff, setequal, union
Loading required package: gdsfmt SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2) Loading required package: Biobase Loading required package: BiocGenerics Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:dplyr':
combine, intersect, setdiff, union
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, basename, cbind, colnames, dirname, do.call,
duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
[1] "workDir = ./" [1] "PLINKFilePrefix = test_76K_num" [1] "kinMatrixFileName = test_76K_num.kc.matrix.txt" [1] "iteration = 1" [1] "KINGmat: 76000 x 76000" Start file conversion from PLINK BED to SNP GDS ... BED file: 'test_76K_num.bed' SNP-major mode (Sample X SNP), 611.8M FAM file: 'test_76K_num.fam' BIM file: 'test_76K_num.bim' Thu Jun 17 16:50:24 2021 (store sample id, snp id, position, and chromosome) start writing: 76000 samples, 33765 SNPs ...
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 9s
Thu Jun 17 16:50:33 2021 Done.
Optimize the access efficiency ...
Clean up the fragments of GDS file:
open the file 'test_76K_num.gds' (612.1M)
save to 'test_76K_num.gds.tmp'
rename 'test_76K_num.gds.tmp' (612.1M, reduced: 252B)
# of fragments: 18
[1] "start PCair 1" Using kinobj and divobj to partition samples into unrelated and related sets Working with 76000 samples Identifying relatives for each sample using kinship threshold 0.0220970869120796 Identifying pairs of divergent samples using divergence threshold -0.0220970869120796 Partitioning samples into unrelated and related sets... ...1000 samples added to related.set... ...2000 samples added to related.set... ...3000 samples added to related.set... ...4000 samples added to related.set... ...5000 samples added to related.set... ...6000 samples added to related.set... ...7000 samples added to related.set... ...8000 samples added to related.set... ...9000 samples added to related.set... ...10000 samples added to related.set... Unrelated Set: 65135 Samples Related Set: 10865 Samples Performing PCA on the Unrelated Set... Principal Component Analysis (PCA) on genotypes: Excluding 0 SNP on non-autosomes Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
# of SNPs: 33,765
using 12 threads
# of principal components: 32
PCA: the sum of all selected genotypes (0,1,2) = 1275566990 CPU capabilities: Double-Precision SSE2 Thu Jun 17 17:24:23 2021 (internal increment: 64) [..................................................] 0%, ETC: --- [=>................................................] 1%, ETC: 3.5h [=>................................................] 2%, ETC: 3.4h [==>...............................................] 3%, ETC: 3.3h [==>...............................................] 4%, ETC: 3.3h [===>..............................................] 5%, ETC: 3.2h [===>..............................................] 6%, ETC: 3.2h [====>.............................................] 7%, ETC: 3.1h [====>.............................................] 8%, ETC: 3.1h [=====>............................................] 9%, ETC: 3.1h [=====>............................................] 10%, ETC: 3.0h [======>...........................................] 11%, ETC: 3.0h [======>...........................................] 12%, ETC: 2.9h [=======>..........................................] 13%, ETC: 2.9h [=======>..........................................] 14%, ETC: 2.9h [========>.........................................] 15%, ETC: 2.8h [========>.........................................] 16%, ETC: 2.8h [=========>........................................] 17%, ETC: 2.8h [=========>........................................] 18%, ETC: 2.7h [==========>.......................................] 19%, ETC: 2.7h [==========>.......................................] 20%, ETC: 2.6h [===========>......................................] 21%, ETC: 2.6h [===========>......................................] 22%, ETC: 2.6h [============>.....................................] 23%, ETC: 2.5h [============>.....................................] 24%, ETC: 2.5h [=============>....................................] 25%, ETC: 2.5h [=============>....................................] 26%, ETC: 2.4h [==============>...................................] 27%, ETC: 2.4h [==============>...................................] 28%, ETC: 2.4h [===============>..................................] 29%, ETC: 2.4h [===============>..................................] 30%, ETC: 2.3h [================>.................................] 31%, ETC: 2.3h [================>.................................] 32%, ETC: 2.3h [=================>................................] 33%, ETC: 2.2h [=================>................................] 34%, ETC: 2.2h [==================>...............................] 35%, ETC: 2.2h [==================>...............................] 36%, ETC: 2.2h [===================>..............................] 37%, ETC: 2.1h [===================>..............................] 38%, ETC: 2.1h [====================>.............................] 39%, ETC: 2.1h [====================>.............................] 40%, ETC: 2.0h [=====================>............................] 41%, ETC: 2.0h [=====================>............................] 42%, ETC: 2.0h [======================>...........................] 43%, ETC: 1.9h [======================>...........................] 44%, ETC: 1.9h [=======================>..........................] 45%, ETC: 1.9h [=======================>..........................] 46%, ETC: 1.8h [========================>.........................] 47%, ETC: 1.8h [========================>.........................] 48%, ETC: 1.8h [=========================>........................] 49%, ETC: 1.7h [=========================>........................] 50%, ETC: 1.7h [==========================>.......................] 51%, ETC: 1.7h [==========================>.......................] 52%, ETC: 1.6h [===========================>......................] 53%, ETC: 1.6h [===========================>......................] 54%, ETC: 1.6h [============================>.....................] 55%, ETC: 1.5h [============================>.....................] 56%, ETC: 1.5h [=============================>....................] 57%, ETC: 1.5h [=============================>....................] 58%, ETC: 1.4h [==============================>...................] 59%, ETC: 1.4h [==============================>...................] 60%, ETC: 1.4h [===============================>..................] 61%, ETC: 1.4h [===============================>..................] 62%, ETC: 1.3h [================================>.................] 63%, ETC: 1.3h [================================>.................] 64%, ETC: 1.3h [=================================>................] 65%, ETC: 1.2h [=================================>................] 66%, ETC: 1.2h [==================================>...............] 67%, ETC: 1.1h [==================================>...............] 68%, ETC: 1.1h [===================================>..............] 69%, ETC: 1.1h [===================================>..............] 70%, ETC: 1.1h [====================================>.............] 71%, ETC: 1.0h [====================================>.............] 72%, ETC: 59.5m [=====================================>............] 73%, ETC: 57.0m [=====================================>............] 74%, ETC: 54.9m [======================================>...........] 75%, ETC: 53.1m [======================================>...........] 76%, ETC: 51.4m [=======================================>..........] 77%, ETC: 49.1m [=======================================>..........] 78%, ETC: 47.0m [========================================>.........] 79%, ETC: 45.0m [========================================>.........] 80%, ETC: 42.7m [=========================================>........] 81%, ETC: 40.7m [=========================================>........] 82%, ETC: 38.7m [==========================================>.......] 83%, ETC: 36.8m [==========================================>.......] 84%, ETC: 34.4m [===========================================>......] 85%, ETC: 32.2m [===========================================>......] 86%, ETC: 30.3m [============================================>.....] 87%, ETC: 28.4m [============================================>.....] 88%, ETC: 25.9m [=============================================>....] 89%, ETC: 23.8m [=============================================>....] 90%, ETC: 21.7m [==============================================>...] 91%, ETC: 19.2m [==============================================>...] 92%, ETC: 17.1m [===============================================>..] 93%, ETC: 15.0m [===============================================>..] 94%, ETC: 13.0m [================================================>.] 95%, ETC: 10.5m [================================================>.] 96%, ETC: 8.4m [=================================================>] 97%, ETC: 6.4m [=================================================>] 98%, ETC: 3.9m [==================================================] 99%, ETC: 1.9m [==================================================] 100%, completed, 3.5h Thu Jun 17 20:53:09 2021 Begin (eigenvalues and eigenvectors)
caught segfault address 0x229eef0, cause 'memory not mapped'
Any insights on what has caused this or how to get around this? The memory utilized for both runs is around 220 G.
Thank you!
I guess you should use the fastPCA option in snpgdsPCA()
(i.e., algorithm="randomized"
).
Please ask Stephanie for this option, I remember they successfully used fastPCA in PCRelate on very large datasets.
The fastPCA option is already available in GENESIS, as any additional arguments to the pcair
function are passed directly to snpgdsPCA
.
Xiuwen and Stephanie, thank you so much for your suggestion! I have added algorithm="randomized" to the PCair function and my 76K sample dataset was able to run through pcair without problem.
Hello Xiuwen and Stephanie, when I took the PCs estimated from PCair and tried to run my 76K sample set through pcrelate, it failed with the following error message:
Error in rbindlist(l, use.names, fill, idcol) :
Total rows in the list is 2165981000 which is larger than the maximum number of rows, currently 2147483647
Calls: pcrelate ... pcrelate -> .local -> .pcrelate -> rbind -> rbind -> rbindlist
In addition: Warning message:
In .pcrelate(gdsobj, pcs = pcs, scale = scale, ibd.probs = ibd.probs, :
small.samp.correct can only be used when all samples are analyzed in one block and `scale != none`
Execution halted
run_PCair_PCrelate_1st_iter_V1_largemem.R failed
It appears to hit the maximum row size (2^31-1) for a R data.table. Does this mean we cannot run pcrelate for more than ~65K samples? Any advice or insights? Thank you!
@GraceSheng the pcrelate issue belongs on the GENESIS page. (Unlike pcair, pcrelate does not use SNPRelate functions.)
Thank you Stephanie! I opened a new issue on the GENESIS page https://github.com/UW-GAC/GENESIS/issues/70
We are getting an error when our sample size goes above 50,000. The PC-AIR step results in just NAs for all the PCS. At 45k samples, it runs fine. Originally reported this issue to GENESIS (https://github.com/UW-GAC/GENESIS/issues/64) Any suggestions on this issue?
Here is the code...
Here is the session info:
Here is the log, this time the script run through but the PCs results are NaNs: /restricted/projectnb/adgc/zhucc/ADGC_data/pheno/PCs/script_53k_twoStepsPCS/adgc.pc-air.pcs.txt