We are getting an error when our sample size goes above 50,000: the PC-AiR step returns only NAs for the PCs. At 45k samples, it runs fine. Any suggestions on this issue?
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 sample.id
-0.0120138984629118 -0.00155873339950702 -0.00110230516591038 -9.77996252854696e-05 0.000395412453209808 -0.00152135811291193 -1.45423707942342e-05 -0.00201312188579735 -0.00394751800943769 -0.00282981157052519 08AD09144_NACC022453
NA NA NA NA NA NA NA NA NA NA 08AD8245_NACC657970
NA NA NA NA NA NA NA NA NA NA 08AD9080_NACC594499
NA NA NA NA NA NA NA NA NA NA 08AD9797
NA NA NA NA NA NA NA NA NA NA 08AD10457
NA NA NA NA NA NA NA NA NA NA 08AD10987
NA NA NA NA NA NA NA NA NA NA 08AD11073
NA NA NA NA NA NA NA NA NA NA 08AD11218
NA NA NA NA NA NA NA NA NA NA 08AD11219
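To quantify the problem, the output table can be summarised by counting the samples whose PCs are all missing. The following is a minimal sketch, not part of the original script: `summarise_na_pcs` is a hypothetical helper, and the inline table is a cut-down copy of the output above (two PC columns, values rounded) standing in for the real results file.

```r
# Hypothetical helper: count samples whose PC columns are all NA.
summarise_na_pcs <- function(pcs) {
  # Columns named PC1, PC2, ... as in the output header above.
  pc_cols <- grep("^PC[0-9]+$", names(pcs), value = TRUE)
  all_na <- rowSums(is.na(pcs[, pc_cols, drop = FALSE])) == length(pc_cols)
  list(
    n_samples = nrow(pcs),
    n_all_na  = sum(all_na),
    na_ids    = pcs$sample.id[all_na]
  )
}

# Cut-down copy of the table above (values rounded for brevity).
pcs <- read.table(text = "
PC1 PC2 sample.id
-0.0120 -0.0016 08AD09144_NACC022453
NA NA 08AD8245_NACC657970
NA NA 08AD9080_NACC594499
", header = TRUE, stringsAsFactors = FALSE)

res <- summarise_na_pcs(pcs)
print(res$n_all_na)
print(res$na_ids)
```

On the real file, comparing `res$na_ids` against the unrelated and related sets would show whether the missing values are confined to one of the two partitions.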
Here is the log. This time the script ran to completion, but the PC results are NaNs:
/restricted/projectnb/adgc/zhucc/ADGC_data/pheno/PCs/script_53k_twoStepsPCS/adgc.pc-air.pcs.txt
Working with 52877 samples
Identifying relatives for each sample using kinship threshold 0.0220970869120796
Identifying pairs of divergent samples using divergence threshold -0.0220970869120796
Partitioning samples into unrelated and related sets...
...1000 samples added to related.set...
...2000 samples added to related.set...
...3000 samples added to related.set...
[1] 3721
[1] 49156
Principal Component Analysis (PCA) on genotypes:
Excluding 0 SNP on non-autosomes
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
# of samples: 49,156
# of SNPs: 87,468
using 28 threads
# of principal components: 32
PCA: the sum of all selected genotypes (0,1,2) = 1857235516
CPU capabilities: Double-Precision SSE2
Fri Apr 16 22:45:20 2021 (internal increment: 64)
[==================================================] 100%, completed, 43.8m
Fri Apr 16 23:29:14 2021 Begin (eigenvalues and eigenvectors)
Sat Apr 17 10:09:42 2021 Done.
SNP Loading:
# of samples: 49,156
# of SNPs: 87,468
using 28 threads
using the top 32 eigenvectors
SNP Loading: the sum of all selected genotypes (0,1,2) = 1857235516
Sat Apr 17 10:09:50 2021 (internal increment: 444)
[==================================================] 100%, completed, 7s
Sat Apr 17 10:09:57 2021 Done.
Sample Loading:
# of samples: 3,721
# of SNPs: 87,468
using 28 threads
using the top 32 eigenvectors
Sample Loading: the sum of all selected genotypes (0,1,2) = 140973579
Sat Apr 17 10:10:01 2021 (internal increment: 5908)
[==================================================] 100%, completed, 4s
Sat Apr 17 10:10:05 2021 Done.
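One thing that may be worth checking, stated purely as an assumption and not as a confirmed cause, is whether the n x n matrix built for the unrelated set exceeds the 32-bit integer limit on element counts. The log reports 49,156 unrelated samples, and the arithmetic below shows that a square matrix of that size has more than 2^31 - 1 elements, whereas one of size 45,000 does not. The helper name `n_elements` is illustrative only.

```r
# Largest value of a 32-bit signed integer in R (2147483647).
int_max <- .Machine$integer.max

# Number of elements in an n x n matrix, computed in double precision
# so that the calculation itself cannot overflow.
n_elements <- function(n) as.numeric(n)^2

exceeds_49k <- n_elements(49156) > int_max  # unrelated-set size from the log
exceeds_45k <- n_elements(45000) > int_max  # upper bound for the run that worked

# Largest n for which n^2 still fits in a 32-bit signed integer.
max_n <- floor(sqrt(int_max))

print(c(exceeds_49k = exceeds_49k, exceeds_45k = exceeds_45k))
print(max_n)
```

If this limit is relevant, the threshold would sit at an unrelated set of about 46,340 samples, which is at least consistent with the 45k run succeeding and the 53k run failing. Whether the eigen-decomposition step is actually affected by it is not something the log confirms.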
Here is the code...
Here is the session info: