zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)
http://www.bioconductor.org/packages/SNPRelate

LD-threshold >1 ? #52

Closed kroluk closed 6 years ago

kroluk commented 6 years ago

Hi,

I want to prune my SNP data based on MAF, missing rate and LD using snpgdsLDpruning(). Using ld.threshold = 0.2 removed a great many markers, so I started to play around. I found that if I don't want to remove any SNP based on LD, I have to set ld.threshold = 1.1. I find this strange, since in theory LD > 1 is not possible. Or am I missing something here?

Thanks, Lukas

R output:

(GDSgenofile <- snpgdsOpen("data/geno.gds"))
File: data\Geno.gds (1.2M)
+    [  ]
|--+ sample.id   { Str8 315 ZIP_ra(21.8%), 624B }
|--+ snp.id   { Str8 13621 ZIP_ra(34.7%), 84.9K }
|--+ snp.position   { Int32 13621 ZIP_ra(93.6%), 49.8K }
|--+ snp.chromosome   { Int32 13621 ZIP_ra(0.26%), 147B }
\--+ genotype   { Bit2 315x13621, 1.0M }

snpset <- snpgdsLDpruning(GDSgenofile, ld.threshold= 0.2, maf = 0.05, missing.rate = 0.05)
SNP pruning based on LD:
Excluding 0 SNP on non-autosomes
Excluding 171 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: 0.05)
Working space: 315 samples, 13,450 SNPs
    using 1 (CPU) core
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 0.2
    method: composite
Chromosome 1: 42.26%, 363/859
Chromosome 2: 43.45%, 504/1,160
Chromosome 3: 48.33%, 246/509
Chromosome 4: 44.21%, 370/837
Chromosome 5: 43.67%, 397/909
Chromosome 6: 61.33%, 203/331
Chromosome 7: 39.18%, 286/730
Chromosome 8: 44.75%, 413/923
Chromosome 9: 69.59%, 135/194
Chromosome 10: 43.81%, 184/420
Chromosome 11: 44.53%, 220/494
Chromosome 12: 72.04%, 67/93
Chromosome 13: 38.42%, 345/898
Chromosome 14: 36.57%, 396/1,083
Chromosome 15: 49.71%, 173/348
Chromosome 16: 35.01%, 286/817
Chromosome 17: 44.01%, 419/952
Chromosome 18: 57.19%, 163/285
Chromosome 19: 43.42%, 396/912
Chromosome 20: 40.96%, 265/647
Chromosome 21: 70.91%, 156/220
5,987 markers are selected in total.

snpset <- snpgdsLDpruning(GDSgenofile, ld.threshold= 1.0, maf = 0.05, missing.rate = 0.05)
SNP pruning based on LD:
Excluding 0 SNP on non-autosomes
Excluding 171 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: 0.05)
Working space: 315 samples, 13,450 SNPs
    using 1 (CPU) core
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 1
    method: composite
Chromosome 1: 87.54%, 752/859
Chromosome 2: 87.93%, 1,020/1,160
Chromosome 3: 86.05%, 438/509
Chromosome 4: 85.54%, 716/837
Chromosome 5: 88.56%, 805/909
Chromosome 6: 91.84%, 304/331
Chromosome 7: 88.08%, 643/730
Chromosome 8: 90.68%, 837/923
Chromosome 9: 93.30%, 181/194
Chromosome 10: 90.00%, 378/420
Chromosome 11: 85.22%, 421/494
Chromosome 12: 89.25%, 83/93
Chromosome 13: 84.41%, 758/898
Chromosome 14: 83.19%, 901/1,083
Chromosome 15: 92.53%, 322/348
Chromosome 16: 87.39%, 714/817
Chromosome 17: 88.45%, 842/952
Chromosome 18: 92.63%, 264/285
Chromosome 19: 88.71%, 809/912
Chromosome 20: 88.25%, 571/647
Chromosome 21: 92.27%, 203/220
11,962 markers are selected in total.

snpset <- snpgdsLDpruning(GDSgenofile, ld.threshold= 1.1, maf = 0.05, missing.rate = 0.05)
SNP pruning based on LD:
Excluding 0 SNP on non-autosomes
Excluding 171 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: 0.05)
Working space: 315 samples, 13,450 SNPs
    using 1 (CPU) core
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 1.1
    method: composite
Chromosome 1: 98.84%, 849/859
Chromosome 2: 98.97%, 1,148/1,160
Chromosome 3: 98.23%, 500/509
Chromosome 4: 99.16%, 830/837
Chromosome 5: 99.34%, 903/909
Chromosome 6: 99.40%, 329/331
Chromosome 7: 99.45%, 726/730
Chromosome 8: 99.78%, 921/923
Chromosome 9: 98.97%, 192/194
Chromosome 10: 96.90%, 407/420
Chromosome 11: 97.98%, 484/494
Chromosome 12: 97.85%, 91/93
Chromosome 13: 98.00%, 880/898
Chromosome 14: 99.08%, 1,073/1,083
Chromosome 15: 99.43%, 346/348
Chromosome 16: 97.06%, 793/817
Chromosome 17: 99.16%, 944/952
Chromosome 18: 99.65%, 284/285
Chromosome 19: 98.36%, 897/912
Chromosome 20: 98.45%, 637/647
Chromosome 21: 98.18%, 216/220
13,450 markers are selected in total.

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252 LC_MONETARY=German_Switzerland.1252
[4] LC_NUMERIC=C LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] SNPRelate_1.16.0 gdsfmt_1.18.0

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 rstudioapi_0.8 yaml_2.2.0 crayon_1.3.4

zhengxwen commented 6 years ago

Could you please show me the result of

snpset <- snpgdsLDpruning(GDSgenofile, method="corr", ld.threshold= 1.0,
    maf = 0.05, missing.rate = 0.05)

Numerical calculation cannot guarantee that the maximum is exactly 1, but it should be very close to 1.
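To illustrate the rounding point, here is a minimal base-R sketch. It is not SNPRelate's internal code: the genotype vectors are made up, and `manual_cor` and `exceeds_threshold` are hypothetical helpers. It computes a correlation by hand for two identical genotype vectors and compares it against a threshold with a small tolerance, so that a value a few rounding errors above 1 is not treated as exceeding 1.

```r
# Made-up genotype vectors coded 0/1/2; they are identical, so the
# mathematically exact correlation is 1.
g1 <- c(0, 1, 2, 1, 0, 2, 2, 1, 0, 1)
g2 <- g1

# Pearson correlation written out by hand (hypothetical helper).
manual_cor <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r <- manual_cor(g1, g2)

# Hypothetical helper: |r| counts as above the threshold only if it
# exceeds it by more than a small tolerance.
exceeds_threshold <- function(r, threshold, tol = 1e-8) {
  abs(r) > threshold + tol
}

print(sprintf("%.17g", r))                # full-precision value of r
print(exceeds_threshold(r, 1.0))          # tolerant comparison against 1
print(exceeds_threshold(1 + 1e-15, 1.0))  # a value a few rounding steps above 1
print((1 + 1e-15) > 1)                    # the strict comparison, for contrast
```

The tolerance of 1e-8 is an arbitrary choice for this sketch. Any value well above rounding error and well below a meaningful LD difference would serve.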

kroluk commented 6 years ago

Hi Xiuwen, thanks for the quick answer. Please find the result below.

Setting the threshold to 1.0 still prunes approx. 450 markers. The "border-threshold" is at about 1.000000000000001. So indeed very close to 1.
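For a sense of scale, a border value of about 1.000000000000001 is only a few representable double-precision steps above 1. The base-R snippet below uses nothing from SNPRelate. The factor of 4 is an illustrative choice, not a measured property of the LD computation.

```r
# Spacing between 1 and the next larger double (about 2.2e-16).
eps <- .Machine$double.eps
print(eps)

# A value four such steps above 1, shown with 15 decimal places.
above_one <- 1 + 4 * eps
print(sprintf("%.15f", above_one))

# It is strictly greater than 1, so a strict comparison against a
# threshold of exactly 1 would treat it as exceeding the threshold.
print(above_one > 1)
```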

snpset <- snpgdsLDpruning(GDSgenofile, method="corr", ld.threshold= 1.0, maf = 0.05, missing.rate = 0.05)
SNP pruning based on LD:
Excluding 0 SNP on non-autosomes
Excluding 171 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: 0.05)
Working space: 315 samples, 13,450 SNPs
    using 1 (CPU) core
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 1
    method: correlation
Chromosome 1: 95.46%, 820/859
Chromosome 2: 94.48%, 1,096/1,160
Chromosome 3: 92.53%, 471/509
Chromosome 4: 97.49%, 816/837
Chromosome 5: 97.80%, 889/909
Chromosome 6: 97.58%, 323/331
Chromosome 7: 94.66%, 691/730
Chromosome 8: 96.64%, 892/923
Chromosome 9: 98.45%, 191/194
Chromosome 10: 94.29%, 396/420
Chromosome 11: 95.14%, 470/494
Chromosome 12: 97.85%, 91/93
Chromosome 13: 94.54%, 849/898
Chromosome 14: 92.61%, 1,003/1,083
Chromosome 15: 97.13%, 338/348
Chromosome 16: 94.37%, 771/817
Chromosome 17: 95.69%, 911/952
Chromosome 18: 96.84%, 276/285
Chromosome 19: 95.94%, 875/912
Chromosome 20: 95.83%, 620/647
Chromosome 21: 96.82%, 213/220
13,002 markers are selected in total.
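If the goal is to filter on MAF and missing rate without any LD pruning, one possible route is to skip snpgdsLDpruning() and call snpgdsSelectSNP() instead. This is a suggestion, not something proposed in this thread. In the sketch below, `select_without_ld` is a hypothetical wrapper, and the usage part runs on the example file shipped with SNPRelate rather than on the data discussed above.

```r
# Hypothetical wrapper (not part of SNPRelate): apply only the MAF and
# missing-rate filters on autosomal SNPs, with no LD pruning step.
select_without_ld <- function(gds, maf = 0.05, missing.rate = 0.05) {
  SNPRelate::snpgdsSelectSNP(gds, autosome.only = TRUE, remove.monosnp = TRUE,
                             maf = maf, missing.rate = missing.rate)
}

# Usage on SNPRelate's bundled example data; skipped if the package
# is not installed.
if (requireNamespace("SNPRelate", quietly = TRUE)) {
  gds <- SNPRelate::snpgdsOpen(SNPRelate::snpgdsExampleFileName())
  snp_ids <- select_without_ld(gds)
  print(length(unlist(snp_ids)))  # number of SNPs passing the filters
  SNPRelate::snpgdsClose(gds)
}
```

The returned SNP IDs can then be passed as `snp.id` to downstream SNPRelate functions, in the same way as the unlisted result of snpgdsLDpruning().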