zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)
http://www.bioconductor.org/packages/SNPRelate
98 stars 25 forks source link

bug in snpgdsLDpruning when slide.max.bp too high #1

Closed kforner closed 10 years ago

kforner commented 10 years ago

I recently used big values with snpgdsLDpruning just to be sure it used my slide.max.n parameter to define the window, and using too high values give wrong results:

library(SNPRelate)
snpgds <- openfn.gds(snpgdsExampleFileName())

res1 <- snpgdsLDpruning(snpgds,
  slide.max.bp = 1e9,
  slide.max.n = 20000,
  ld.threshold = 0,
  remove.monosnp = TRUE,
  autosome.only = FALSE,
  method = 'r', num.thread = 1,
  verbose = TRUE)

res2 <- snpgdsLDpruning(snpgds,
  slide.max.bp = 1e10,
  slide.max.n = 20000,
  ld.threshold = 0,
  remove.monosnp = TRUE,
  autosome.only = FALSE,
  method = 'r', num.thread = 1,
  verbose = TRUE)

The first call selects one SNP per chromosome as expected(ld.threshold = 0), but with slide.max.bp = 1e10 it suddenly selects all the SNPs.

zhengxwen commented 10 years ago

In the source code of snpgdsLDpruning in SNPRelate_v0.9.19:

rv <- .C("gnrLDpruning", as.integer(startidx-1), position[flag],
    as.integer(slide.max.bp), as.integer(slide.max.n), as.double(ld.threshold),
    method, out_snp = logical(node$n.snp),
    err = integer(1), NAOK=TRUE, PACKAGE="SNPRelate")

slide.max.bp is treated as an integer, and actually as.integer(1e10) is NA. I will fix it in the new version.