privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/
183 stars 43 forks source link

Error in if (cost > max_cost) return(NULL) #455

Closed AlisaBIG closed 8 months ago

AlisaBIG commented 8 months ago

Hi, Dr. privefl. Thank you for your excellent job. I tried to replicate the LD block results of hapmap3 plus that you provided. I just extracted chr1 as an example, with 1000 Genome EUR as the reference panel. But there are some errors. I failed to figure out this problem. Here are my codes below. Do you have any ideas? By the way, could you please share the codes that you split blocks for hapmap3 variants for us? I really appreciate any help you can provide.

if (file.exists("chr1.rds")==FALSE){
  rds1 <- snp_readBed("chr1.bed")
}

test1 <- snp_attach("chr1.rds")

G <- test1$genotypes
corr <- snp_cor(G,
  ind.row = rows_along(G),
  ind.col = cols_along(G))

THR_R2 <- 0.01
m <- ncol(corr)
SEQ <- round(seq_log(m / 30, m / 5, length.out = 10))
res <- snp_ldsplit(corr, thr_r2 = THR_R2, min_size = 100, max_size = SEQ)

Error reports:

Error in if (cost > max_cost) return(NULL) : 
  missing value where TRUE/FALSE needed
privefl commented 8 months ago

First, you need to provide information on the position and size of the window in snp_cor(), but that's another issue. Also, if you use corr only for snp_ldsplit, you can set thr_r2 directly in snp_cor().

As for the actual issue here, do you have any missing values in corr? You should have gotten a warning about it. Otherwise you can just check anyNA(corr@x).

AlisaBIG commented 8 months ago

Yep... There is NA in the matrix.

> anyNA(corr@x)
[1] TRUE

Also, if I calculate LD by

corr <- snp_cor(
            G,
            ind.col = ind.chr2,
            ncores = NCORES,
            infos.pos = POS2[ind.chr2],
            size = 3 / 1000
        )

Still NA there.

I have taken an overlap with SNPs I need to calculate the LD matrix with 1kg reference panel. So why does NA still occur?

AlisaBIG commented 8 months ago

Can I replace the "NA" of corr@x with zero?

privefl commented 8 months ago

For this particular use-case, you probably could just replace missing values with 0s. But maybe you should just do some filtering on small MAFs beforehand.

AlisaBIG commented 8 months ago

Yes, it successfully worked when I replaced NA with 0. But what if I don't want to filter all the SNPs because I want to reuse the blocks the next time I have different GWAS sumstats? Just as ldetect did. It provided a block division for us about the start and end of each block.

privefl commented 8 months ago

Having 0s for this variant means that it is independent from all other variants, and can be assigned to any block. If there is no block split at such variants, you're fine I guess.