zhanxw / seqminer

Query sequence data (VCF/BCF1/BCF2, Tabix, BGEN, PLINK) in R
http://zhanxw.github.io/seqminer/

readBGENToMatrixByRange cannot find variants, but finds them after pruning with PLINK2 #15

Open garyzhubc opened 3 years ago

garyzhubc commented 3 years ago

Why does the number of variants increase after pruning? In PLINK2 I ran:

./plink2 --bgen ukb_imp_chr1_v3.bgen ref-first --sample ukb51283_imp_chr1_v3_s487283.sample --indep-pairwise 50 25 0.2 --rm-dup exclude-all --out ukb_imp_chr1_v3

./plink2 --bgen ukb_imp_chr1_v3.bgen ref-first --sample ukb51283_imp_chr1_v3_s487283.sample --extract ukb_imp_chr1_v3_pruned.prune.in --export bgen-1.2 bits=8 --out ukb_imp_chr1_v3_pruned
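A small aside on the commands above: PLINK2 names the `--indep-pairwise` output after the `--out` prefix, so the first command writes `ukb_imp_chr1_v3.prune.in`, while the second command reads `ukb_imp_chr1_v3_pruned.prune.in` (presumably the file was renamed in between). Counting the IDs kept by pruning gives a chromosome-wide reference number to compare against later. A minimal R sketch; `count_pruned_ids` is a hypothetical helper, not part of seqminer or PLINK2:

```r
# Hypothetical helper: count the variant IDs PLINK2 kept after --indep-pairwise.
# PLINK2 writes <--out prefix>.prune.in with one variant ID per line.
count_pruned_ids <- function(prune_in_path) {
  ids <- readLines(prune_in_path)
  ids <- ids[nzchar(ids)]  # ignore any blank lines
  length(ids)
}

# Example (file name follows the first command's --out prefix; adjust if renamed):
# count_pruned_ids("ukb_imp_chr1_v3.prune.in")
```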

Then I used seqminer to check the dimensions.

> library(seqminer)
> end_pos <- 1000000
> ukb_imp_chr1_v3.bgen <- "ukb_imp_chr1_v3.bgen"
> ukb_imp_chr1_v3_pruned.bgen <- "ukb_imp_chr1_v3_pruned.bgen"
> ukb_imp_chr1_v3.bgen <- readBGENToMatrixByRange(ukb_imp_chr1_v3.bgen, paste0("1:1-", format(end_pos, scientific=F)))
1 region to be extracted.
> ukb_imp_chr1_v3_pruned.bgen <- readBGENToMatrixByRange(ukb_imp_chr1_v3_pruned.bgen, paste0("1:1-", format(end_pos, scientific=F)))
1 region to be extracted.
> dim(ukb_imp_chr1_v3.bgen[[1]])
[1]      0 487409
> dim(ukb_imp_chr1_v3_pruned.bgen[[1]])
[1]   1008 487409
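To compare the two files on exactly the same interval, it can help to build the region string in one place. The `format(end_pos, scientific=F)` in the calls above matters because, under R's default `scipen`, `paste0("1:1-", 1000000)` produces `"1:1-1e+06"`. The sketch below uses `sprintf` with integer conversion instead. `make_region` and `count_variants_in_region` are hypothetical helpers, and the latter assumes, as the `dim()` output above suggests, that each matrix returned by `readBGENToMatrixByRange` has one row per variant and one column per sample.

```r
# Hypothetical helpers for querying the same region in several BGEN files.

# Build a "chrom:start-end" region string; as.integer() plus %d always
# yields plain digits, never scientific notation such as "1e+06".
make_region <- function(chrom, start, end) {
  sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
}

# Number of variants seqminer returns for one region. Assumes rows are
# variants and columns are samples, matching the dim() output in this issue.
count_variants_in_region <- function(bgen_file, region) {
  res <- seqminer::readBGENToMatrixByRange(bgen_file, region)
  nrow(res[[1]])
}

# Example (requires the two BGEN files from this issue):
# region <- make_region("1", 1, 1000000)  # "1:1-1000000"
# sapply(c("ukb_imp_chr1_v3.bgen", "ukb_imp_chr1_v3_pruned.bgen"),
#        count_variants_in_region, region = region)
```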

This suggests there are no variants between positions 1 and 1,000,000 in the original file (which is not true), yet there are 1,008 variants in the same range after pruning. This is very strange.
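The puzzle rests on a simple invariant: `--extract` can only drop variants, so for any region the pruned file should never report more variants than the original. A minimal R sketch of that check follows. It also tries a zero-padded chromosome label, on the unconfirmed assumption that the original file might label chromosome 1 differently (for example `01`) from the pruned export, which evidently answers to `1`. `check_pruning_counts` and `candidate_regions` are hypothetical helpers.

```r
# Invariant: extracting a subset of variants can never add variants, so a
# region's count in the pruned file must not exceed the original's.
check_pruning_counts <- function(n_original, n_pruned) {
  ok <- n_pruned <= n_original
  if (!ok) {
    warning(sprintf(
      "pruned file has %d variants in the region but the original has %d",
      as.integer(n_pruned), as.integer(n_original)))
  }
  ok
}

# Candidate spellings of the same region, e.g. "1:..." and "01:...".
# The zero-padded label is an assumption to test, not a confirmed cause.
# Only meaningful for numeric chromosome codes.
candidate_regions <- function(chrom, start, end) {
  labels <- unique(c(as.character(chrom), sprintf("%02d", as.integer(chrom))))
  sprintf("%s:%d-%d", labels, as.integer(start), as.integer(end))
}

# Example (requires the original BGEN file from this issue):
# for (r in candidate_regions("1", 1, 1000000)) {
#   res <- seqminer::readBGENToMatrixByRange("ukb_imp_chr1_v3.bgen", r)
#   cat(r, nrow(res[[1]]), "\n")
# }
```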

zhanxw commented 3 years ago

@garyzhubc What is the version of your PLINK2?

zhanxw commented 3 years ago

@dajiang Do we have this file: ukb_imp_chr1_v3.bgen?

(Sent by email in reply to the original post by Peiyuan Zhu on Mon, Feb 8, 2021 at 12:30 PM.)

garyzhubc commented 3 years ago

My PLINK2 should be the latest stable version: alpha 2.3