zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray
44 stars 12 forks source link

seqSetFilter(, variant.sel=..., ret.idx=TRUE) #80

Closed zhengxwen closed 2 years ago

zhengxwen commented 2 years ago

ret.idx was introduced in SeqArray_v1.32.0 for unsorted sample and variant indices. Users can use the return of seqSetFilter(, variant.sel=..., ret.idx=TRUE) to rearrange the data obtained from seqGetData() according to the input sample or variant selection.

library(SeqArray)

f <- seqOpen(seqExampleFileName("gds"))
head(seqGetData(f, "sample.id"))
## [1] "NA06984" "NA06985" "NA06986" "NA06989" "NA06994" "NA07000"
head(seqGetData(f, "variant.id"))
## [1] 1 2 3 4 5 6

samp_sel_id <- c("NA06994", "NA06986", "NA06989")
(s_i <- seqSetFilter(f, sample.id=samp_sel_id, ret.idx=TRUE))
## $sample_idx
## [1] 3 1 2

seqGetData(f, "sample.id")
## [1] "NA06986" "NA06989" "NA06994"    <- not the same as samp_sel_id
seqGetData(f, "sample.id")[s_i$sample_idx]
## "NA06994" "NA06986" "NA06989"    <- the same as samp_sel_id

variant_sel_id <- c(6, 3, 2)
(v_i <- seqSetFilter(f, variant.id=variant_sel_id, ret.idx=TRUE))
## $variant_idx
## [1] 3 2 1

seqGetData(f, "variant.id")
## [1] 2 3 6    <- not the same as variant_sel_id
seqGetData(f, "variant.id")[v_i$variant_idx]
## [1] 6 3 2    <- the same as variant_sel_id

But there is a bug when sample.sel and variant.sel are used with the indices.

variant_sel_idx <- c(6, 2, 3)  # index instead of ID
(v_i <- seqSetFilter(f, variant.sel=variant_sel_idx, ret.idx=TRUE))
## $variant_idx
## [1] 2 3 1        # should be 3 1 2

seqGetData(f, "variant.id")
## [1] 2 3 6    <- not the same as variant_sel_id
seqGetData(f, "variant.id")[v_i$variant_idx]
## [1] 3 6 2    <- not the same as variant_sel_id

It is fixed in SeqArray_v1.36.1 and v1.37.1.

zhengxwen commented 2 years ago

It is fixed in SeqArray_1.36.1.