zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

seqMerge error - File 2: chromosomes and positions are unsorted. #93

Open boboppie opened 4 months ago

boboppie commented 4 months ago

Dear @zhengxwen / @zhengxw-ab,

Hope you are keeping well.

I was trying to merge two samples: seqMerge(c("sample_1.gds", "sample_2.gds"), "merged.gds", storage.option="LZMA_RA", verbose=TRUE)

error:

Error in seqMerge(c("sample_1.gds", "sample_2.gds"), : File 2: chromosomes and positions are unsorted.

Some additional info:

    f1 <- seqOpen("sample_1.gds")
    f2 <- seqOpen("sample_2.gds")
    head(seqGetData(f1, "$chrom_pos"))
    [1] "1:10230" "1:10247" "1:10329" "1:10352" "1:10469" "1:10519"
    head(seqGetData(f2, "$chrom_pos"))
    [1] "1:10230" "1:10329" "1:10352" "1:10745" "1:10774" "1:14455"

This looks similar to https://github.com/zhengxwen/SeqArray/issues/41, but I made sure to sort the VCFs first using bcftools sort. The two samples do not share exactly the same variants. Should seqMerge work in this case?

Many thanks in advance.

Best wishes, Fengyuan

zhengxwen commented 3 months ago

seqMerge() requires the input files to share either the same samples or the same variants.
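
As a rough illustration of that condition, the following base-R sketch compares the six `$chrom_pos` values quoted earlier in this thread. The vectors `cp1` and `cp2` are hand-copied from that output and all variable names are illustrative. It only shows how the two variant lists differ; it is not how seqMerge() performs its own check.

```r
# First six "$chrom_pos" values reported for each file in this issue
# (hand-copied; in practice these would come from seqGetData()).
cp1 <- c("1:10230", "1:10247", "1:10329", "1:10352", "1:10469", "1:10519")
cp2 <- c("1:10230", "1:10329", "1:10352", "1:10745", "1:10774", "1:14455")

# Do both files list exactly the same variants in the same order?
same_variants <- identical(cp1, cp2)

# Variants present in both files, and those unique to each file.
shared   <- intersect(cp1, cp2)
only_in1 <- setdiff(cp1, cp2)
only_in2 <- setdiff(cp2, cp1)

# Are positions non-decreasing? (Valid here because every value shown
# is on chromosome 1; multi-chromosome data would need a per-chromosome check.)
pos1 <- as.integer(sub(".*:", "", cp1))
pos2 <- as.integer(sub(".*:", "", cp2))
sorted1 <- !is.unsorted(pos1)
sorted2 <- !is.unsorted(pos2)

print(list(same_variants = same_variants, shared = shared,
           only_in1 = only_in1, only_in2 = only_in2,
           sorted1 = sorted1, sorted2 = sorted2))
```

On these twelve values, both lists are in ascending position order but contain different variants, which matches the description in the original report.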

Why not use bcftools to merge the VCF files?

zhengxwen commented 3 months ago

SeqArray_1.44.1 (https://github.com/zhengxwen/SeqArray/commit/702488b6b98156998e07b34b49cfb3585d14e370) should fix this issue.