knausb / vcfR

Tools to work with variant call format files

Combine Two Variants In One Single VCF File #187

Closed gk7279 closed 3 years ago

gk7279 commented 3 years ago

Hi vcfR team-

Is there a way for your R package to merge multiple rows/variants in a given VCF file, similar to bedtools merge?

I am also looking for a tool that can give me the full REF and ALT sequences, given a region, a VCF file, and a BAM file.

Please advise. Thanks.

knausb commented 3 years ago

Hi @gk7279

You can use rbind2() to combine rows/variants. Note that many downstream analyses expect variants to be sorted by chromosome and by position within each chromosome. rbind2() only appends the rows, so you will need to handle the sorting yourself.

The REF and ALT allele(s) are reported in the VCF, so I do not think you need a *.bam file.

Let me know if that helps!

library(vcfR)
#> 
#>    *****       ***   vcfR   ***       *****
#>    This is vcfR 1.12.0 
#>      browseVignettes('vcfR') # Documentation
#>      citation('vcfR') # Citation
#>    *****       *****      *****       *****
data("vcfR_test")
vcfR_test
#> ***** Object of Class vcfR *****
#> 3 samples
#> 1 CHROMs
#> 5 variants
#> Object size: 0 Mb
#> 0 percent missing data
#> *****        *****         *****

vcf2 <- rbind2(vcfR_test, vcfR_test)
vcf2
#> ***** Object of Class vcfR *****
#> 3 samples
#> 1 CHROMs
#> 10 variants
#> Object size: 0 Mb
#> 0 percent missing data
#> *****        *****         *****

getREF(vcf2)
#>  [1] "G"   "T"   "A"   "T"   "GTC" "G"   "T"   "A"   "T"   "GTC"
getALT(vcf2)
#>  [1] "A"      "A"      "G,T"    NA       "G,GTCT" "A"      "A"      "G,T"   
#>  [9] NA       "G,GTCT"

Created on 2021-06-21 by the reprex package (v1.0.0)
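
The sorting step mentioned above is not shown in the reprex, so here is a minimal sketch of one way to do it. It assumes that ordering by getCHROM() and getPOS() and then subsetting rows with `[` is acceptable for your data; the names `ord` and `vcf2_sorted` are made up for illustration. Chromosome names are compared as character strings here, so with several chromosomes "10" would sort before "2", and you may want a custom ordering instead.

```r
library(vcfR)

# Rebuild the combined object from the example data.
data("vcfR_test")
vcf2 <- rbind2(vcfR_test, vcfR_test)

# Row order: by chromosome name (as a string), then by position within it.
ord <- order(getCHROM(vcf2), getPOS(vcf2))

# Subset the rows in that order; the fix and gt slots are reordered together.
vcf2_sorted <- vcf2[ord, ]

# Positions should now be non-decreasing within the single chromosome.
getPOS(vcf2_sorted)
```

Because the example binds the same object to itself, each position appears twice in the sorted result. Removing such duplicates, if you need to, is a separate step.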

gk7279 commented 3 years ago

Thanks, it worked.