tderrien closed this issue 6 years ago
Hi tderrien,
I have not seen VCF data like this before. Could you share what software you used to create it? I'm not saying there's anything wrong with it, just curious.
I'm not sure I follow your question, so let's try to create a minimal reproducible example. I'm also not sure where the 'SVTYPE' data came from; perhaps the INFO column? If we work on the vcfR object, the @fix matrix and the @gt matrix will have the same number of rows, so that would be a good place to start.
library(vcfR)
data("vcfR_test")
# Fabricate an example
vcfR_test@fix[,"INFO"] <- paste(getINFO(vcfR_test), paste("SVTYPE", c("DUP", "DEL", "DUP", "INV", "DUP"), sep = "="), sep = ";")
# Isolate duplications
vcfR_test <- vcfR_test[grep("SVTYPE=DUP", getINFO(vcfR_test))]
# Get the genotypes
gt <- extract.gt(vcfR_test)
# Add a homozygous variant for the example
gt[2,] <- "0/0"
gt <- is.het(gt)
# Isolate heterozygous variants
vcf <- vcfR_test[rowSums(gt) > 0,]
vcf
vcft <- vcfR2tidy(vcf)
Note that this doesn't really look like your example. I have no 'SVTYPE' in the fix list. So I'm missing some information. Could you take a look at this example and see if it helps? If not, could you try to help us get an example that matches your data better?
Thanks! Brian
Hi Brian,
Thank you for your prompt reply!
Actually, the .vcf has been produced by a Structural Variant (SV) caller named Delly (version v0.7.8).
The SVType information comes from the vcft$fix$SVTYPE field while the genotype is in the vcft$gt so I got some issues combining both.
If I understand correctly, you recommend filtering the vcfR object first, rather than filtering the converted "vcfR tidy" object.
In any case, here is a link to get an example of the .vcf.
Thank you again!
Hi tderrien,
I've heard of delly, just never used it. Maybe I should?
Thanks for sharing your data. After inspecting it, I believe we were very close, although I appear to have missed a critical comma in the line containing the grep statement. I believe the following should get you what you need.
library(vcfR)
vcf <- read.vcfR("CL100086620_CL100090508.merged.vcf.gz")
# Isolate duplications
vcf <- vcf[grep("SVTYPE=DUP", getINFO(vcf)),]
# Get the genotypes
gt <- extract.gt(vcf)
# Flag heterozygous genotypes (logical matrix, variants x samples)
gt <- is.het(gt)
# Isolate heterozygous variants
vcf <- vcf[rowSums(gt) > 0,]
vcf
vcft <- vcfR2tidy(vcf)
Is that what you were looking for?
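One detail worth spelling out: rowSums(gt) > 0 keeps a variant when at least one sample is heterozygous, while the original question asked for variants that are heterozygous in both individuals. The base R sketch below illustrates the difference on a hand-made genotype matrix. The matrix, its sample and variant names, and the simple gt == "0/1" comparison are assumptions for illustration only; they stand in for the logical matrix that is.het() returns and are not taken from the real data.

```r
# Toy genotype matrix (hypothetical values): 3 variants x 2 samples
gt <- matrix(
  c("0/1", "0/1",
    "0/1", "0/0",
    "1/1", "0/0"),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("var1", "var2", "var3"), c("ind1", "ind2"))
)

# Hand-rolled stand-in for is.het(): TRUE where the genotype is "0/1"
het <- gt == "0/1"

# Heterozygous in at least one sample (what rowSums(gt) > 0 selects)
any_het <- rowSums(het) > 0

# Heterozygous in every sample (stricter reading of the question)
all_het <- rowSums(het) == ncol(het)
```

On the real vcfR object, the same stricter condition would presumably be vcf[rowSums(gt) == ncol(gt),], with gt being the output of is.het(), though missing genotypes may need separate handling.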
Hi Brian,
Yes, that's great! Thank you!
Hello,
First of all, thank you for providing such a great package!
My question is probably naive but I cannot find the answer on the tutorial or previous issues. I have a vcf object that I've converted into 'tibble' called vcft using the dedicated vcfR2tidy() function (I provide an example of the vcft at the end).
My question is how can I filter variants on both the $fix and $gt lists e.g. finding variants that are both a duplication and heterozygous for the 2 individuals (fix list SVTYPE == "DUP" and gt list gt_GT="0/1")?
I read on the tutorial :
Note that the fix and gt elements have a ‘ChromKey’ to help coordinate the variants in both structures
but I'm not sure how to use this ChromKey.
Thank you for your help,
Best,
Thomas
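For readers who want to stay with the tidy object, one possible way to use ChromKey is to join the two tables on ChromKey and POS and then filter. The sketch below uses dplyr on two small hand-made tibbles that stand in for vcft$fix and vcft$gt. The values and sample names are hypothetical, and the column names (ChromKey, POS, SVTYPE, Indiv, gt_GT) are assumed to match what vcfR2tidy() produces for this kind of file.

```r
library(dplyr)

# Toy stand-in for vcft$fix (hypothetical values): one row per variant
fix <- tibble(
  ChromKey = c(1L, 1L, 1L),
  POS      = c(100L, 200L, 300L),
  SVTYPE   = c("DUP", "DEL", "DUP")
)

# Toy stand-in for vcft$gt: one row per variant per individual
gt <- tibble(
  ChromKey = rep(1L, 6),
  POS      = rep(c(100L, 200L, 300L), each = 2),
  Indiv    = rep(c("ind1", "ind2"), times = 3),
  gt_GT    = c("0/1", "0/1", "0/1", "0/1", "0/1", "0/0")
)

# Join genotypes to their variant record, keep duplications,
# then keep variants where every individual is "0/1"
het_dup <- gt %>%
  inner_join(fix, by = c("ChromKey", "POS")) %>%
  filter(SVTYPE == "DUP") %>%
  group_by(ChromKey, POS) %>%
  filter(all(gt_GT == "0/1")) %>%
  ungroup()
```

With the real object, fix and gt would be replaced by vcft$fix and vcft$gt. If several variants share the same ChromKey and POS, this join would mix their genotypes, so an extra key may be needed in that case.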
Here is an overview of the vcft object: