Closed thierrygosselin closed 4 years ago
The error:
Error in seqVCF2GDS(fn, "t.gds") :
FORMAT ID 'AD' (Number=R) should have 2 value(s), but receives 4.
FILE: /Users/zhengxx1/Downloads/thierry_gosselin_20200127_freebayes.vcf
LINE: 75, COLUMN: 10, 0/0:12:12,0,0,0:12:432:0,0,0:0,0,0:0,-3.61236,-39.2188,-3.61236,-39.2188,-39.2188,-3.61236,-39.2188,-39.2188,-39.2188
You should modify the vcf file and change ID=AD,Number=R
to ID=AD,Number=.
.
Please also modify other lines of header of VCF, to allow variable-length data.
Thanks for the quick answer!
Or revise the VCF header in R:
library(SeqArray)
fn <- "thierry_gosselin_20200127_freebayes.vcf.gz"
hdr <- seqVCF_Header(fn)
hdr$format$Number[hdr$format$ID=="AD"] <- "."
hdr$format$Number[hdr$format$ID=="AO"] <- "."
hdr$format$Number[hdr$format$ID=="QA"] <- "."
hdr$format$Number[hdr$format$ID=="GL"] <- "."
seqVCF2GDS(fn, "t.gds", header=hdr)
Yes I usually do those.
But this type of VCF, where it's supposed to be haplotypes, but only showing just 2 alleles and 4 AD values...
Dear Xiuwen,
The original data is a VCF file generated by the RADseq pipeline called dDocent. The import with SeqArray is fine.
Problem: I'm having a problem with the GDS. The GT and DP length are fine, but it's the AD length that doesn't match.
Question: Could you tell be if it's a SeqArray bug during import or an upstream problem (dDocent and/or FreeBayes) ?
I'm sending you the VCF file by email.
Best Thierry