Closed AAvalos82 closed 7 years ago
The inversion is occurring at the seqSNP2GDS step.
A new option major.ref
is added to `seqSNP2GDS:
seqSNP2GDS(gds.fn, out.fn, storage.option="LZMA_RA", major.ref=TRUE,
optimize=TRUE, digest=TRUE, verbose=TRUE)
if major.ref=TRUE
, use the major allele as a reference allele.
Likely a minor bug, but the seqGDS2VCF function is effectively inverting the genotype when it writes out a VCF.
The function uses the allele coding:
2 = two copies of A/REF, 1= A/B, 0= two copies of B/ALT, when producing a VCF file.
Thus the VCF output becomes:
1/1 for the REF in the GT field and 0/0 for the ALT in the GT field of the VCF.
However, the VCF format treats 0 = REF and 1 = ALT (see "1.4.2 Genotype fields" section in https://samtools.github.io/hts-specs/VCFv4.2.pdf, and also http://www.internationalgenome.org/wiki/Analysis/vcf4.0/). As per this format a REF homozygous should be 0/0 and ALT homozygous should be 1/1 in the VCF.
This may be specific to my case, where I converted a SNP GDS file to a Seq GDS file first using the seqSNP2GDS, then wrote the Seq GDS into a VCF. But if that is the case then the inversion is occurring at the seqSNP2GDS step.