Closed colindaven closed 1 year ago
Hi @colindaven, if I understand your question correctly, you are asking for an extra sample column for the sample that is used as the reference in the VCF itself. In a VCF file, variants are expressed with respect to a reference, so the genotypes of the reference, AT_Col-0_hq-v1.1.chr5, are always 0. Do you want a column with all zeros? Maybe I misunderstood your question.
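If an explicit all-zero reference column is wanted anyway, it could be appended after the fact with a small script. The following is only a sketch, not a PGGB or vg feature: the function name `add_reference_column` and the default column name are hypothetical, and it assumes haploid genotypes with GT as the first FORMAT field.

```python
def add_reference_column(vcf_lines, ref_name="REFERENCE"):
    """Yield VCF lines with one extra genotype column for the reference.

    Assumptions (not taken from PGGB itself): GT is the first FORMAT
    field and genotypes are haploid, so the reference genotype is "0".
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Header line: add the name of the new sample column.
            yield line + "\t" + ref_name
        elif line:
            # Data line: the reference always carries allele 0.
            yield line + "\t0"


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB",
        "chr5\t10\t.\tA\tC,T\t60\t.\t.\tGT\t2\t1",
    ]
    for out_line in add_reference_column(example, "AT_Col-0_hq-v1.1.chr5"):
        print(out_line)
```

The sample names and the record in the example are made up for illustration.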
Dear @AndreaGuarracino, thanks for your input.
I have now found SNPs where the reference differs from both samples, and the two samples also differ from each other, e.g. using
grep -P "2\t1" *.vcf | less -S
or the reverse "1\t2".
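The grep pattern matches the text `2<TAB>1` anywhere on a line, so it could in principle also hit other columns. As a sketch of a stricter version of the same search, the following parses each record and keeps only SNP sites where the reference and every sample carry a different allele. The function name `all_distinct_snps` is hypothetical, and haploid genotypes with GT as the first FORMAT field are assumed.

```python
def all_distinct_snps(vcf_lines):
    """Yield (chrom, pos, alleles) for SNP sites where the reference and
    all samples carry pairwise different alleles.

    Assumptions: haploid genotypes, GT is the first FORMAT field, and
    sample columns start at column 10 as in a standard VCF.
    """
    bases = {"A", "C", "G", "T"}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        # Keep single-base substitutions only.
        if any(a.upper() not in bases for a in alleles):
            continue
        genotypes = [sample.split(":")[0] for sample in fields[9:]]
        # Skip missing ('.') or non-haploid ('0/1', '0|1') genotypes.
        if not all(gt.isdigit() for gt in genotypes):
            continue
        # Allele index 0 stands for the implicit reference genotype.
        indices = [0] + [int(gt) for gt in genotypes]
        if len(set(indices)) == len(indices):
            yield chrom, pos, [alleles[i] for i in indices]


if __name__ == "__main__":
    import sys

    # Usage: python all_distinct_snps.py input.vcf
    with open(sys.argv[1]) as handle:
        for chrom, pos, alleles in all_distinct_snps(handle):
            print(chrom, pos, *alleles, sep="\t")
```

With two sample columns this reports exactly the `2 1` and `1 2` cases from the grep above, with the reference allele listed first.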
Thanks for your assistance here. I now understand the subtle difference between a PGGB VCF and a traditional multisample + reference VCF.
Hi, I really, really like the VCF output here.
Having looked through the tutorial, I guess the VCF comes from VCF decompose, so should I rather post this question there?
Anyway, I have 3 public Arabidopsis genomes and want to find variant positions (ideally SNPs) where all 3 are different, e.g. one has A, one has C, one has T. This is relevant for triploid experiments I'll be starting soon.
The last "genotype" columns with 1 and 0 seem to be useful for this, yet the reference is only implied here. Is there a way to force the reference genotype to be written as an output column as well, i.e. so that 3 columns result, such as 0 1 0, not just 0 1 as at present?
I am used to multisample SNP calling where you get one genotype for each BAM file, all versus the reference (e.g. with FreeBayes).
Thanks, Colin