Closed mbhall88 closed 7 months ago
What is the motivation for the request?
There is currently no plan to support that, but it should not be difficult to write a script that takes the chain file produced with -c
and the fasta file and converts it to a VCF. If it's well written, I am happy to include it among bcftools/misc scripts.
The motivation is that I have applied some variants to a reference and then I want to call variants with respect to the other genome and need the "truth" VCF - i.e. the variants we expect to find.
It ended up being a reasonably easy script to write, as you say. Though I didn't use (or know about) the chain file. Is there somewhere that documents the chain file and its format?
The format is documented fore example here https://genome.ucsc.edu/goldenPath/help/chain.html
The script I came up with is essentially this python function
from collections import defaultdict
def inverse_vcf(invcf: str, outvcf: str) -> None:
"""This function takes a VCF that was used to generate a consensus sequence with bcftools
consensus and provides the inverse - i.e., the VCF that would turn the consensus
sequence back into the original reference sequence.
"""
with open(outvcf, "w") as f_out, gzip.open(invcf, "rt") as f_in:
offset_counter = defaultdict(int)
for line in f_in:
if line.startswith("#"):
print(line.strip(), file=f_out)
continue
fields = line.strip().split("\t")
ref = fields[3]
alt = fields[4]
pos = int(fields[1])
chrom = fields[0]
offset = offset_counter[chrom]
new_alt = ref
new_ref = alt
new_pos = pos + offset
record_offset = len(alt) - len(ref)
offset_counter[chrom] += record_offset
fields[1] = str(new_pos)
fields[3] = new_ref
fields[4] = new_alt
print("\t".join(fields), file=f_out)
Hi,
I have applied the variants from a VCF to a reference with
bcftools consensus
and was wondering if you know of a way to get the "inverse VCF" - i.e. the VCF that would return the consensus sequence back to the original reference? The REF and ALT would be swapped, but the POS will obviously change.It would nice to have this as an optional output to
bcftools consensus
if that is within scope?