This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
I am trying to genotype a sample against a reference VCF file, including homozygous 0/0 calls.
To do this, I first generated a TARGETS file from the reference VCF file which contains the three columns "CHR\tPOS\tALLELES", with ALLELES = "REF,ALT". I then use mpileup with the --positions TARGETS argument to generate a pileup at all sites, followed by a call with --keep-alts --multiallelic-caller --constrain alleles --targets-file TARGETS to calculate genotype likelihoods for each site.
This seems to work in principle, however it simply crashes in the rare case that the ALT allele in the sample does not match the ALT allele in the reference data set. The error message I get in these cases is "Cannot constrain to A" (where A is the ALT allele).
This makes it essentially impossible for me to use bcftools because there is no proper way to avoid this error. The only solution I could think of at the moment would be to call genotypes first without constraining alleles, annotate each site with the expected alleles, compare the ALTs to the expected ALTs, find the sites where there is a mismatch, remove those from the original pileup and then rerun the caller, this time including the --constrain parameter. However, this would obviously be an incredibly complicated way of dealing with this.
Is there any way you could change the way mismatches between the expected and actual ALT alleles are handled?
I think it would be much better if bcftools simply printed a warning to STDERR and then ignored the site, instead of immediately exiting as soon as it encounters such a problem.
In theory I think this might be as simple as making these three changes:
Change mcall_constrain_alleles to return bool
Replace the call to error in mcall_constrain_alleles (linked above) by return false;
Change mcall_constrain_alleles(call, rec, &unseen) to if(!mcall_constrain_alleles(call, rec, &unseen)) { fprintf( stderr, "Skipping call because constraining alleles failed at %s:%d\n", bcf_seqname(call->hdr,rec),rec->pos+1); return 0; }
However, it would be great if someone more familiar with the code could have a look at this!
I am trying to genotype a sample against a reference VCF file, including homozygous 0/0 calls.
To do this, I first generated a TARGETS file from the reference VCF file which contains the three columns "CHR\tPOS\tALLELES", with ALLELES = "REF,ALT". I then use
mpileup
with the--positions TARGETS
argument to generate a pileup at all sites, followed by acall
with--keep-alts --multiallelic-caller --constrain alleles --targets-file TARGETS
to calculate genotype likelihoods for each site.This seems to work in principle, however it simply crashes in the rare case that the ALT allele in the sample does not match the ALT allele in the reference data set. The error message I get in these cases is "Cannot constrain to A" (where A is the ALT allele).
Looking at the code, the error is thrown by this line: https://github.com/samtools/bcftools/blob/3f13cfee86739a8904db15d69e12607600fd3f47/mcall.c#L1296
This makes it essentially impossible for me to use
bcftools
because there is no proper way to avoid this error. The only solution I could think of at the moment would be to call genotypes first without constraining alleles, annotate each site with the expected alleles, compare the ALTs to the expected ALTs, find the sites where there is a mismatch, remove those from the original pileup and then rerun the caller, this time including the--constrain
parameter. However, this would obviously be an incredibly complicated way of dealing with this.Is there any way you could change the way mismatches between the expected and actual ALT alleles are handled?
I think it would be much better if
bcftools
simply printed a warning to STDERR and then ignored the site, instead of immediately exiting as soon as it encounters such a problem.In theory I think this might be as simple as making these three changes:
mcall_constrain_alleles
to returnbool
error
inmcall_constrain_alleles
(linked above) byreturn false;
mcall_constrain_alleles(call, rec, &unseen)
toif(!mcall_constrain_alleles(call, rec, &unseen)) { fprintf( stderr, "Skipping call because constraining alleles failed at %s:%d\n", bcf_seqname(call->hdr,rec),rec->pos+1); return 0; }
However, it would be great if someone more familiar with the code could have a look at this!