Open AkshajD opened 1 year ago
Hi,
AC field is required because it is the Allele Count field, which is used to compute allele frequency and keep only common variants above the selected threshold.
If your input data does not have this AC field, you can compute it using the bcftools fill-tags plugin (https://samtools.github.io/bcftools/howtos/plugin.fill-tags.html)
Best, Robin
Notice that +fill-tags
can be up to 10x slower than +fill-AN-AC
, so the latter is recommended. You can also use bcftools view -c 0
to fill the AN and AC fields though this recomputes AN and AC only if they are not already present. Why is it that you have to have AN and AC though? Couldn't SHAPEIT5 use the bcf_calc_ac() function from htslib/htslib/vcfutils.h to compute AN and AC? This is a function that would allow to use existing AN and AC values if present and compute them from genotypes otherwise:
/**
* bcf_calc_ac() - calculate the number of REF and ALT alleles
* @header: for access to BCF_DT_ID dictionary
* @line: VCF line obtained from vcf_parse1
* @ac: array of length line->n_allele
* @which: determine if INFO/AN,AC and indv fields be used
*
* Returns 1 if the call succeeded, or 0 if the value could not
* be determined.
*
* The value of @which determines if existing INFO/AC,AN can be
* used (BCF_UN_INFO) and and if indv fields can be split (BCF_UN_FMT).
*/
HTSLIB_EXPORT
int bcf_calc_ac(const bcf_hdr_t *header, bcf1_t *line, int *ac, int which);
And this is how the function is used in bcftools/plugins/fill-AN-AC.c:
bcf1_t *process(bcf1_t *rec)
{
hts_expand(int,rec->n_allele,marr,arr);
int ret = bcf_calc_ac(in_hdr,rec,arr,BCF_UN_FMT);
if ( ret>0 )
{
int i, an = 0;
for (i=0; i<rec->n_allele; i++) an += arr[i];
bcf_update_info_int32(out_hdr, rec, "AN", &an, 1);
bcf_update_info_int32(out_hdr, rec, "AC", arr+1, rec->n_allele-1);
}
return rec;
}
I am relatively new to bioinformatics. I would like to do haplotype phasing using phase_common (shapeit5) before carrying out genotype imputation using Beagle 5.4 but I am getting an ERROR: AC field is needed in file. The code I ran is shown below: `[SHAPEIT5] phase_common (jointly phase multiple common markers)
Author : Olivier DELANEAU, University of Lausanne
Contact : olivier.delaneau@gmail.com
Run date : 17/09/2023 - 11:50:19
Files:
Input : [/home/mubita/bioinformatics/input_shapeit/pk_hg19_chr6_tags.bcf]
Reference : [/home/mubita/bioinformatics/input_shapeit/chr6.1kg.phase3.v5a.bcf.gz]
Genetic Map : [/home/mubita/bioinformatics/input_shapeit/plink.chr6.GRCh37.map.txt]
Output : [/home/mubita/bioinformatics/ouput_shapeit/pk_hg19_chr6_tags.phased.bcf]
Output format : [bcf]
Parameters:
Seed : 15052011
Threads : 8 threads
MCMC : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
PBWT : [window = 4cM / depth = auto / modulo = auto / mac = 5 / missing = 0.1]
HMM : [window = 4cM / Ne = 15000 / Recombination rates given by genetic map]
Reading genotype data: ERROR: AC field is needed in file`
I do not understand why this error happened because I filled tags to my input file using the code:
bcftools +fill-tags pk_hg19_chr6.bcf -Ob -o pk_hg19_chr6_tags.bcf -- -t all
The header of the vcf file has all the necessary tags e.g.
In addition, an excerpt of the vcf file before filling the tags looks like this:
6 160557643 rs2282143 C T . PASS AA=C;AC=33;AF=0.1138;AN=290;NS=145 GT 0/1 0/1 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/1 0/0 0/0 0/1 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/0 1/1 0/0 0/1 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 ./. 0/1 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
6 160560881 rs72552763 ATGA A . PASS AA=ATGA;AC=9;AF=0.03061;AN=294;NS=147 GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/1 0/1 0/0 0/0 0/0
Given this information, how can I get rid of this error so that shapeit (phase_common) runs successfully?
@mubita3 Not sure if your issue is resolved, but for those still looking for an answer, both your input and reference panel files need the AC field.
I am running phase_common on a BCF file (as well as variants to exclude in another BCF) acquired from an Axiom array.
While running, it gives an error stating that
AC field is needed in file
for each of the chromosomes. A sample of this is included below.Since I am still fairly new to genotyping pipelines, I am unfamiliar with what the AC field is and how I can make sure it is present in my file (as well as which file (aka the input BCF or one of the references/genetic maps) needs it).
Would appreciate any advice, thank you!