griffithlab / VAtools

A set of tools to annotate VCF files with expression and readcount data
http://www.vatools.org
MIT License

Enhancement: Adding allele counts per strand in vcf-readcount-annotator #35

Closed JohnMCMa closed 1 year ago

JohnMCMa commented 4 years ago

The bam-readcount output contains allelic read counts on each strand; the readme file provides the following fields for each allele:

base:count:avg_mapping_quality:avg_basequality:avg_se_mapping_quality:num_plus_strand:num_minus_strand:avg_pos_as_fraction:avg_num_mismatches_as_fraction:avg_sum_mismatch_qualities:num_q2_containing_reads:avg_distance_to_q2_start_in_q2_reads:avg_clipped_length:avg_distance_to_effective_3p_end

Here, num_plus_strand and num_minus_strand are the read counts supporting the allele given in base on the plus strand and the minus strand, respectively.
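To make the layout above concrete, here is a minimal Python sketch that splits one colon-separated per-allele entry and pulls out the two strand counts. The function names and the sample entry are made up for illustration; this is not taken from the VAtools source, and it assumes the entry has exactly the 14 fields listed in the readme.

```python
# Field names in the order given by the bam-readcount readme.
FIELDS = [
    "base", "count", "avg_mapping_quality", "avg_basequality",
    "avg_se_mapping_quality", "num_plus_strand", "num_minus_strand",
    "avg_pos_as_fraction", "avg_num_mismatches_as_fraction",
    "avg_sum_mismatch_qualities", "num_q2_containing_reads",
    "avg_distance_to_q2_start_in_q2_reads", "avg_clipped_length",
    "avg_distance_to_effective_3p_end",
]


def parse_allele_entry(entry):
    """Split one per-allele entry into a dict keyed by field name (hypothetical helper)."""
    values = entry.split(":")
    if len(values) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(values)}: {entry!r}"
        )
    return dict(zip(FIELDS, values))


def strand_counts(entry):
    """Return (base, plus-strand count, minus-strand count) for one entry."""
    record = parse_allele_entry(entry)
    return (
        record["base"],
        int(record["num_plus_strand"]),
        int(record["num_minus_strand"]),
    )


# Made-up entry for illustration only, not real bam-readcount output.
example = "A:10:60.00:35.50:0.00:6:4:0.45:0.01:12.30:0:0.00:100.00:0.50"
base, plus, minus = strand_counts(example)
```

With the made-up entry, `strand_counts` picks the sixth and seventh fields, so `plus` is 6 and `minus` is 4.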

I wonder whether these two fields could be added to vcf-readcount-annotator's output? For standardization's sake, they could go into the FORMAT fields ADF and ADR, as defined in the official VCF specification, using FORMAT header lines like these from bcftools:

##FORMAT=<ID=ADF,Number=R,Type=Integer,Description="Allelic depths on the forward strand (high-quality bases)">
##FORMAT=<ID=ADR,Number=R,Type=Integer,Description="Allelic depths on the reverse strand (high-quality bases)">
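Because both headers declare Number=R, each sample carries one integer per allele, with the reference allele first and the alternate alleles following in ALT order. A rough Python sketch of assembling those values from per-allele strand counts follows. It assumes SNV alleles keyed by their base and reports 0 for an allele with no supporting reads. Both choices are assumptions for illustration, the function names are hypothetical, and this is not necessarily how vcf-readcount-annotator does it.

```python
def strand_depths(counts, ref, alts):
    """Order per-strand depths as Number=R expects: REF first, then each ALT.

    counts maps an allele string to a (plus_strand, minus_strand) tuple.
    Alleles missing from counts are reported as 0 (an assumption of this sketch).
    """
    alleles = [ref, *alts]
    adf = [counts.get(allele, (0, 0))[0] for allele in alleles]
    adr = [counts.get(allele, (0, 0))[1] for allele in alleles]
    return adf, adr


def as_format_value(depths):
    """Join depths into the comma-separated form used in a VCF sample column."""
    return ",".join(str(depth) for depth in depths)


# Made-up counts: allele -> (num_plus_strand, num_minus_strand).
counts = {"A": (6, 4), "G": (3, 2), "C": (0, 1)}
adf, adr = strand_depths(counts, ref="A", alts=["G"])
adf_value = as_format_value(adf)
adr_value = as_format_value(adr)
```

Only the alleles named in REF and ALT are reported, so the stray C reads in the made-up counts are left out of both values.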

Many pipelines put filters on allele-specific depths, and the removal of the SAC annotation from GATK4 causes quite a bit of hardship.

Thanks in advance!

susannasiebert commented 4 years ago

Thanks for the suggestion. This sounds easy enough to implement. I will put it on my to-do list.

susannasiebert commented 1 year ago

It's taken us a while, but this feature has now been implemented in 5.1.0.