Closed freerkvandijk closed 5 years ago
Hi Freerk, Thanks for the very through explanation of your issue. Your assessment is correct - the *.allelic_counts file does include counts from all libraries that were used as inputs, even those that were excluded using the --haplo_count_bam_exclude option. As of now this is the intended behavior for the tool - although I can see why one would also want to exclude any reads in the haplo_count_bam_exclude bams, however in it's current implementation this would be difficult. If you are just looking for basic allelic counts, I would suggest trying out allelecounter, since this doesn't require any phasing, and is a much simpler problem. Allelecounter is very fast, and easy to run.
Also, not sure what version of phASER you are running, but I'd suggest updating to the latest release (v1.0.0) if you haven't as there are quite a few improvements in haplotype level ASE data generation.
Stephane
Hello! I was reading this issue and it is not clear to me what is going on. I have a VCF file and I want to phase it using both DNA and RNA-Seq data (2 bam files). Once it is phased, I would like to know the ASE using only the RNA-Seq bam file. Is this possible?
Another question is, is it possible to convert read counts to TPM? Or this does not make any sense at all? (I'm a beginner in ASE, sorry about that :) )
Thank you in advance,
To generate only ASE counts using the RNA-seq bam file use the argument: "--haplo_count_bam_exclude N" where N is the index of the provided bam you want to exclude from ASE counts.
for example: "phaser --bam dna.bam,rna.bam --haplo_count_bam_exclude 1"
RE ratios - the values do not need to be normalized by depth because they are ratios of ref/alt counts. If you are new to ASE analysis I suggest you read this paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0762-6
Hello again,
Thank you for your fast and comprehensive answer. Just one more "easy" (probably not) fast question. If I have a set of somatic variants and I want to calculate mutant transcript specific expression, can I do it using phaser
?
In your case you might be better to look at the expression of individual somatic variants rather than an approach which tries to calculate transcript level allelic expression (as phASER does). In this case you would measure ASE across all variants, and then look specifically at somatic variants to test whatever hypothesis you're interested in. You can use a tool like the GATK ASEReadCounter (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_rnaseq_ASEReadCounter.php) or my much more basic allelecounter (https://github.com/secastel/allelecounter). One very strong word of caution though, when analyzing somatic variants, there are often copy number changes, which can cause false positive signals of allelic imbalance, and need to be controlled for.
Dear Stephane,
I used the phASER software using two input BAM files, one containing RNA-seq data, the other containing Whole Genome Sequencing DNA data. I used the --haplo_count_bam_exclude option to exclude the WGS BAM data from my output data.
The number of reads overlapping for example chr1, position 100547994, in my both BAM files are:
I noticed that the *.allelic_counts file contains the summed counts from both RNA and WGS data:
While in the *.haplotypic_counts data just 22 reads for this same SNP are used:
Is this intended behaviour for *.allelic_counts output when using a BAM file containing DNA data and the --halpo_count_bam_exclude option?
Regards,
Freerk