secastel / phaser

phasing and Allele Specific Expression from RNA-seq
GNU General Public License v3.0

Allele specific expression on multiple samples #42

Open mbosio85 opened 6 years ago

mbosio85 commented 6 years ago

Hello,

I ran phASER on multiple samples and extracted allele-specific expression for each sample independently. Since these samples are split into case/control, I would like to process the expression data to see if there is anything interesting. My question is whether the allele-specific expression data from multiple samples are directly comparable. Can I compare aCount and bCount across multiple samples straight away, or is there a risk that what is measured as aCount for sample X ends up as bCount for sample Y?

Do you have a suggested protocol for this task?

Thanks a lot

Mattia

secastel commented 6 years ago

Hi Mattia, specific questions about best practices for the analysis of ASE data are a bit outside the scope of phASER. It is primarily a data generation tool; after that, it is up to the user to decide what the best analysis is for their specific question of interest. There is no one general protocol for the analysis of ASE data. As a rule of thumb though, you should not compare the raw counts across samples, as these are determined by read depth. Instead, you should compare something like the allelic fold change (log2_aFC output by phaser_gene_ae), which is the log-transformed ratio of aCount over bCount. How you carry out and interpret this comparison is up to you. For example, you might find certain genes that have increased aFC in cases versus controls. This could indicate an increase in strong regulatory effects that might be involved in disease risk.
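To illustrate why the ratio is comparable across samples while the raw counts are not, here is a minimal sketch in Python. The function name `log2_afc` and the counts are made up for illustration, and the zero-count handling is an assumption; phaser_gene_ae's own calculation may differ in detail.

```python
import math

def log2_afc(a_count, b_count):
    """Log2 ratio of haplotype A counts over haplotype B counts.

    Hypothetical helper for illustration only. Returns None when either
    count is zero, since the ratio is undefined or infinite in that case
    (this handling is an assumption, not phASER's behavior).
    """
    if a_count <= 0 or b_count <= 0:
        return None
    return math.log2(a_count / b_count)

# Same 3:1 allelic imbalance, sequenced at 10x different depth (made-up counts).
shallow = log2_afc(30, 10)
deep = log2_afc(300, 100)

print(round(shallow, 3), round(deep, 3))  # 1.585 1.585
```

The raw counts differ tenfold between the two samples, but the log ratio is identical, so it is the quantity to carry into a case/control comparison.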

I would strongly suggest reading our paper on the analysis of ASE data here:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0762-6

Hope this helps.

Stephane

everestial commented 6 years ago

Hi @mbosio85,

I have used phASER extensively and can tell from experience that what is aCount (or haplotype A) for sample X may turn up as bCount (or haplotype B) for sample Y. I am doing ASE analyses in F1 hybrids and need a proper haplotype configuration (the correct phase state connecting each haplotype block to the next). phASER has been excellent at creating local haplotype blocks, but I had to write my own parser to update my GW haplotype.
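A small sketch of what that label swap does to a ratio-based comparison, using made-up counts and a hypothetical helper `log2_ratio` (not part of phASER): if the same physical haplotype is labelled A in one sample and B in another, the log ratio keeps its magnitude but flips its sign.

```python
import math

def log2_ratio(a_count, b_count):
    """Log2 of aCount over bCount (hypothetical helper, nonzero counts assumed)."""
    return math.log2(a_count / b_count)

# Sample X: the more highly expressed haplotype happens to be labelled A.
sample_x = log2_ratio(30, 10)

# Sample Y: same imbalance, but phasing assigned that haplotype the B label.
sample_y = log2_ratio(10, 30)

# The sign flips between samples while the magnitude is preserved.
print(sample_x > 0, sample_y < 0)                      # True True
print(math.isclose(abs(sample_x), abs(sample_y)))      # True
```

So a signed comparison across samples is only meaningful once the haplotype labels have been made consistent; without that, only the magnitude of the imbalance can be compared safely.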

Based on your question, it looks like you are interested in figuring out which haplotypes are the same or similar across different samples. I am close to finishing another Python tool that can test the allele-genotype information across several samples for overlapping haplotype blocks and then assign and extend each block. Maybe that will interest you.

Let me know if you have any questions.

smoenga55 commented 5 years ago

Hi @everestial
Did you finish this tool?

everestial commented 5 years ago

@smoenga55 The tools are phase stitcher https://github.com/everestial/phase-Extender

and phase extender https://github.com/everestial/phase-Extender

The way haplotype phase extension is done depends on the assumed relationships between samples, so make sure those assumptions are clear for your data.