secastel / phaser

phasing and Allele Specific Expression from RNA-seq
GNU General Public License v3.0

Get all possible haplotypes for a particular gene #38

Closed: evigorito closed this issue 5 years ago

evigorito commented 7 years ago

Hi Stephane,

Many thanks for sharing phaser, great tool! I was wondering whether there is any way to get all possible haplotypes for a particular gene that are compatible with the input data.

secastel commented 7 years ago

I'm not exactly sure what you mean by "all possible haplotype configurations". Do you mean all possible configurations for a given gene, e.g. one that contains 10 variants? If so, I think that's a little outside the scope of phASER. However, I could look into adding a list of variants to the output of phaser_gene_ae, for example 1_78372_A_G;1_78375_G_T;1_78385_T_A, which you could then use in a downstream script to generate all possible haplotypes. Would that work? One important caveat is that phASER would only report variants that were overlapped by at least one read and survived any blacklist filtering (if used), so it might not be a complete list of all variants in that gene.
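A downstream script along the lines suggested above could be sketched as follows. This is only an illustration under stated assumptions: the `;`-separated field and the `chrom_pos_ref_alt` ID format are taken from the example IDs in the comment, every listed variant is assumed to be heterozygous, and `haplotype_pairs` is a hypothetical helper, not part of phASER. For n variants it lists the 2^(n-1) haplotype pairs, counting each pair once regardless of which haplotype is called "1".

```python
from itertools import product


def haplotype_pairs(variant_field: str):
    """Enumerate every phase configuration for a ';'-separated list of
    variant IDs in the (assumed) chrom_pos_ref_alt format, e.g. '1_78372_A_G'.

    Assumes all variants are heterozygous. Pairs that differ only by swapping
    the two haplotypes are listed once, by fixing the first variant's alt
    allele on haplotype 1.
    """
    # rsplit keeps any underscores in the chromosome name intact
    variants = [v.rsplit("_", 3) for v in variant_field.split(";") if v]
    if not variants:
        return []
    pairs = []
    # phase value 0 = alt allele on haplotype 1, 1 = alt allele on haplotype 2
    for rest in product((0, 1), repeat=len(variants) - 1):
        phase = (0,) + rest
        hap1 = tuple(alt if p == 0 else ref
                     for (_, _, ref, alt), p in zip(variants, phase))
        hap2 = tuple(ref if p == 0 else alt
                     for (_, _, ref, alt), p in zip(variants, phase))
        pairs.append((hap1, hap2))
    return pairs


if __name__ == "__main__":
    for h1, h2 in haplotype_pairs("1_78372_A_G;1_78375_G_T;1_78385_T_A"):
        print("".join(h1), "".join(h2))
```

With three heterozygous variants this prints four haplotype pairs. The count doubles with every additional variant, which is one practical reason a full enumeration quickly becomes unwieldy for genes with many variants.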

evigorito commented 7 years ago

Sorry, I wasn't very clear. I was thinking of genes where the evidence supporting one particular haplotype pair in an individual is not very strong, and whether it would be possible to extract the competing possibilities. I understand this may not be within the scope of phaser.

secastel commented 6 years ago

Ah okay, I see what you mean now. You want to report all of the observed haplotype configurations of a gene. This wouldn't be possible at the gene level, because the gene-level haplotype links together haplotype blocks that are not themselves connected by any reads.
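To illustrate why unlinked blocks are the sticking point, here is a small sketch under a simplifying assumption (ours, not phASER's code): reads fix the phase within each block, but no read spans two blocks. The orientation of every block after the first relative to the first is then undetermined by reads, so k blocks admit 2^(k-1) gene-level haplotype pairs. `gene_level_configurations` is a hypothetical helper written only for this example.

```python
from itertools import product


def gene_level_configurations(blocks):
    """blocks: list of haplotype blocks, each a (hap_a, hap_b) pair of allele
    tuples whose within-block phase is assumed to be fixed already by reads.

    With no reads linking the blocks, each block after the first can be
    flipped independently, giving 2**(k - 1) gene-level pairs (counted once
    up to swapping the two haplotypes).
    """
    if not blocks:
        return []
    configs = []
    for flips in product((False, True), repeat=len(blocks) - 1):
        # start from the first block in its fixed orientation
        hap1, hap2 = list(blocks[0][0]), list(blocks[0][1])
        for (a, b), flip in zip(blocks[1:], flips):
            # a flipped block contributes its alleles to the opposite haplotype
            hap1 += b if flip else a
            hap2 += a if flip else b
        configs.append((tuple(hap1), tuple(hap2)))
    return configs
```

For example, three blocks give four gene-level configurations, and none of them is favoured by read evidence alone.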

As for phasing between individual variants, you can look this up in the output file out_prefix.variant_connections.txt. Any pair where supporting connections != total connections indicates that not all of the reads supported the same phase. As for generating all possible haplotypes, this is unfortunately not possible because of the way phASER does read-backed phasing: when constructing the full haplotypes, it first uses only variant connections with support for a single phase. See Figure S1 of the phASER paper for more details (https://images.nature.com/original/nature-assets/ncomms/2016/160908/ncomms12817/extref/ncomms12817-s1.pdf).
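The lookup described above could be scripted roughly as follows. This is a sketch, not part of phASER: the column names `variant_a`, `variant_b`, `supporting_connections` and `total_connections` are assumptions, so check them against the header of your own out_prefix.variant_connections.txt. `unanimous_blocks` is a simplified, hypothetical illustration of the idea that only single-phase connections are used to build haplotypes. It groups variants with union-find and is not phASER's actual algorithm.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Assumed column names; verify against your own file's header.
VAR_A, VAR_B = "variant_a", "variant_b"
SUPPORT, TOTAL = "supporting_connections", "total_connections"


def read_connections(lines: Iterable[str]) -> List[dict]:
    """Parse a tab-separated connections table from an open file or any
    iterable of text lines (header line first)."""
    return list(csv.DictReader(lines, delimiter="\t"))


def discordant_pairs(rows: Iterable[dict]) -> List[Tuple[str, str, int, int]]:
    """Variant pairs where not every spanning read supported the same phase."""
    out = []
    for r in rows:
        s, t = int(r[SUPPORT]), int(r[TOTAL])
        if s != t:
            out.append((r[VAR_A], r[VAR_B], s, t))
    return out


def unanimous_blocks(rows: Iterable[dict]) -> List[List[str]]:
    """Group variants into blocks using only unanimous connections
    (union-find). Simplified illustration only."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r in rows:
        a, b = r[VAR_A], r[VAR_B]
        ra, rb = find(a), find(b)  # registers both variants
        if int(r[SUPPORT]) == int(r[TOTAL]):
            parent[ra] = rb  # merge only when all reads agree on the phase

    groups: Dict[str, List[str]] = {}
    for v in list(parent):
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    with open("out_prefix.variant_connections.txt") as fh:
        rows = read_connections(fh)
    for pair in discordant_pairs(rows):
        print(*pair, sep="\t")
```

In this sketch a discordant pair splits the variants into separate blocks rather than being resolved by majority vote, so those pairs are exactly the places where competing phase configurations remain.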