secastel / phaser

phasing and Allele Specific Expression from RNA-seq
GNU General Public License v3.0

Question on out_prefix.vcf.gz #39

Closed JohnMCMa closed 5 years ago

JohnMCMa commented 7 years ago

Hi Stephane,

I looked at the structure of the out_prefix.vcf.gz output and I'd like you to enlighten me on a few things:

  1. Do blacklisted variants within a haplotype (i.e. variants listed as variantsBlacklisted in haplotypic_counts.txt) get their, say, PB field populated in the vcf?
  2. Does the PI field in the vcf have anything to do with the line index in haplotypic_counts.txt?

Thanks for your answer in advance!

Cheers,
John Ma
Department of Lymphoma/Myeloma
UT MD Anderson Cancer Center

secastel commented 7 years ago

Hi John,

If variants are blacklisted using "--blacklist", they will not have any of the phASER tags filled, as they are completely excluded from the analysis. If they were blacklisted with "--haplo_count_blacklist", they will only be excluded from out_prefix.haplotypes.txt, but will still have their phasing details included in out_prefix.vcf.gz.

As for PI, looking at the code I think the PI field should be equal to the line number in the haplotypic_counts.txt provided you do not specify "--min_cov", which would filter out lines and cause them to become out of sync. This is not guaranteed though, so I would check manually for a few examples to be sure. It is possible that I could add the haplotype block index (PI) to the haplotypic_counts and haplotypes outputs. Would this be useful?
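For the manual check, a minimal sketch in Python might look like the following. It rests on several assumptions that should be verified against your own files: that PI is stored as a per-sample FORMAT key in the VCF, that haplotypic_counts.txt is tab-delimited with a header row and a "variants" column holding comma-separated IDs of the form contig_pos_ref_alt, and that "line number" means the 1-based index of the data row, not counting the header. The function names are hypothetical and are not part of phASER.

```python
import csv


def pi_by_variant(vcf_lines, sample_index=0):
    """Map (contig, pos) -> PI string for one sample.

    Assumes PI is a FORMAT key; records without a PI value are skipped.
    """
    result = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta lines and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no FORMAT/sample columns for this record
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        if "PI" not in keys:
            continue
        idx = keys.index("PI")
        if idx < len(values) and values[idx] not in ("", "."):
            result[(fields[0], fields[1])] = values[idx]
    return result


def check_pi_against_rows(vcf_lines, counts_lines):
    """Return (row_number, variant_id, pi) for every variant whose PI
    differs from the 1-based data-row index in haplotypic_counts.txt.

    Assumes a 'variants' column with IDs like contig_pos_ref_alt.
    """
    pis = pi_by_variant(vcf_lines)
    reader = csv.DictReader(counts_lines, delimiter="\t")
    mismatches = []
    for row_number, row in enumerate(reader, start=1):
        for variant in row["variants"].split(","):
            # rsplit keeps contig names that themselves contain "_"
            contig, pos, _ref, _alt = variant.rsplit("_", 3)
            pi = pis.get((contig, pos))
            if pi is not None and pi != str(row_number):
                mismatches.append((row_number, variant, pi))
    return mismatches
```

To run it on real output, pass open file handles, for example `gzip.open("out_prefix.vcf.gz", "rt")` for the VCF and `open("out_prefix.haplotypic_counts.txt")` for the counts file. An empty result means every variant's PI matched its row index under the assumptions above; any returned tuples show where the two have drifted out of sync.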

Stephane