secastel / phaser

phasing and Allele Specific Expression from RNA-seq
GNU General Public License v3.0

Loss of AD/DP info after phasing #40

Closed: mbosio85 closed this issue 6 years ago

mbosio85 commented 6 years ago

Hello,

As suggested here, I ran my VCFs through the Sanger Imputation Service before running phASER.

Everything ran and produced phASER results without any runtime problems, but the VCF I get at the end of the pipeline has lost the AD and DP information. After Sanger imputation, all variants have AN=2 and DP=2. Example, after imputing with Sanger:

TYPED;RefPanelAF=0.489683;AN=2;AC=1;INFO=1 GT:ADS:DS:GP:PS:PG:PB:PI:PW:PC:PM 1|0:1,0:1:0,1,0:13380:0/1:.:.:1|0:.:.

Same variant before:

GT:AD:DP:GQ:PL 0/1:11,7:18:99:244,0,264

Is this the expected behavior of the pipeline, or is there a better way to keep the DP and AD information throughout the process?

Thanks,

Mattia

secastel commented 6 years ago

Hi Mattia, sorry, I'm a bit confused by the example line you've given.

The line "After imputing with Sanger" actually seems to be from after running phASER (it includes many phASER-specific tags, e.g. PS:PG:PB:PI:PW:PC:PM). Can you post the example line after imputation, but before running phASER?

Stephane

mbosio85 commented 6 years ago

Hi Stephane, you are right, I sent the wrong line. Here is an example at each step. Original VCF:

1 534192 . C T 277.77 . AC=1;AF=0.500;AN=2;MQ=36.65; GT:AD:DP:GQ:PL 0/1:36,26:63:99:306,0,875

Here it is after Sanger imputation:

1 534192 rs6680723 C T . PASS TYPED;RefPanelAF=0.235818;AN=2;AC=1;INFO=1 GT:ADS:DS:GP:PS 0|1:0,1:1:0,1,0:13380

And here after running phASER:

1 534192 rs6680723 C T . PASS TYPED;RefPanelAF=0.235818;AN=2;AC=1;INFO=1 GT:ADS:DS:GP:PS:PG:PB:PI:PW:PC:PM 0|1:0,1:1:0,1,0:13380:0/1:.:.:0|1:.:.

Are these the outputs I should normally expect? And if so, do you know whether there is a way to keep the original AD and DP information? Thanks a lot for your help.

Mattia
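As a quick check on where the tags disappear, the Python sketch below pairs each line's FORMAT keys (column 9) with its sample values (column 10) for the three lines Mattia posted. It is only an illustration written for this thread, not part of phASER. It shows that AD and DP exist only in the original call set, and that every FORMAT key present after imputation is still present after phASER, which only appends its own tags (PG, PB, PI, PW, PC, PM).

```python
# VCF data lines copied from the thread. They were pasted space-separated;
# real VCF files are tab-separated, and str.split() with no argument handles both.
ORIGINAL = "1 534192 . C T 277.77 . AC=1;AF=0.500;AN=2;MQ=36.65; GT:AD:DP:GQ:PL 0/1:36,26:63:99:306,0,875"
IMPUTED = "1 534192 rs6680723 C T . PASS TYPED;RefPanelAF=0.235818;AN=2;AC=1;INFO=1 GT:ADS:DS:GP:PS 0|1:0,1:1:0,1,0:13380"
PHASED = ("1 534192 rs6680723 C T . PASS TYPED;RefPanelAF=0.235818;AN=2;AC=1;INFO=1 "
          "GT:ADS:DS:GP:PS:PG:PB:PI:PW:PC:PM 0|1:0,1:1:0,1,0:13380:0/1:.:.:0|1:.:.")


def sample_fields(vcf_line):
    """Pair the FORMAT keys (column 9) with the first sample's values (column 10)."""
    cols = vcf_line.split()
    return dict(zip(cols[8].split(":"), cols[9].split(":")))


stages = {"original": ORIGINAL, "imputed": IMPUTED, "phased": PHASED}
fields = {name: sample_fields(line) for name, line in stages.items()}

for name, f in fields.items():
    # None means the tag is absent from that stage's FORMAT column.
    print(name, {tag: f.get(tag) for tag in ("AD", "DP")})
# original {'AD': '36,26', 'DP': '63'}
# imputed {'AD': None, 'DP': None}
# phased {'AD': None, 'DP': None}

# Every FORMAT key present after imputation is still present after phASER.
print(set(fields["imputed"]) <= set(fields["phased"]))  # True
```

The imputed FORMAT fields (ADS, DS, GP) are imputation outputs, not read counts, so none of them stands in for AD or DP.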

secastel commented 6 years ago

Hi Mattia, from looking at the lines you posted, it seems that the fields of interest (AD, DP) are actually removed during imputation by the Sanger Imputation Service. This is completely separate from phASER, and I have no control over how that pipeline works. phASER itself does not remove any tags that are present in its input file. You could try getting in touch with someone there about your question. Sorry I couldn't be of more help.

Stephane
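Since phASER keeps whatever tags it is given, one possible workaround is to copy AD and DP back from the original (pre-imputation) VCF onto the phASER output afterwards. The Python sketch below is hypothetical and is not a feature of phASER or of the Sanger Imputation Service. It assumes plain or gzip/bgzip-compressed, tab-separated, single-sample VCFs, and matches variants on CHROM, POS, REF and ALT rather than ID, because imputation filled in rs IDs that were "." in the original file. The function names and header descriptions are illustrative.

```python
import gzip

# FORMAT header lines added if the phASER output lacks them. The descriptions are
# illustrative; copy the exact lines from your original VCF header if you prefer.
AD_DP_HEADER = {
    "AD": '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles">',
    "DP": '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth">',
}


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def load_ad_dp(lines):
    """Index the first sample's AD and DP by (CHROM, POS, REF, ALT)."""
    index = {}
    for raw in lines:
        if raw.startswith("#"):
            continue
        cols = raw.rstrip("\n").split("\t")
        fields = dict(zip(cols[8].split(":"), cols[9].split(":")))
        index[(cols[0], cols[1], cols[3], cols[4])] = (fields.get("AD", "."), fields.get("DP", "."))
    return index


def restore_ad_dp(phased_lines, index):
    """Yield phASER output lines with AD and DP appended to FORMAT and the sample column."""
    defined = set()
    for raw in phased_lines:
        line = raw.rstrip("\n")
        for tag in AD_DP_HEADER:
            if line.startswith(f"##FORMAT=<ID={tag},"):
                defined.add(tag)
        if line.startswith("#CHROM"):
            # Declare the restored tags just before the column header line.
            for tag, header in AD_DP_HEADER.items():
                if tag not in defined:
                    yield header
        if line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        if {"AD", "DP"} & set(cols[8].split(":")):
            # Leave records that already carry AD or DP untouched.
            yield line
            continue
        # Sites missing from the original VCF get "." (missing) for both tags.
        ad, dp = index.get((cols[0], cols[1], cols[3], cols[4]), (".", "."))
        cols[8] += ":AD:DP"
        cols[9] += f":{ad}:{dp}"
        yield "\t".join(cols)
```

With hypothetical file names, usage would be to build the index with load_ad_dp(open_text("original.vcf.gz")) and then write each line yielded by restore_ad_dp(open_text("phased.vcf"), index), plus a newline, to a new file. Imputed-only (untyped) variants, and sites whose chromosome naming or allele representation differs between the two files, will receive "." for both tags.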