stschiff / msmc

Implementation of the multiple sequential markovian coalescent
GNU General Public License v3.0

should I keep unphased sites? #18

Closed. hhu1 closed this issue 8 years ago

hhu1 commented 8 years ago

I am running MSMC on genomes sequenced by Complete Genomics. According to the Schiffels & Durbin 2014 paper, unphased sites would introduce bias in population split analysis. However, when I looked into the run_shapeit.sh tool, it seems that phasing is performed only on SNVs present in the shapeit2 reference panel. Afterwards, both phased and unphased sites are merged into the same VCF file.

My question is: should I keep the unphased sites (those not present in the reference phasing panel) in my VCF file? If not, should I adjust the mask file to reflect the fact that only sites present in the reference panel are callable?

Thanks very much for your suggestions,

Hao Hu
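To get a sense of how many sites are affected, one could count phased and unphased heterozygous calls in the merged VCF. The following is a minimal sketch, assuming a plain-text VCF where phased genotypes use the `|` separator and unphased ones use `/` (as in the VCF specification). The function name `count_phasing` and the example records are hypothetical and are not taken from run_shapeit.sh.

```python
def count_phasing(vcf_lines):
    """Count phased ('|') and unphased ('/') heterozygous genotype calls.

    Homozygous and missing calls are ignored, because phase only
    matters where the two alleles differ.
    """
    counts = {"phased": 0, "unphased": 0}
    for line in vcf_lines:
        # Skip meta-information, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 9 (index 8) is FORMAT; sample columns follow it.
        gt_index = fields[8].split(":").index("GT")
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            if "|" in gt:
                alleles, key = gt.split("|"), "phased"
            elif "/" in gt:
                alleles, key = gt.split("/"), "unphased"
            else:
                continue  # haploid or otherwise unexpected call
            if "." not in alleles and len(set(alleles)) > 1:
                counts[key] += 1
    return counts


# Hypothetical records: one phased het, one unphased het, one homozygous call.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "1\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t0/1",
    "1\t3000\t.\tG\tA\t.\tPASS\t.\tGT\t1/1",
]

print(count_phasing(EXAMPLE_VCF))
```

For a real file, the same function could be fed an open file handle (for example via `gzip.open(path, "rt")` for a compressed VCF), since it only iterates over lines.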

stschiff commented 8 years ago

I always keep unphased sites. I can then decide when running MSMC whether to remove them using the --skipAmbiguous flag.

Best wishes, Stephan
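The workflow described above (keep the unphased sites in the input, then decide per run) could be scripted roughly as follows. This is only a sketch: the `--skipAmbiguous` flag comes from the reply above, while the binary name `msmc`, the `-o` output-prefix option, and the file names are assumptions that should be checked against `msmc --help` for the installed version. The helper `build_msmc_command` is hypothetical and only builds the argument list; it does not run anything.

```python
import shlex


def build_msmc_command(input_files, out_prefix, skip_ambiguous=False, binary="msmc"):
    """Build an argument list for an MSMC run.

    skip_ambiguous=True adds --skipAmbiguous, so unphased sites present
    in the input are skipped for this run only.
    Assumption: -o sets the output prefix; verify with `msmc --help`.
    """
    cmd = [binary]
    if skip_ambiguous:
        cmd.append("--skipAmbiguous")
    cmd += ["-o", out_prefix]
    cmd += list(input_files)
    return cmd


# Hypothetical file names. The same input serves both runs, which is the
# reason for keeping unphased sites in the data in the first place.
inputs = ["chr1.multihetsep.txt"]
with_unphased = build_msmc_command(inputs, "out/all_sites")
phased_only = build_msmc_command(inputs, "out/phased_only", skip_ambiguous=True)

print(shlex.join(with_unphased))
print(shlex.join(phased_only))
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` once the options have been confirmed for the installed MSMC version.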
