nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

extract_snp.py does not return phased snps? #351

Closed: shanwai1234 closed this 3 years ago

shanwai1234 commented 4 years ago

Hi Authors,

I am trying to use the allele-specific pipeline for Hi-C data processing. One of its scripts is extract_snp.py. Although the tutorial mentions "phasing", when I look at the output, every F1 genotype is still unphased ("0/1"), whereas I would expect "0|1" or "1|0". Would this affect the actual allele-specific Hi-C analysis? I ask because I have two datasets in which the paternal and maternal strains are swapped. Thank you!

Hunter
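For context on the notation in the question: in a VCF GT field, alleles separated by "/" are unphased and alleles separated by "|" are phased, so only a phased call such as "1|0" says which haplotype carries which allele. The small helper below is a hypothetical illustration (not part of HiC-Pro) that splits a GT value and reports whether it is phased; it assumes simple diploid calls.

```python
import re


def parse_gt(gt):
    """Split a VCF GT value into allele indices and report whether it is phased.

    Hypothetical helper: "/" marks unphased alleles, "|" marks phased ones.
    A call counts as phased only if every separator is "|".
    """
    alleles = tuple(re.split(r"[/|]", gt))
    separators = re.findall(r"[/|]", gt)
    phased = bool(separators) and all(s == "|" for s in separators)
    return alleles, phased


print(parse_gt("0/1"))  # (('0', '1'), False): order carries no haplotype information
print(parse_gt("1|0"))  # (('1', '0'), True): first haplotype carries the ALT allele
```

With an unphased "0/1", the two datasets with swapped parents produce the same genotype string, which is what prompted the question.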

nservant commented 4 years ago

Hi Hunter, that's a good point. Indeed, I did not really take care of that. The extract_snp.py utility is only meant to process the VCF file from the Mouse Sanger database. Since the goal is to extract the SNPs between the two strains, all variants will be "0/1" (in fact, this is hard-coded in the script...). HiC-Pro then takes all SNPs from the VCF file, regardless of the genotype nomenclature...

Best
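The behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual extract_snp.py code: the function `strain_snp` and the `SnpRecord` type are hypothetical, each strain is assumed to be homozygous, and strain 1's allele is placed in REF and strain 2's in ALT purely for illustration. The point it shows is that the GT column is hard-coded to "0/1", so it never encodes which parent carries which allele.

```python
import re
from typing import NamedTuple, Optional


class SnpRecord(NamedTuple):
    """Hypothetical simplified output record (not HiC-Pro's actual format)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gt: str


def strain_snp(chrom, pos, ref, alts, gt1, gt2) -> Optional[SnpRecord]:
    """Return a record if two homozygous strains carry different alleles.

    ref/alts are the REF and comma-separated ALT columns of a multi-sample VCF;
    gt1/gt2 are the two strains' GT values (e.g. "0/0", "1/1").
    Heterozygous, missing, or identical calls are skipped (None).
    """
    alleles = [ref] + alts.split(",")
    calls = []
    for gt in (gt1, gt2):
        idx = re.split(r"[/|]", gt)
        if len(set(idx)) != 1 or idx[0] == ".":
            return None  # heterozygous or missing call in one strain
        calls.append(alleles[int(idx[0])])
    allele1, allele2 = calls
    if allele1 == allele2:
        return None  # both strains carry the same allele: not informative
    # Assumption for illustration: strain 1 -> REF, strain 2 -> ALT.
    # GT is hard-coded to the unphased "0/1", as described in the comment above.
    return SnpRecord(chrom, pos, allele1, allele2, "0/1")


print(strain_snp("chr1", 100, "A", "G", "0/0", "1/1"))
print(strain_snp("chr1", 100, "A", "G", "1/1", "0/0"))
```

In this sketch, swapping the strain order swaps the REF and ALT alleles while the genotype stays "0/1" in both cases, so any parent-of-origin information would have to come from which strain's allele sits in which column rather than from the GT field.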

nservant commented 3 years ago

fixed