tanlongzhi / dip-c

Tools to analyze Dip-C (or other 3C/Hi-C) data

Strange Segment Format #37

Closed YuxuanZheng94 closed 4 years ago

YuxuanZheng94 commented 4 years ago

Hi Longzhi, I used the latest version of hickit to process single-cell Hi-C data. I extracted segments with the following command:

k8 hickit.js sam2seg -v $snp ${sp}_aln.sam.gz | hickit.js chronly -y - | gzip > ${sp}_contacts_seg.gz

I got the following output. Strangely, the segment format does not match the one shown in the dip-c README.

ST-E00522:513:HVKWHCCXY:5:1101:15676:1854 chr3!119719306!119719456!-!.!60!1 chr3!119713160!119713310!+!.!60!1
ST-E00522:513:HVKWHCCXY:5:1101:18933:1872 chr18!8795309!8795364!-!.!34!1 chr18!9160818!9160929!+!.!60!2
ST-E00522:513:HVKWHCCXY:5:1101:6877:2083 chr19!26243774!26243961!-!.!60!2 chr19!26197893!26197970!+!.!60!1

tanlongzhi commented 4 years ago

Hi @YuxuanZheng94,

Sorry about the confusion. Hickit uses a slightly different format for read segments, but it's basically the same thing, just with ! (defined as hic_sub_delim in hickit.js) as the delimiter rather than ,.

So basically the hickit .seg format is:

<read name> <chr>!<reference start>!<reference end>!<reference strand>!<phase>!<mapping quality>!<read number: read 1 or 2>
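
For anyone who wants to read these lines in a script, here is a minimal Python sketch of a parser for the layout described above. It is not part of hickit or dip-c. The names `Segment`, `parse_segment`, and `parse_seg_line` are made up for illustration, and it assumes that the read name and the segments are separated by whitespace (as in the pasted output) and that every segment has exactly seven `!`-delimited fields. The phase field is kept as a raw string and not interpreted.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One read segment from a hickit .seg line (hypothetical helper type)."""
    chrom: str
    ref_start: int
    ref_end: int
    strand: str
    phase: str        # kept as-is, e.g. "." in the example output
    mapq: int
    read_number: int  # 1 or 2


def parse_segment(field: str, delim: str = "!") -> Segment:
    """Split one segment field on the delimiter and convert numeric parts."""
    parts = field.split(delim)
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {field!r}")
    chrom, start, end, strand, phase, mapq, read_number = parts
    return Segment(chrom, int(start), int(end), strand, phase,
                   int(mapq), int(read_number))


def parse_seg_line(line: str, delim: str = "!") -> Tuple[str, List[Segment]]:
    """Return (read name, list of segments) for one whitespace-separated line."""
    name, *fields = line.split()
    return name, [parse_segment(f, delim) for f in fields]


# First record from the output posted above.
example = ("ST-E00522:513:HVKWHCCXY:5:1101:15676:1854 "
           "chr3!119719306!119719456!-!.!60!1 "
           "chr3!119713160!119713310!+!.!60!1")
name, segments = parse_seg_line(example)
for seg in segments:
    print(name, seg)
```

To read a gzipped file such as the one produced by the command above, the same function could be applied line by line to `gzip.open(path, "rt")`.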

YuxuanZheng94 commented 4 years ago

Wow~ Thank you very much!