nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

processing HiC-Pro using split_reads.py #490

Open yamzaleg opened 2 years ago

yamzaleg commented 2 years ago

Hello,

I was able to run HiC-Pro successfully on my files. To save time, I split the files into chunks of 10 M reads before processing them. The final bowtie and hic results yield about 33 files per sample. For the .FiltPairs, SCPairs, DEPairs, and .validPairs files in the data directory, how should I combine them into one file (is it simply a matter of cat'ing the files)? Are the DEPairs files the ones I use for further processing?

Yonatan

nservant commented 2 years ago

Hi Yonatan,

Only the validPairs files are merged to generate the allValidPairs file and to construct the contact maps. This is a simple cat ... with an additional step to remove the duplicated reads.

Best
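For readers who want to see what "cat plus duplicate removal" amounts to, here is a minimal sketch in Python. It is not HiC-Pro's own merge code, which the pipeline runs for you. It assumes the validPairs files are tab-separated with the read name in column 1 and chr1, pos1, chr2, pos2 in columns 2, 3, 5 and 6, and it treats two pairs as duplicates when those four fields match. The function names and the in-memory set are illustrative choices; for very large files a sort-based approach would use less memory.

```python
from typing import Iterable, Iterator, List


def dedup_valid_pairs(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line whose (chr1, pos1, chr2, pos2) key has not been seen yet.

    Assumption: columns 2, 3, 5, 6 (1-based) hold chr1, pos1, chr2, pos2.
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        key = (fields[1], fields[2], fields[4], fields[5])
        if key not in seen:
            seen.add(key)
            yield line


def merge_valid_pairs(chunk_paths: List[str], out_path: str) -> int:
    """Concatenate the chunk files, drop duplicates, return the number of pairs kept."""

    def all_lines() -> Iterator[str]:
        # Equivalent to `cat chunk1 chunk2 ...`, streamed line by line.
        for path in chunk_paths:
            with open(path) as handle:
                yield from handle

    kept = 0
    with open(out_path, "w") as out:
        for line in dedup_valid_pairs(all_lines()):
            # Guard against a chunk whose last line lacks a trailing newline.
            out.write(line if line.endswith("\n") else line + "\n")
            kept += 1
    return kept
```

A call would look like `merge_valid_pairs(["chunk00.validPairs", "chunk01.validPairs"], "sample.allValidPairs")`, where the file names are hypothetical placeholders for the per-chunk outputs of one sample.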

yamzaleg commented 2 years ago

Hi!

Thank you so much. I also wanted to try processing the data with FitHiChIP to look at differential looping and significant cis-interacting peaks. They suggest using a peak file along with the validPairs file, which they said you can derive by calling MACS2 on the bam files. I have two questions: 1) Which bam files do I use, given that three directories are produced in the bowtie_alignment directory? 2) Since I split the files beforehand, do I just samtools merge them (this one seems obvious, but I want to make sure)?

Yonatan