nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

what is the .FiltPairs files about? #335

Closed lanliting closed 4 years ago

lanliting commented 4 years ago

Hi, I am using HiC-Pro 2.11.4 to process my Hi-C data, and I get a high proportion of Filtered_pairs (about 20% of total reads, and up to 40% with another HiChIP dataset). I am confused about why HiC-Pro filters these reads. Can you help me?

cat GSM3791773_genome.bwt2pairs.RSstat
Valid_interaction_pairs 10785917
Valid_interaction_pairs_FF 2450350
Valid_interaction_pairs_RR 2911077
Valid_interaction_pairs_RF 2686880
Valid_interaction_pairs_FR 2737610
Dangling_end_pairs 3730696
Religation_pairs 906232
Self_Cycle_pairs 84807
Single-end_pairs 0
Filtered_pairs 29650993
Dumped_pairs 9179

nservant commented 4 years ago

Hi, Filtered_pairs are the interactions that are filtered out based on the config parameters, i.e. the min/max fragment size and the min/max insert size. Best

nservant commented 4 years ago

Note that you can leave these parameters blank to keep all filtered pairs. This matters especially for the insert size, which varies a lot depending on how your library was prepared. Look at the histogram of insert sizes to get an idea of the distribution. You will clearly see where you decided to cut the distribution.
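To make the effect of these thresholds concrete, here is a minimal Python sketch of the kind of range check described above. It is a simplified illustration and not HiC-Pro's actual code: the function names `in_range` and `is_filtered` are hypothetical, and a blank config value is modelled as `None`.

```python
from typing import Optional


def in_range(value: int, lo: Optional[int], hi: Optional[int]) -> bool:
    """Return True if value lies within [lo, hi].

    A bound set to None (a blank parameter in the config) is not enforced.
    """
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True


def is_filtered(
    frag_size: int,
    insert_size: int,
    min_frag: Optional[int] = None,
    max_frag: Optional[int] = None,
    min_insert: Optional[int] = None,
    max_insert: Optional[int] = None,
) -> bool:
    """Hypothetical helper: a pair is 'filtered' if either size is out of range."""
    frag_ok = in_range(frag_size, min_frag, max_frag)
    insert_ok = in_range(insert_size, min_insert, max_insert)
    return not (frag_ok and insert_ok)
```

With an illustrative cutoff such as `max_insert=600`, a pair with an insert size of 900 is filtered, while the same pair is kept when every bound is left blank. This is why a tight insert-size window can move a large share of pairs into Filtered_pairs.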

lanliting commented 4 years ago

Thank you so much!! Could you tell me how to check the insert size?

nservant commented 4 years ago

In theory, the histogram is plotted in the pic folder. If you do not have it, the insert size is available in the validPairs file (one of the last columns). N
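If the plot is missing, the distribution can be rebuilt from the validPairs file directly. The sketch below bins the insert sizes with the standard library only. It assumes a tab-separated file with the insert size in the 8th column (index 7), so check this against your own file and adjust `INSERT_SIZE_COLUMN` if needed. The function name and the bin width are illustrative choices.

```python
import sys
from collections import Counter
from typing import Dict, Iterable

# Assumption: the insert size is the 8th tab-separated field (index 7).
INSERT_SIZE_COLUMN = 7


def insert_size_histogram(
    lines: Iterable[str],
    bin_width: int = 50,
    column: int = INSERT_SIZE_COLUMN,
) -> Dict[int, int]:
    """Count pairs per insert-size bin; keys are the lower edge of each bin."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # skip short or malformed lines
        try:
            size = int(fields[column])
        except ValueError:
            continue  # skip lines where the column is not an integer
        counts[(size // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Usage: python insert_hist.py <path to a validPairs file>
    with open(sys.argv[1]) as handle:
        for lower_edge, count in insert_size_histogram(handle).items():
            print(f"{lower_edge}\t{count}")
```

Printing the bins as two columns makes it easy to spot where the distribution falls off and to choose insert-size limits that fit the library.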

lanliting commented 4 years ago

Thank you so much! I've found that.