nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

multiple restriction sites in cutsite_trimming #328

Closed: lanliting closed this issue 4 years ago

lanliting commented 4 years ago

Dear Nicolas, The Arima Hi-C kit has two restriction motifs (GATC and GANTC). I could set both restriction sites in digest_genome.py when digesting the reference genome; however, I got an error when I ran cutsite_trimming with the "--cutsite" option set to "GATC GANTC", because it recognizes only one restriction site. Could you help me with this issue? How can I trim the bwt2glob.unmap.fastq files with two restriction sites for subsequent local mapping? Thank you! Lan

nservant commented 4 years ago

Hi, Which HiC-Pro version are you using? The 'N' base is only fully supported in the very latest version (2.11.4). cutsite_trimming uses the LIGATION_MOTIF from the config file. The ligation motif is not the restriction motif. See https://github.com/nservant/HiC-Pro/blob/master/doc/FAQ.md for details on the Arima kits. Then, technically speaking, digest_genome.py works with a space-separated list of sites, whereas the cutsite_trimming code requires a comma-separated list. Best
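To illustrate the separator difference described above, here is a minimal Python sketch. The helper names `as_digest_args` and `as_trimming_arg` are hypothetical and are not part of HiC-Pro; they only format a list of motifs the way each tool is said to expect it in this thread.

```python
def as_digest_args(sites):
    """Join restriction sites with spaces (the form described for digest_genome.py)."""
    return " ".join(site.strip() for site in sites)


def as_trimming_arg(motifs):
    """Join ligation motifs with commas and no spaces (the form described for cutsite_trimming)."""
    return ",".join(motif.strip() for motif in motifs)


# Hypothetical usage with the Arima motifs quoted in this thread.
print(as_digest_args(["^GATC", "G^ANTC"]))
print(as_trimming_arg(["GATCGATC", "GANTGATC", "GANTANTC", "GATCANTC"]))
```

Stripping whitespace around each motif avoids passing a value such as "GATCGATC, GANTGATC", where the stray space would end up inside the second motif.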

lanliting commented 4 years ago

Thank you for your reply. I have just read FAQ1, "Can I use HiC-Pro with the Arima Hi-C Kit ?". The Arima Hi-C Kit protocol says the RE-cocktail restriction motifs are "GATC and GANTC". So does this mean that in the configuration file I should set LIGATION_SITE to "GATCGATC, GATCGANTC, GANTCGATC, GANTCGANTC"?

nservant commented 4 years ago

Yes. The RESTRICTION motifs are "GATC and GANTC". So the possible LIGATION motifs are "GATCGATC, GATCGANTC, GANTCGATC, GANTCGANTC". But be sure to use the latest HiC-Pro version, please.

lanliting commented 4 years ago

Yes, I have downloaded the latest HiC-Pro version, and I've digested the genome using "digest_genome.py", which recognizes the 'N' base well. Thanks for your kind help!

nservant commented 4 years ago

Note that there is a mistake in this post. The Arima kit uses the restriction enzymes ^GATC and G^ANTC. So the ligation motifs are GATCGATC,GANTGATC,GANTANTC,GATCANTC
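For readers who want to see where the corrected motifs come from, the following Python sketch derives them from the cut-site notation (`^` marks the cut position). It is an illustration under stated assumptions, not HiC-Pro code: the function `ligation_motifs` is hypothetical, and it assumes palindromic sites with 5' overhangs (or blunt ends) that are filled in before ligation, so that each filled-in upstream end can be joined to each downstream end.

```python
from itertools import product


def ligation_motifs(cut_sites):
    """Derive ligation motifs from cut sites written like 'G^ANTC'.

    Assumption: each enzyme cuts symmetrically and leaves a 5' overhang
    (or a blunt end) that is filled in before ligation.
    """
    left_ends, right_ends = [], []
    for site in cut_sites:
        cut = site.index("^")          # cut position on the top strand
        seq = site.replace("^", "")    # recognition sequence without the marker
        if cut > len(seq) - cut:
            raise ValueError(f"{site}: 3' overhangs are not handled by this sketch")
        # After fill-in, the upstream fragment ends with the site minus its last `cut` bases.
        left_ends.append(seq[:len(seq) - cut])
        # The downstream fragment starts at the cut position.
        right_ends.append(seq[cut:])
    # Any filled-in upstream end can be ligated to any downstream end.
    return [left + right for left, right in product(left_ends, right_ends)]


# Arima cut sites as given in the correction above.
print(",".join(ligation_motifs(["^GATC", "G^ANTC"])))
```

For `^GATC` and `G^ANTC` this yields the same four motifs listed in the correction (GATCGATC, GANTGATC, GANTANTC, GATCANTC), in a different order.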