Open adippolito12 opened 3 years ago
Hi Tony, Thanks for the suggestion. Could you give the definition of the Chicago input files, please, so that I can see how difficult it would be? Thanks
Sure, Nicholas:
Chicago does have a script that will take a bam file and convert it to their input format, so it may suffice to just have a deduped, valid pairs file in bam format: https://bitbucket.org/chicagoTeam/chicago/src/master/chicagoTools/
Tony
Hi Tony,
With the GET_PROCESS_SAM option, HiC-Pro outputs a BAM file with a flag according to the pair type. So it should be simple to extract the VI flag from this file and give it to the Chicago script.
Have you already tested that?
Thanks
Hey Nicholas,
I noticed that output option, but I wasn't sure if that included deduped pairs as well. Also, do you have a glossary of which flags correspond to which pair type?
Thanks!
Arfff yes, you're right, duplicates are not removed at this stage ... The information is stored in the XA flag with DE=dangling_end, VI=valid pairs, RE=religation and SC=self-circle.
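For illustration, here is a minimal sketch of filtering on that flag, working on SAM text lines (for example from `samtools view -h`). The tag name `XA` and the two-letter values come from the description above; that the value is stored as exactly `VI`, and the helper names `pair_type` and `keep_valid`, are assumptions, so check a few records of the real BAM first. Duplicates would still need to be removed separately.

```python
# Pair-type codes as listed in the thread.
PAIR_TYPES = {
    "VI": "valid pair",
    "DE": "dangling end",
    "RE": "religation",
    "SC": "self-circle",
}


def pair_type(sam_line, tag="XA"):
    """Return the value stored in optional field `tag`, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # In SAM, the first 11 columns are mandatory; optional TAG:TYPE:VALUE
    # fields start at column 12.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag:
            return parts[2]
    return None


def keep_valid(sam_lines, tag="XA"):
    """Yield header lines and reads whose pair type is VI (assumed encoding)."""
    for line in sam_lines:
        if line.startswith("@") or pair_type(line, tag) == "VI":
            yield line
```

The tag name is a parameter so the same filter can be pointed at a different tag if the BAM turns out to use another one.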
On the link you sent, I do not see any clear description of the .chinput file format. Do you have it somewhere?
Apologies -- here's an example file: https://bitbucket.org/chicagoTeam/chicago/src/master/PCHiCdata/inst/extdata/GMchinputFiles/GM_rep1.chinput
It's a 5-column tsv (with a header line): bait fragment id, other end fragment id, # pairs supporting the interaction, length of other end fragment, and distance between the two. The ids refer to ids specified in a rmap file and baitmap file that are also used as inputs in the Chicago workflow. The rmap file is just an in silico digested genome with numerical ids assigned to fragments. Those same ids are used in the baitmap file that specifies the bait fragments.
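As a rough sketch of the munging involved, the five columns could be built from deduplicated valid pairs that have already been assigned to rmap fragment ids. Everything below is illustrative, not Chicago's own code: the function name `chinput_rows`, the in-memory `rmap` dict, fragment length as `end - start`, a signed midpoint-to-midpoint distance, `NA` for pairs on different chromosomes, and counting bait-to-bait pairs once per bait are all assumptions to check against the example file and Chicago's own script.

```python
from collections import Counter


def chinput_rows(pairs, rmap, bait_ids):
    """Aggregate fragment-id pairs into 5-column rows.

    pairs:    iterable of (frag_id_a, frag_id_b), one per deduplicated valid pair
    rmap:     dict frag_id -> (chrom, start, end)   (hypothetical in-memory form)
    bait_ids: set of fragment ids listed in the baitmap
    Returns rows of (bait id, other end id, N pairs, other end length, distance).
    """
    counts = Counter()
    for a, b in pairs:
        if a == b:
            continue  # both reads on one fragment: not an interaction
        if a in bait_ids:
            counts[(a, b)] += 1
        if b in bait_ids:
            counts[(b, a)] += 1  # assumption: bait-to-bait counted from both sides

    rows = []
    for (bait, other), n in sorted(counts.items()):
        b_chrom, b_start, b_end = rmap[bait]
        o_chrom, o_start, o_end = rmap[other]
        other_len = o_end - o_start  # assumption: length is end - start
        if b_chrom == o_chrom:
            # assumption: signed distance between fragment midpoints
            dist = (o_start + o_end) // 2 - (b_start + b_end) // 2
        else:
            dist = "NA"  # assumption: no distance across chromosomes
        rows.append((bait, other, n, other_len, dist))
    return rows
```

Each row can then be written out tab-separated under the header line used in the example file.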
Hey Nicholas,
Wonderful pipeline -- I appreciate the added NextFlow implementation.
I had a request/recommendation that may improve compatibility with promoter-capture Hi-C users: a utility script that converts the final valid pairs file to Chicago input format (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908757/). The conversion is doable with some data munging, but a streamlined implementation would be super helpful for those who want to put together a HiCPro/Chicago workflow for interaction calling in PC-Hi-C.
Thanks! Tony