nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Merge after splitting fastq files into read chunks #498

Closed Aannaw closed 2 years ago

Aannaw commented 2 years ago

Hello, I am new to Hi-C analysis and would like to ask about parallel processing. I have only one sample, and I split my fastq files into chunks of 20M reads. All files are in the same input folder ($PWD/rawdata/sample1). I am running HiC-Pro in stepwise mode.

First, I ran the step that aligns the raw reads:

HiC-Pro -i rawdata -o results -c config-hicpro.txt -s mapping -s quality_checks -p

This gave me the BAM files by chunk (results1/bowtie_results/bwt2/sample1):

00_Ma6_R1_Ma6.bwt2merged.bam
00_Ma6_R2_Ma6.bwt2merged.bam
01_Ma6_R1_Ma6.bwt2merged.bam
01_Ma6_R2_Ma6.bwt2merged.bam
02_Ma6_R1_Ma6.bwt2merged.bam
02_Ma6_R2_Ma6.bwt2merged.bam
....
51_Ma6_R1_Ma6.bwt2merged.bam
51_Ma6_R2_Ma6.bwt2merged.bam
52_Ma6_R1_Ma6.bwt2merged.bam
52_Ma6_R2_Ma6.bwt2merged.bam

Then I ran the second step, the Hi-C processing of the aligned data:

HiC-Pro -i results/bowtie_results/bwt2 -o results -c config-hicpro.txt -s proc_hic -s quality_checks

It runs and generates files in the following two folders.

(results2/bowtie_results/bwt2)
00_Ma6_Ma6.bwt2pairs.bam
01_Ma6_Ma6.bwt2pairs.bam
02_Ma6_Ma6.bwt2pairs.bam
03_Ma6_Ma6.bwt2pairs.bam
04_Ma6_Ma6.bwt2pairs.bam
05_Ma6_Ma6.bwt2pairs.bam
...
51_Ma6_Ma6.bwt2pairs.bam
52_Ma6_Ma6.bwt2pairs.bam

(/results2/hic_results/data/sample1)
00_Ma6_Ma6.bwt2pairs.DEPairs
00_Ma6_Ma6.bwt2pairs.DumpPairs
00_Ma6_Ma6.bwt2pairs.FiltPairs
00_Ma6_Ma6.bwt2pairs.REPairs
00_Ma6_Ma6.bwt2pairs.RSstat
00_Ma6_Ma6.bwt2pairs.SCPairs
00_Ma6_Ma6.bwt2pairs.SinglePairs
00_Ma6_Ma6.bwt2pairs.validPairs
01_Ma6_Ma6.bwt2pairs.DEPairs
01_Ma6_Ma6.bwt2pairs.DumpPairs
01_Ma6_Ma6.bwt2pairs.FiltPairs
01_Ma6_Ma6.bwt2pairs.REPairs
01_Ma6_Ma6.bwt2pairs.RSstat
01_Ma6_Ma6.bwt2pairs.SCPairs
01_Ma6_Ma6.bwt2pairs.SinglePairs
01_Ma6_Ma6.bwt2pairs.validPairs

It is said that the chunks "will be merged before building the contact maps", but the generated files are still split into 20M-read chunks and have not been merged. I am not sure whether the merge is part of the second step I am running. Is there a flag that I missed for the merge?
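Before merging, it can help to confirm how many per-chunk .validPairs files the processing step actually produced for a sample. The following is a minimal sketch, not part of HiC-Pro: the function name count_valid_pairs_chunks is hypothetical, and the directory in the usage comment is an assumption based on the paths quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not a HiC-Pro command): count the per-chunk
# *.validPairs files sitting directly inside one sample directory.
count_valid_pairs_chunks() {
    # $1 = sample directory, e.g. results/hic_results/data/sample1 (assumed layout)
    find "$1" -maxdepth 1 -type f -name '*.validPairs' | wc -l | tr -d ' '
}

# Usage (path is an assumption, adjust to your own output folder):
#   count_valid_pairs_chunks results/hic_results/data/sample1
```

With the 00 to 52 chunk numbering shown above, one would expect this to report one file per chunk; a lower number would suggest that some chunks did not finish the proc_hic step.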

nservant commented 2 years ago

Hi, Please run the stepwise mode with -s merge_persample -s build_contact_maps -s ice_norm to merge the chunks, build the maps, and normalize them. Best
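As a rough illustration of the advice above, the three steps can be combined into a single stepwise call. This is only a sketch: the -s step names come from the reply, while the input, output, and config paths are assumptions based on the paths in the question (the input is assumed to be the folder that holds the per-sample .validPairs chunks). The RUN_HICPRO variable is a hypothetical guard added here so that the command is only printed unless explicitly enabled.

```shell
#!/bin/sh
# Sketch of the stepwise call: merge chunks, build maps, normalize.
# All three paths below are assumptions; adjust them to your own layout.
input_dir="results/hic_results/data"   # assumed: folder with per-sample validPairs chunks
output_dir="results"
config="config-hicpro.txt"

# Assemble the command as positional parameters so quoting is preserved.
set -- HiC-Pro -i "$input_dir" -o "$output_dir" -c "$config" \
    -s merge_persample -s build_contact_maps -s ice_norm

# Print the command for review first.
cmd_line="$*"
echo "$cmd_line"

# Run it only when explicitly requested (RUN_HICPRO is a hypothetical guard).
if [ "${RUN_HICPRO:-0}" = "1" ]; then
    "$@"
fi
```

Printing the command before running it makes it easy to check that the input folder points at the processed pairs rather than at the raw reads or the BAM files.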

Aannaw commented 2 years ago

Dear Professors, Thanks for your reply. I added "-s merge_persample" when running the third step, "HiC-Pro -i results2/data -o results3 -c config_test.txt -s build_contact_maps -s ice_norm". The chunks were merged and the maps were built. Thank you very much.
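One way to double-check that the merge happened is to look for the merged pairs file next to the per-chunk files. The sketch below assumes that the merged file name ends in allValidPairs, which matches my understanding of HiC-Pro's output naming but is not confirmed in this thread; the function name find_merged_pairs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a HiC-Pro command): list merged pairs files
# directly inside one sample directory.
# Assumption: the merged file name ends in "allValidPairs".
find_merged_pairs() {
    # $1 = sample directory in the output folder (assumed layout)
    find "$1" -maxdepth 1 -type f -name '*allValidPairs'
}

# Usage (path is an assumption, adjust to your own output folder):
#   find_merged_pairs results3/hic_results/data/sample1
```

An empty result would suggest that the merge_persample step did not run or wrote its output elsewhere.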