nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Error with step2 HiC-Pro on allele specific mode. Please help #329

Closed StephF-1130 closed 4 years ago

StephF-1130 commented 4 years ago

Hi, I am using HiC-Pro (allele-specific) in parallel mode. For a given "MySample" I have multiple datasets that I am running in parallel. The first step runs well, but step 2 stops at the matrix-building stage. I think it may have to do with the path and/or file-name assignment specific to the allele-specific run... maybe.

Here is the command I run: make --file /home/my_path_to/bin/HiC-Pro_2.11.1/scripts/Makefile CONFIG_FILE=config_MySample CONFIG_SYS=/home/my_path_to/bin/HiC-Pro_2.11.1/config-system.txt all_persample 2>&1

Here is the log compute-2-1.hpc.###.org

Mon Mar 11:16:07 PDT 2020
Merge chunks from the same sample ...
Logs: logs/data_002/merge_valid_interactions.log
Logs: logs/data_004/merge_valid_interactions.log

Mon Mar 11:22:07 PDT 2020
Merge stat files per sample ...
Logs: logs/data_002/merge_stats.log
Logs: logs/data_004/merge_stats.log

Mon Mar 11:22:09 PDT 2020
Generate binned matrix files ...
Logs: logs/data_002/build_raw_maps.log
make: *** [build_raw_maps] Error 1

And so, here is the content of the error log file [ logs/data_002/build_raw_maps.log ]

Generate contact maps at 5000 resolution ...

cat hic_results/data//data_002/data_002/inpdata_R1G[12].allValidPairs | /home/path_to/bin/HiC-Pro_2.11.1/scripts/build_matrix --matrix-format upper --binsize 5000 --chrsizes /home/path_to/bin/HiC-Pro_2.11.1/annotation/chrom_hg19.sizes --ifile /dev/stdin --oprefix hic_results/matrix/data_002/raw/5000/inpdataR1G[12]${bsize}

cat: hic_results/data//data_002/data_002/inpdata_R1G[12].allValidPairs: No such file or directory

--

The command is generated automatically, and the path hic_results/data//data_002/data_002/inpdata_R1G[12].allValidPairs does not make sense, because here is the actual content and hierarchy of the allValidPairs files:

$ ls hic_results/data//data_002/
data_002.allValidPairs
data_002_G1.allValidPairs
data_002_G2.allValidPairs
inpdata_hg19.bwt2pairs.DEPairs
inpdata_hg19.bwt2pairs.DumpPairs
inpdata_hg19.bwt2pairs.FiltPairs
inpdata_hg19.bwt2pairs.REPairs
inpdata_hg19.bwt2pairs.RSstat
inpdata_hg19.bwt2pairs.SCPairs
inpdata_hg19.bwt2pairs.SinglePairs
inpdata_hg19.bwt2pairs.validPairs
inpdata_hg19.bwt2pairs_interaction.bam

And as you can see there is no data_002/inpdata_R1G[12].allValidPairs in there. The whole thing stops here. Could you please help?
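
One way to confirm the mismatch described above is to count how many files each glob pattern matches. The following is a minimal sketch, assuming a POSIX-style shell (sh or bash) and paths without spaces; `count_matches` is a hypothetical helper written for illustration and is not part of HiC-Pro.

```shell
# Hypothetical helper (not part of HiC-Pro): count the files a glob pattern matches.
count_matches() {
    # $1 is passed quoted by the caller; it is deliberately left unquoted in
    # the loop so that the shell expands the pattern here.
    # Assumption: the pattern contains no whitespace.
    n=0
    for f in $1; do
        # An unmatched pattern stays literal, so test that the path exists.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

For example, `count_matches "hic_results/data/data_002/data_002_G[12].allValidPairs"` should report both allele-specific files if they are present, while the pattern from the failing command should report none.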

nservant commented 4 years ago

Hi,

Indeed, there is something wrong in the filenames / regexp somewhere ... difficult to know where exactly ... Could you try to create a new sample directory with links to the allValidPairs files you have? Something like:

vp_inputs
  sample
    data_002_G1.allValidPairs
    data_002_G2.allValidPairs

And then rerun:

HiC-Pro -i vp_inputs -c YOUR_CONFIG -o OUTPUT -s build_contact_maps -s ice_norm

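
The suggested workaround can be sketched as a small shell function that builds the new input directory from symbolic links. This is only an illustration under stated assumptions: `link_valid_pairs` is a hypothetical name, and the sketch assumes the allele-specific files follow the `<sample>_G1.allValidPairs` / `<sample>_G2.allValidPairs` naming seen in the listing earlier in this thread.

```shell
# Hypothetical helper (not part of HiC-Pro): link a sample's allele-specific
# allValidPairs files into a fresh input tree, <dest_root>/<sample>/.
link_valid_pairs() {
    src_dir=$1     # directory holding the files, e.g. hic_results/data/data_002
    sample=$2      # sample name, e.g. data_002
    dest_root=$3   # new input directory, e.g. vp_inputs
    mkdir -p "$dest_root/$sample" || return 1
    for f in "$src_dir"/"$sample"_G[12].allValidPairs; do
        # If nothing matched, the pattern stays literal and this check fails.
        [ -e "$f" ] || { echo "missing: $f" >&2; return 1; }
        # Use an absolute target so the link stays valid from any directory.
        abs=$(cd "$(dirname "$f")" && pwd)/$(basename "$f")
        ln -sf "$abs" "$dest_root/$sample/"
    done
}
```

After something like `link_valid_pairs hic_results/data/data_002 data_002 vp_inputs`, the rerun command given above can be pointed at `vp_inputs`.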
StephF-1130 commented 4 years ago

Hi, it worked pretty well. I have a few dozen donors, each with replicates that are all run in parallel per donor. I am currently running HiC-Pro step 1 for many of them. An update to HiC-Pro that fixes the error in step 2 would be great.

Here is the output:

ls OUTPUT/
total 12
-rwxr-xr-x+ 1 sf fs_bioinformatics_common 3120 Apr 3 13:55 config_17
drwxr-sr-x+ 3 sf fs_bioinformatics_common 3 Apr 3 13:55 hic_results
drwxr-sr-x+ 3 sf fs_bioinformatics_common 3 Apr 3 13:55 logs
lrwxrwxrwx 1 sf fs_bioinformatics_common 91 Apr 3 13:55 rawdata -> /mnt//fsteph/Analysis/ImmuneCell0/HICPRO/SPL17_RH/RUN01/vp_inputs
drwxr-sr-x+ 2 sf fs_bioinformatics_common 2 Apr 3 13:55 tmp

Here is the content of hic_results:

ls OUTPUT/hic_results/
matrix

ls OUTPUT/hic_results/matrix/data_002/
iced
raw

ls OUTPUT/hic_results/matrix/data_002/iced/
10000
100000
1000000
20000
40000
5000
500000

ls OUTPUT/hic_results/matrix/data_002/iced/10000/
data_002_G1_10000_iced.matrix
data_002_G1_10000_iced.matrix.biases
data_002_G2_10000_iced.matrix
data_002_G2_10000_iced.matrix.biases

ls OUTPUT/hic_results/matrix/data_002/raw/5000/
data_002_G1_5000.matrix
data_002_G1_5000_abs.bed
data_002_G2_5000.matrix
data_002_G2_5000_abs.bed
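
With dozens of donors, listing each directory by hand gets tedious, so the same check could be scripted. The sketch below assumes the raw matrices follow the `<sample>_G1_<resolution>.matrix` / `<sample>_G2_<resolution>.matrix` naming shown in the listing above and that the raw directory has one subdirectory per resolution; `check_raw_matrices` is a hypothetical helper, not a HiC-Pro command.

```shell
# Hypothetical helper (not part of HiC-Pro): check that a non-empty raw matrix
# exists for both genomes (G1, G2) at every requested resolution.
check_raw_matrices() {
    matrix_dir=$1   # e.g. OUTPUT/hic_results/matrix/data_002
    sample=$2       # e.g. data_002
    shift 2         # the remaining arguments are bin sizes, e.g. 5000 10000
    status=0
    for res in "$@"; do
        for g in G1 G2; do
            f="$matrix_dir/raw/$res/${sample}_${g}_${res}.matrix"
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f" >&2
                status=1
            fi
        done
    done
    return "$status"
}
```

A call such as `check_raw_matrices OUTPUT/hic_results/matrix/data_002 data_002 5000 10000` returns a non-zero status and names the offending files if any matrix is absent.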

> Hi Indeed, there is something wrong in the filenames / regexp somewhere ... difficult to know where exactly ... Could you try to create a new sample directory with links to the allValidPairs files you have? Something like
>
> vp_inputs
>   sample
>     data_002_G1.allValidPairs
>     data_002_G2.allValidPairs
>
> And to rerun
>
> HiC-Pro -i vp_inputs -c YOUR_CONFIG -o OUTPUT -s build_contact_maps -s ice_norm

nservant commented 4 years ago

OK, glad it works. It is a bit strange, though: I do not understand at all why it failed during step 2 ...

StephF-1130 commented 4 years ago

My interpretation is that, in allele-specific mode, the sample name is used twice and somehow the fastq prefix is also used. So, as input for building the matrix, you get data_002/data_002/inpdata_R1G[12].allValidPairs instead of data_002/data_002_G[12].allValidPairs. I should note that "inpdata_R1" is the prefix of the fastq files, i.e. I have inpdata_R1... and inpdata_R2... I couldn't trace exactly where that happens in the code.