nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

different iced.matrix format #415

Closed zhanwen-cheng closed 3 years ago

zhanwen-cheng commented 3 years ago

Hi nservant, thanks for the useful HiC-Pro software. I recently installed HiC-Pro 3.0.0 and ran it on my data. The log file looked like this:

Run HiC-Pro 3.0.0
--------------------------------------------
Sat Mar  6 19:09:39 CST 2021
Bowtie2 alignment step1 ...
Logs: logs/fastp/mapping_step1.log

--------------------------------------------
Sun Mar  7 00:43:33 CST 2021
Bowtie2 alignment step2 ...
Logs: logs/fastp/mapping_step2.log

--------------------------------------------
Sun Mar  7 03:32:22 CST 2021
Combine R1/R2 alignment files ...
Logs: logs/fastp/mapping_combine.log

--------------------------------------------
Sun Mar  7 04:13:46 CST 2021
Mapping statistics for R1 and R2 tags ...
Logs: logs/fastp/mapping_stats.log

--------------------------------------------
Sun Mar  7 04:47:16 CST 2021
Pairing of R1 and R2 tags ...
Logs: logs/fastp/mergeSAM.log

--------------------------------------------
Sun Mar  7 07:26:45 CST 2021
Assign alignments to restriction fragments ...
Logs: logs/fastp/mapped_2hic_fragments.log

--------------------------------------------
Sun Mar  7 10:01:38 CST 2021
Merge chunks from the same sample ...
Logs: logs/fastp/merge_valid_interactions.log

--------------------------------------------
Sun Mar  7 10:02:26 CST 2021
Merge stat files per sample ...
Logs: logs/fastp/merge_stats.log

--------------------------------------------
Sun Mar  7 10:02:27 CST 2021
Run quality checks for all samples ...
Logs: logs/fastp/make_Rplots.log

--------------------------------------------
Sun Mar  7 10:02:36 CST 2021
Generate binned matrix files ...
Logs: logs/fastp/build_raw_maps.log

--------------------------------------------
Sun Mar  7 10:03:00 CST 2021
Run ICE Normalization ...
Logs: logs/fastp/ice_1000.log
Logs: logs/fastp/ice_10000.log
Logs: logs/fastp/ice_20000.log
Logs: logs/fastp/ice_40000.log
Logs: logs/fastp/ice_150000.log
Logs: logs/fastp/ice_500000.log
Logs: logs/fastp/ice_1000000.log

However, when I checked my iced matrix file, the format looked strange:

2.000000000000000000e+00 6.000000000000000000e+00 8.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00 1.200000000000000000e+01 1.400000000000000000e+
1.000000000000000000e+00 5.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00 8.000000000000000000e+00 1.200000000000000000e+01 1.400000000000000000e+
1.384420388189328888e+00 1.384420388189328888e+00 2.364514633922650599e+00 1.243515436349700533e-30 2.364514633922650599e+00 1.384420388189328888e+00 2.142478893339943083e-

But your manual says: "The contact maps are then available in the matrix folder. The matrix folder is organized with raw and iced contact maps for all resolutions. Contact maps are stored as a triplet sparse format ; bin_i / bin_j / counts_ij". That is different from the iced matrix I got. I also checked ice_1000.log, and there seem to be no major problems:

ice --results_filename hic_results/matrix/fastp/iced/1000/fastp_1000_iced.matrix --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --max_iter 100 --eps 0.1 --remove
/home/iese-chengzw/anaconda3/envs/HiC-Pro_v3.0.0/lib/python3.7/site-packages/iced/normalization/_ca_utils.py:9: UserWarning: The API of this module is likely to change. Use
"The API of this module is likely to change. "

So is this normal? What are the column names of the matrix I got?

Attached are the first 1000 columns of my matrix file: fastp_1000_iced.matrix_1000.txt
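
For anyone who wants to check their own output against the triplet sparse format quoted from the manual (bin_i / bin_j / counts_ij), here is a minimal sketch. It only assumes that each line should hold two integer bin indices and one numeric count separated by whitespace; the function names are made up for illustration and are not part of HiC-Pro or iced.

```python
def is_triplet_line(line):
    """Return True if the line looks like: int bin_i, int bin_j, numeric count."""
    fields = line.split()
    if len(fields) != 3:
        return False
    try:
        int(fields[0])    # bin_i must be an integer index
        int(fields[1])    # bin_j must be an integer index
        float(fields[2])  # counts_ij may be a float after ICE normalization
    except ValueError:
        return False
    return True


def looks_like_triplet(lines, max_lines=1000):
    """Check up to max_lines non-empty lines; False as soon as one does not fit."""
    checked = 0
    for line in lines:
        if not line.strip():
            continue
        if not is_triplet_line(line):
            return False
        checked += 1
        if checked >= max_lines:
            break
    return checked > 0
```

It can be used as `with open(path) as fh: print(looks_like_triplet(fh))`. Rows like the ones pasted above, with many values in scientific notation per line, fail the check, whereas a line such as `0 54537 0.000000` passes.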

nservant commented 3 years ago

Hi, could you please update the iced Python package to 0.5.8? Thanks
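
A quick way to see which iced version an environment actually has is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`), that the package is installed under the distribution name `iced`, and that the version string is plain dotted numbers such as `0.5.8`; the helper names are hypothetical.

```python
def parse_version(text):
    """Turn a plain dotted version such as '0.5.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required="0.5.8"):
    """Compare two plain dotted version strings numerically."""
    return parse_version(installed) >= parse_version(required)


def installed_iced_version():
    """Return the installed iced version string, or None if it cannot be found."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python older than 3.8 has no importlib.metadata
        return None
    try:
        return version("iced")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_iced_version()
    if found is None:
        print("iced version could not be determined")
    elif is_at_least(found):
        print("iced", found, "is 0.5.8 or newer")
    else:
        print("iced", found, "is older than 0.5.8")
```

If the version turns out to be older, something like `pip install iced==0.5.8` inside the HiC-Pro environment should bring it up to the version requested here.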

zhanwen-cheng commented 3 years ago

Hi, I upgraded the iced package and reran the ice command, and it succeeded. Thanks!

ZHIDIHUAYUAN commented 3 years ago

Hi, I have the same problem. I upgraded the iced package and ran the command

ice --results_filename hic_results/matrix/sample1/iced/20000/sample1_G1_20000_iced.matrix --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --max_iter 100 --eps 0.1 --remove-all-zeros-loci --output-bias 1 hic_results/matrix//sample1/raw/20000/sample1_G1_20000.matrix

The output is

0 54537 0.000000
0 54567 0.000000

But sample1/raw/20000/sample1_G1_20000_abs.bed contains

chrM 0 16569 1

so there is no bin_0. Thanks
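
The mismatch described above can be checked mechanically. The sketch below assumes that the fourth column of the `_abs.bed` file is the bin index (as in the `chrM 0 16569 1` line) and that the matrix is in triplet format; the function names are invented for this example. It lists every bin index used in the matrix that has no entry in the bed file.

```python
def bed_bin_ids(bed_lines):
    """Collect bin indices from the 4th column of an _abs.bed file."""
    ids = set()
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 4:
            ids.add(int(fields[3]))
    return ids


def matrix_bin_ids(matrix_lines):
    """Collect bin_i and bin_j from a triplet (bin_i, bin_j, count) matrix."""
    ids = set()
    for line in matrix_lines:
        fields = line.split()
        if len(fields) == 3:
            ids.add(int(fields[0]))
            ids.add(int(fields[1]))
    return ids


def bins_missing_from_bed(matrix_lines, bed_lines):
    """Return the sorted matrix bin indices that do not appear in the bed file."""
    return sorted(matrix_bin_ids(matrix_lines) - bed_bin_ids(bed_lines))
```

With a bed file whose indices start at 1 and a matrix that contains rows starting with `0`, the result includes `0`, which is the symptom reported here.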

nservant commented 3 years ago

Hi, yes, this is a bug in the iced package. Please update the iced package to the latest version.
N