nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

ice matrix bin_0 #416

Closed zhanwen-cheng closed 3 years ago

zhanwen-cheng commented 3 years ago

Hi, I just finished running HiC-Pro and found that the _bini and _binj indices in my iced matrix don't match my raw bed file. In my raw bed file, the bins start from 1:

M_NODE_1_length_553359_cov_14.738569    0       1000    1
M_NODE_1_length_553359_cov_14.738569    1000    2000    2
M_NODE_1_length_553359_cov_14.738569    2000    3000    3
M_NODE_1_length_553359_cov_14.738569    3000    4000    4
M_NODE_1_length_553359_cov_14.738569    4000    5000    5
M_NODE_1_length_553359_cov_14.738569    5000    6000    6
M_NODE_1_length_553359_cov_14.738569    6000    7000    7
M_NODE_1_length_553359_cov_14.738569    7000    8000    8
M_NODE_1_length_553359_cov_14.738569    8000    9000    9
M_NODE_1_length_553359_cov_14.738569    9000    10000   10
M_NODE_1_length_553359_cov_14.738569    10000   11000   11
M_NODE_1_length_553359_cov_14.738569    11000   12000   12
M_NODE_1_length_553359_cov_14.738569    12000   13000   13
M_NODE_1_length_553359_cov_14.738569    13000   14000   14
M_NODE_1_length_553359_cov_14.738569    14000   15000   15
M_NODE_1_length_553359_cov_14.738569    15000   16000   16
M_NODE_1_length_553359_cov_14.738569    16000   17000   17
M_NODE_1_length_553359_cov_14.738569    17000   18000   18
M_NODE_1_length_553359_cov_14.738569    18000   19000   19
M_NODE_1_length_553359_cov_14.738569    19000   20000   20
M_NODE_1_length_553359_cov_14.738569    20000   21000   21
M_NODE_1_length_553359_cov_14.738569    21000   22000   22

However, in my iced matrix, the _bini starts from 0:

0       1       1.384420
4       5       1.384420
6       7       2.364515
7       7       0.000000
7       8       2.364515
11      11      1.384420
13      13      0.021425
13      14      2.163904
19      19      1.384420
23      24      1.384420
26      26      0.021425
26      508     2.163904
34      35      2.163904
35      36      0.021425
36      37      2.163904
39      39      4.153261
44      45      2.163904
45      45      0.021425
48      48      1.384420
53      240     1.384420
56      56      1.384420
57      58      2.768841
60      60      2.289669
60      61      0.047029
61      62      2.313421
63      63      1.384420

So I couldn't match bin_0 to any contig in my bed file. Can you help me with this bin_0? Attached are the first 1000 lines of my iced matrix and bed file.

fastp_1000_abs.bed.txt fastp_1000_iced.matrix.txt

zhanwen-cheng commented 3 years ago

BTW, I had some problems with my iced package. I failed to get an iced matrix the first time. I then used conda to upgrade the iced package to version 0.5.8 and ran ice_norm alone.

zhanwen-cheng commented 3 years ago

And the first several lines of the matrix file under the raw folder look like this:

1       2       1
5       6       1
7       8       1
8       8       2
8       9       1
12      12      1
14      14      1
14      15      1
20      20      1
24      25      1

It seems that _bini and _binj here equal the _bini and _binj of the iced matrix plus 1. So the _bin0 I got in my iced matrix is actually _bin1 in my bed file?
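One way to check this kind of off-by-one relationship is to compare the (bin_i, bin_j) pairs of the raw and iced matrices under a few candidate offsets and see which offset makes them line up. The sketch below is only illustrative: `read_pairs` and `best_offset` are hypothetical helper names, not part of HiC-Pro or iced, and it assumes both files use the whitespace-separated three-column format shown above.

```python
def read_pairs(lines):
    """Collect the (bin_i, bin_j) index pairs from three-column matrix lines."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 3:  # skip blank or malformed lines
            pairs.add((int(fields[0]), int(fields[1])))
    return pairs


def best_offset(raw_lines, iced_lines, candidates=(-1, 0, 1)):
    """Return the candidate offset that, added to the iced indices,
    reproduces the most index pairs of the raw matrix."""
    raw = read_pairs(raw_lines)
    iced = read_pairs(iced_lines)

    def overlap(offset):
        # Number of iced pairs that land on a raw pair after shifting.
        return sum((i + offset, j + offset) in raw for i, j in iced)

    return max(candidates, key=overlap)


# First lines of the raw and iced matrices quoted above.
raw_head = ["1 2 1", "5 6 1", "7 8 1", "8 8 2"]
iced_head = ["0 1 1.384420", "4 5 1.384420", "6 7 2.364515", "7 7 0.000000"]

offset = best_offset(raw_head, iced_head)
print(offset)
```

On these few lines the best offset is 1, meaning iced index + 1 equals the raw index. That is consistent with bin 0 in the iced matrix corresponding to bin 1 in the bed file.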

nservant commented 3 years ago

Hi, Indeed, it looks like your iced matrix is 0-based. I'll check with the iced developer. Thanks

jinsooahn commented 3 years ago

Hi Nicholas and guys,

I had the same problem in iced.matrix as shown below. So, I tested ice with --base 1 and --base 0 as also shown below. It looks like --base 0 works.

I am wondering whether it would be appropriate to add a BASE = 0 option to the following Normalization section of the original config.txt file. Thanks.

#######################################################################
## Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1

$ head -n 1550 split_PEF_rep1_40000_iced.matrix
0 0 424.572230
0 1 207.456328
0 2 57.489488
0 3 51.996868
0 4 53.219822
0 5 38.218328
0 6 29.179410
0 7 33.432296
0 8 19.663675
0 9 41.781578
...
0 62695 2.021237
0 62752 1.092886
0 62792 1.071577
0 62810 1.143926
0 62826 0.029774
1 1 330.390866
1 2 236.468261
1 3 141.730233
1 4 93.469007
1 5 45.797369
1 6 39.024500
...

The raw matrix is below:

$ head -n 1550 split_PEF_rep1_40000.matrix
1 1 381
1 2 174
1 3 45
1 4 42
1 5 53
1 6 25
1 7 20
1 8 24
1 9 17
1 10 21
...
1 62696 2
1 62753 1
1 62793 1
1 62811 1
1 62827 2
2 2 259
2 3 173
2 4 107
2 5 87
2 6 28
2 7 25
...

$ ice --max_iter 100 --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --eps 0.1 --base 1 split_PEF_rep1_40000.matrix
$ head split_PEF_rep1_40000_normalized.matrix
0 0 390.961310
0 1 198.463520
0 2 54.959578
0 3 49.731842
0 4 50.761421
0 5 36.582836
0 6 27.901364
0 7 31.830328
0 8 18.726457
0 9 39.959415
...
0 62695 1.917478
0 62752 0.969278
0 62792 1.031450
0 62810 0.938959
0 62826 0.028182
1 1 328.362716
1 2 234.854895
1 3 140.828862
1 4 92.618937
1 5 45.542622
1 6 38.766634
...

$ ice --max_iter 100 --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --eps 0.1 --base 0 split_PEF_rep1_40000.matrix
$ head -n 1550 split_PEF_rep1_40000_normalized.matrix
1 1 390.961310
1 2 198.463520
1 3 54.959578
1 4 49.731842
1 5 50.761421
1 6 36.582836
1 7 27.901364
1 8 31.830328
1 9 18.726457
1 10 39.959415
...
1 62696 1.917478
1 62753 0.969278
1 62793 1.031450
1 62811 0.938959
1 62827 0.028182
2 2 328.362716
2 3 234.854895
2 4 140.828862
2 5 92.618937
2 6 45.542622
2 7 38.766634
...

nservant commented 3 years ago

Hi @fitart,

Thanks for your message. Which iced version are you using, please? And actually, it looks like the two options are inverted, aren't they? --base 0 gives 1-based coordinates, and --base 1 gives 0-based coordinates? @NelleV, this is for you ;)
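For a matrix that has already been written with 0-based indices, one possible workaround is to shift both index columns by one so that they match the 1-based bin IDs in the bed file. This is only a sketch and not something the HiC-Pro or iced maintainers recommend in this thread. The helper name `shift_bin_indices` is hypothetical, and the tab-separated output is an assumption about the expected matrix format.

```python
def shift_bin_indices(lines, offset=1):
    """Yield 'bin_i<TAB>bin_j<TAB>value' lines with both bin indices shifted.

    The value column is copied as text, so the numbers are not reformatted.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        bin_i, bin_j, value = fields
        yield f"{int(bin_i) + offset}\t{int(bin_j) + offset}\t{value}"


# A few lines of the 0-based output reported above for `--base 1`.
zero_based = ["0 0 390.961310", "0 1 198.463520", "1 1 328.362716"]

shifted = list(shift_bin_indices(zero_based))
for row in shifted:
    print(row)
```

With a real file, the same generator could be fed an open file handle and its output written line by line to a new matrix file, leaving the original untouched.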

jinsooahn commented 3 years ago

Thanks for your reply. I am using iced version 0.5.8.

$ python3.7 -m pip list
iced 0.5.8
...

Yes, it looks inverted. The 1-based matrix may be required for the downstream analysis using GENOVA and HiCExplorer, as far as I checked.

Best,

esebesty commented 3 years ago

Hi all, it looks like I ran into this issue too, and issue #389 might also be related. I'm using iced 0.5.4 with the HiC-Pro pipeline, as specified in the environment.yaml. I was trying to run the hicpro2fithic.py script and it failed with a key error. When I checked the results, the iced.matrix.biases file had one more line than the abs.bed file.

What would be the solution? Update to iced 0.5.8 and re-generate the ice normalized matrices?
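The line-count mismatch described above can be checked before running downstream scripts. The sketch below assumes that the biases file holds one value per line and the bed file one bin per line. The helper names `count_records` and `extra_bias_lines` are hypothetical, and the toy data is made up for illustration.

```python
def count_records(lines):
    """Count non-empty lines, treating each as one record."""
    return sum(1 for line in lines if line.strip())


def extra_bias_lines(bed_lines, bias_lines):
    """Return how many more records the biases file has than the bed file."""
    return count_records(bias_lines) - count_records(bed_lines)


# Toy stand-ins for abs.bed and iced.matrix.biases (not real data).
bed = ["chr1\t0\t1000\t1", "chr1\t1000\t2000\t2", "chr1\t2000\t3000\t3"]
biases = ["nan", "1.02", "0.98", "1.00"]

print(extra_bias_lines(bed, biases))
```

A non-zero result suggests the two files do not describe the same set of bins. With real files, open file handles can be passed in place of the lists.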

nservant commented 3 years ago

Fixed in iced 0.5.9 (and in HiC-Pro 3.1.0 soon!)

BenxiaHu commented 2 years ago

This bug occurred for me too, with iced=0.5.10=py38h803c66d_0.