nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

hic-pro 3 breaks ice normalized matrixes and sparsetodense.py #433

Closed ChenfuShi closed 3 years ago

ChenfuShi commented 3 years ago

Hi, I updated to HiC-Pro version 3.0.0, but now the ICE-normalized matrices are no longer compatible with the sparseToDense.py script.

The ICE matrices seem to have a new format. The old format had 3 columns: binX, binY and counts. The new format has 3 rows, and all the values appear to be floating-point numbers. This affects only the ICE-normalized matrices, not the raw matrices, so I imagine it is caused by a bug in the ICE normalization script.
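A quick way to tell the two layouts apart is to count the whitespace-separated fields on the first few lines. The following is only a minimal sketch: `detect_layout` is a hypothetical helper, not part of HiC-Pro or iced, and it assumes the file is plain whitespace-delimited text.

```python
import io


def detect_layout(handle, max_lines=4):
    """Classify a matrix file as 'triplet' (3 columns) or 'transposed' (3 rows).

    Hypothetical helper for illustration; not part of HiC-Pro or iced.
    """
    widths = []
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            widths.append(len(fields))
        if len(widths) >= max_lines:
            break
    # Expected sparse format: every line holds binX, binY, count.
    if widths and all(w == 3 for w in widths):
        return "triplet"
    # Broken format: exactly three equally long lines.
    if len(widths) == 3 and len(set(widths)) == 1:
        return "transposed"
    return "unknown"


if __name__ == "__main__":
    sample = io.StringIO("1 2 3 4\n1 1 1 1\n3.5 1.2 0.7 0.1\n")
    print(detect_layout(sample))
```

A file with exactly three interactions is ambiguous under this check and is reported as `triplet`.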

The file is too big to upload here, but if you need it I can upload it somewhere else.

Thanks!

nservant commented 3 years ago

Hi, the iced version you are using has a bug! Please update the package. Sorry about that. N

ChenfuShi commented 3 years ago

I was using 0.5.4, which is the version set in your environment.yml file! I'll update it and see what happens.

nservant commented 3 years ago

Yes, my fault. I'll update it. Sorry. N

ChenfuShi commented 3 years ago

Thanks!

nservant commented 3 years ago

Of note, another user reported to me that the latest iced version, 0.5.8, may also produce different output from previous versions. The first two columns are expected to be one-based indices, but in 0.5.8 they are zero-based. If you upgrade to 0.5.8 and observe the same thing, please let me know so that I can contact the iced developer. Thanks
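As a stopgap for zero-based output, the two index columns can be shifted by one before passing the file on. This is a minimal sketch under stated assumptions: `read_triplets` and `to_one_based` are hypothetical names, and the zero-based check is a heuristic that only fires if bin 0 actually appears in the data.

```python
def read_triplets(lines):
    """Parse 'binX binY count' lines into (int, int, float) tuples."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            # int(float(...)) also tolerates indices written as "0.0".
            yield int(float(fields[0])), int(float(fields[1])), float(fields[2])


def to_one_based(triplets):
    """Shift both bin indices by +1 if any index equals 0.

    Heuristic only: a zero-based file in which bin 0 has no
    contacts would pass through unchanged.
    """
    triplets = list(triplets)
    zero_based = any(i == 0 or j == 0 for i, j, _ in triplets)
    offset = 1 if zero_based else 0
    return [(i + offset, j + offset, v) for i, j, v in triplets]


if __name__ == "__main__":
    for i, j, v in to_one_based(read_triplets(["0\t0\t3.5", "1\t0\t1.25"])):
        print(i, j, v, sep="\t")
```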

ChenfuShi commented 3 years ago

Yes, it looks like they are 0-based.

esebesty commented 3 years ago

I ran into this issue while trying to convert the ICE-normalized matrix to the format needed by TopDom. It looks like a simple matrix transpose in R on the ICE-normalized matrix solves the issue.

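library("data.table")

# Read the broken iced output (3 rows instead of 3 columns)
icematrix   <- fread(file = "data/sample_1000000_iced.matrix", header = FALSE)
# Transpose back to one interaction per line: binX, binY, count
icematrix_t <- t(icematrix)
icematrix_d <- as.data.frame(icematrix_t)

# Write tab-separated, without header or row names
write.table(icematrix_d, file = "data/sample_1000000_iced_corr.matrix", sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)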

The new matrix looks like this:

1       1       3590.51297981505
2       1       1465.5227894145
3       1       270.030342746284
4       1       158.666018009291
5       1       90.5154390031429
6       1       56.8238616628458

and it's accepted by the sparseToDense.py script.

BTW, it might be useful to add an option to the script to output an N-by-(3+N) or N-by-(4+N) matrix, where the first 3 or 4 columns are the chr, from, to coordinates, or id, chr, from, to.
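To illustrate the N-by-(3+N) layout suggested above, here is a minimal pure-Python sketch. It is not the sparseToDense.py implementation: `dense_with_coords` is a hypothetical function, the bin coordinates are assumed to be supplied in bin order, the indices are assumed to be one-based, and the sparse file is assumed to store only one triangle of a symmetric matrix.

```python
def dense_with_coords(bins, triplets):
    """Build N rows of [chrom, start, end, v_1, ..., v_N].

    bins:     list of (chrom, start, end) tuples, bin 1 first (assumed).
    triplets: iterable of one-based (i, j, value) tuples (assumed to
              cover only one triangle of a symmetric matrix).
    """
    n = len(bins)
    dense = [[0.0] * n for _ in range(n)]
    for i, j, value in triplets:
        dense[i - 1][j - 1] = value
        dense[j - 1][i - 1] = value  # mirror into the other triangle
    return [list(bins[k]) + dense[k] for k in range(n)]


if __name__ == "__main__":
    bins = [("chr1", 0, 1000000), ("chr1", 1000000, 2000000)]
    for row in dense_with_coords(bins, [(1, 1, 3.5), (2, 1, 1.5)]):
        print("\t".join(str(x) for x in row))
```

Nested lists keep the sketch dependency-free, but they scale poorly. For genome-wide matrices at fine resolution, an array library would be the more realistic choice.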

nservant commented 3 years ago

Fixed in iced 0.5.9, which has been added to the conda env of HiC-Pro 3.1.0.