open2c / cooler

A cool place to store your Hi-C
https://open2c.github.io/cooler
BSD 3-Clause "New" or "Revised" License

Normalization #48

Closed ya-guo closed 7 years ago

ya-guo commented 7 years ago

Could you explain the difference between balancing and ICE (iterative correction and eigenvector decomposition), how to understand balancing, and whether the balancing used in cooler is an update to Hi-C normalisation?

nvictus commented 7 years ago

Matrix balancing is the name for the decomposition of a square matrix into a doubly stochastic matrix (one whose row and column sums are all equal) and a set of balancing weights. The IC part of ICE is an algorithm for matrix balancing on a symmetric matrix like a Hi-C contact matrix. Algorithms for matrix balancing have been rediscovered several times in different fields for different purposes (e.g. in statistical modeling and numerical linear algebra). Lior Pachter even wrote an interesting blog post about it.
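To make the idea concrete, here is a minimal dense sketch of iterative correction in NumPy. It is only an illustration, not cooler's implementation: the function name `iterative_correction`, its parameters, and the toy matrix are made up for this example, and it assumes a symmetric, non-negative matrix with no all-zero rows.

```python
import numpy as np

def iterative_correction(A, tol=1e-10, max_iters=500):
    """Balance a symmetric non-negative matrix A (illustrative sketch).

    Returns (weights, balanced) where balanced[i, j] = w[i] * A[i, j] * w[j]
    and every row/column of `balanced` sums to approximately 1.
    Assumes A has no all-zero rows.
    """
    A = np.asarray(A, dtype=float)
    weights = np.ones(A.shape[0])
    for _ in range(max_iters):
        # Row sums of the currently corrected matrix w_i * A_ij * w_j.
        marg = weights * (A @ weights)
        # Express each row sum relative to the mean row sum.
        marg = marg / marg.mean()
        if np.abs(marg - 1.0).max() < tol:
            break
        # Rows that are too "heavy" get their weight reduced, and vice versa.
        weights = weights / marg
    # Fix the overall scale so that the row sums are ~1 rather than ~mean.
    scale = (weights * (A @ weights)).mean()
    weights = weights / np.sqrt(scale)
    balanced = A * np.outer(weights, weights)
    return weights, balanced

# Toy symmetric "contact matrix" with an empty diagonal (hypothetical data).
A = np.array([[0.0, 4.0, 2.0],
              [4.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
weights, balanced = iterative_correction(A)
print(weights)               # approximately [0.25 0.5 1.0]
print(balanced.sum(axis=1))  # every row sums to ~1
```

The input is thus split into a matrix with flat marginals (`balanced`) and one weight per bin (`weights`), which is the decomposition described above.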

The balancing algorithm implemented in cooler is a sparse, parallel and out-of-core (i.e. works in chunks that fit in memory) version of the iterative correction method in Imakaev et al. One trivial difference is that the output balancing weights in cooler are multiplicative (thus, 1/bias as defined in the original paper). Implementing a sparse and out-of-core method was necessary for scaling up to very large high-resolution Hi-C data.
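As a rough sketch of what "sparse" and "in chunks" mean here, the example below balances a matrix stored as upper-triangle pixels `(bin1, bin2, count)` and accumulates the marginals one chunk of pixels at a time. This is not cooler's actual code: the function names, the tiny `chunk_size`, and the toy data are assumptions for illustration, and a real out-of-core version would read each chunk from disk instead of slicing in-memory arrays.

```python
import numpy as np

def chunked_marginals(bin1, bin2, count, weights, n_bins, chunk_size=2):
    """Row sums of the weighted symmetric matrix, one pixel chunk at a time."""
    marg = np.zeros(n_bins)
    for lo in range(0, len(count), chunk_size):
        hi = lo + chunk_size
        i, j, c = bin1[lo:hi], bin2[lo:hi], count[lo:hi]
        v = c * weights[i] * weights[j]
        marg += np.bincount(i, weights=v, minlength=n_bins)
        # Only the upper triangle is stored, so each off-diagonal pixel
        # also contributes to the row of its mirrored entry.
        off = i != j
        marg += np.bincount(j[off], weights=v[off], minlength=n_bins)
    return marg

def balance_sparse(bin1, bin2, count, n_bins, tol=1e-10, max_iters=500):
    """Iterative correction on sparse pixels (assumes no empty bins)."""
    weights = np.ones(n_bins)
    for _ in range(max_iters):
        marg = chunked_marginals(bin1, bin2, count, weights, n_bins)
        marg = marg / marg.mean()
        if np.abs(marg - 1.0).max() < tol:
            break
        weights = weights / marg
    # Rescale so the balanced marginals are ~1.
    scale = chunked_marginals(bin1, bin2, count, weights, n_bins).mean()
    return weights / np.sqrt(scale)

# Hypothetical upper-triangle pixels of a 3-bin symmetric matrix.
bin1 = np.array([0, 0, 1])
bin2 = np.array([1, 2, 2])
count = np.array([4.0, 2.0, 1.0])

weights = balance_sparse(bin1, bin2, count, n_bins=3)
bias = 1.0 / weights  # the convention of the original paper

# Multiplicative weights and divisive biases give the same balanced pixels.
balanced_mult = count * weights[bin1] * weights[bin2]
balanced_div = count / (bias[bin1] * bias[bin2])
```

The last two lines show the "trivial difference" in convention: multiplying a count by the two weights is the same as dividing it by the two biases.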

rysterzhu commented 1 year ago

Do we need to correct for the sequencing depth of the data, by downsampling or scaling, before balancing? Thanks