Closed golobor closed 4 years ago
Also intra-chromosomal translocations, which can, probably, be filtered similarly, but by long-range cis over total cis, or over short range cis.
Since cis/total has a broad, locus-dependent range influenced by many factors, I've found that filtering on whether a bin has any extreme trans values (e.g. equivalent to the third cis diagonal) can be useful in addition to just filtering on cis/total.
On Thu, Sep 20, 2018, 7:55 AM Ilya Flyamer notifications@github.com wrote:
Also intra-chromosomal translocations, which can, probably, be filtered similarly, but by long-range cis over total cis, or over short range cis.
bumping this-- @golobor suggests that cis & total sums can be set as columns during balancing
See #210
A major practical issue with processing Hi-C data is the presence of genomic translocations. They introduce errors into the calculated genomic distances and cause cis contacts to be confused with trans ones, thus breaking the assumptions of downstream analyses (obs/exp, eigenvectors). Historically, @mimakaev and @gfudenberg dealt with these issues by filtering out genomic bins that form an atypically high fraction of trans contacts. Recently, @Phlya pointed out the need for such filters in cooler. A simple suggestion would be to calculate the cis/total fraction in raw bins (cis_tot_raw), detect low-value outliers using MADmax and filter them out (on top of the already used MADmax-coverage filter). It may also be useful to report both cis_tot_raw and cis_tot_balanced (i.e. the cis/total fraction after filtering/balancing).
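To make the suggestion concrete, here is a minimal NumPy sketch of the idea on a dense contact matrix. It does not use the cooler API: the function names `cis_total_fraction` and `low_outlier_mask`, the `n_mads` parameter, and the plain median-minus-k-MADs cutoff are all assumptions for illustration, and the cutoff may differ from how the existing MADmax coverage filter is computed. The diagonal is counted as cis here, which is also just an assumption.

```python
import numpy as np


def cis_total_fraction(matrix, chrom_ids):
    """Per-bin cis/total fraction from a dense raw contact matrix.

    matrix    : (n, n) symmetric array of raw counts (hypothetical input)
    chrom_ids : length-n array giving the chromosome of each bin
    Bins with zero total coverage get NaN.
    """
    matrix = np.asarray(matrix, dtype=float)
    chrom_ids = np.asarray(chrom_ids)
    # True where both bins of a pixel lie on the same chromosome
    same_chrom = chrom_ids[:, None] == chrom_ids[None, :]
    cis = np.where(same_chrom, matrix, 0.0).sum(axis=1)
    total = matrix.sum(axis=1)
    frac = np.full(len(total), np.nan)
    np.divide(cis, total, out=frac, where=total > 0)
    return frac


def low_outlier_mask(values, n_mads=5.0):
    """Flag bins whose value lies more than n_mads MADs below the median.

    NaN entries are ignored when computing the median and MAD and are
    never flagged. The cutoff n_mads=5.0 is an arbitrary default.
    """
    values = np.asarray(values, dtype=float)
    valid = np.isfinite(values)
    med = np.median(values[valid])
    mad = np.median(np.abs(values[valid] - med))
    mask = np.zeros(len(values), dtype=bool)
    mask[valid] = values[valid] < med - n_mads * mad
    return mask
```

With a raw matrix `raw` and per-bin chromosome labels `chroms`, `cis_tot_raw = cis_total_fraction(raw, chroms)` followed by `bad = low_outlier_mask(cis_tot_raw)` would give the bins to drop before balancing. Calling `cis_total_fraction` again on the balanced matrix would give cis_tot_balanced. Note that if most bins have nearly identical cis/total values the MAD can be close to zero, so a floor on the MAD may be needed in practice.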