open2c / cooltools

The tools for your .cool's
MIT License

snippers: order of "select" and "snip" matters, but should it? #502

Open sergpolly opened 8 months ago


Here is a simple example. Consider a simple CoolerSnipper:

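```python
from cooltools.api.snipping import CoolerSnipper

# clr is a cooler.Cooler loaded from the cooltools test data
snipper = CoolerSnipper(
    clr,
    view_df=view_df,
    cooler_opts={"balance": "weight"},
    min_diag=2,
)
```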

where view_df is

| chrom | start     | end       | name |
|-------|-----------|-----------|------|
| chr1  | 100000000 | 150000000 | foo  |
| chr2  | 100000000 | 150000000 | bar  |

The inputs are taken from cooltools/tests. Now, if we select and snip in the "right" order:

```python
# select and snip foo
foo_mat = snipper.select("foo", "foo")
foo_snip = snipper.snip(foo_mat, "foo", "foo", (100000000, 107000000, 120000000, 127000000))
# select and snip bar
bar_mat = snipper.select("bar", "bar")
bar_snip = snipper.snip(bar_mat, "bar", "bar", (100000000, 107000000, 120000000, 127000000))
```

The results look like this: [figure: download-1]
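One way to guarantee this order in user code is a small wrapper that calls select once per region pair and then immediately snips every window in it. This is only a sketch: `snip_region` is a hypothetical helper, not part of cooltools, and it assumes nothing beyond the `select(region1, region2)` and `snip(matrix, region1, region2, window)` calls used in the example.

```python
def snip_region(snipper, region1, region2, windows):
    """Hypothetical helper: select a region pair once, then snip all its windows.

    Keeping select() together with the snip() calls that depend on it means
    no other select() can overwrite the snipper's per-region state in between.
    """
    matrix = snipper.select(region1, region2)
    return [snipper.snip(matrix, region1, region2, window) for window in windows]


# Usage with the snipper from the example above:
# foo_snips = snip_region(snipper, "foo", "foo", [(100000000, 107000000, 120000000, 127000000)])
# bar_snips = snip_region(snipper, "bar", "bar", [(100000000, 107000000, 120000000, 127000000)])
```

This does not fix the underlying coupling; it just makes the safe order the only order available to the caller.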

When we change the order of select and snip, like so:

```python
# select foo and bar:
foo_mat = snipper.select("foo", "foo")
bar_mat = snipper.select("bar", "bar")
# snip foo and bar:
foo_snip = snipper.snip(foo_mat, "foo", "foo", (100000000, 107000000, 120000000, 127000000))
bar_snip = snipper.snip(bar_mat, "bar", "bar", (100000000, 107000000, 120000000, 127000000))
```

the result looks like this: [figure: download-2]

Note the white bar on foo_snip. It appears because snipper.select("bar", "bar") modified some instance attributes (_isnan1, _isnan2, etc.), so the values computed for "bar" are now also used when snipping "foo". We were lucky that it didn't crash in this case, because "foo" and "bar" have identical dimensions.
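The mechanism can be reproduced without any cooler data. The class below is a hypothetical, heavily simplified toy (a 1D list stands in for the 2D matrix, and made-up bin sets stand in for `_isnan1`/`_isnan2`), but it follows the same pattern: `select()` stores the masks on the instance, and `snip()` reads whatever the most recent `select()` left there.

```python
class StatefulToySnipper:
    """Hypothetical toy: select() stores per-region masks on self, snip() reads them."""

    def __init__(self, bad_bins):
        # bad_bins: region name -> set of bin indices to mask (made-up data)
        self.bad_bins = bad_bins

    def select(self, region1, region2):
        # Overwrites instance state on every call.
        self._isnan1 = self.bad_bins[region1]
        self._isnan2 = self.bad_bins[region2]
        return region1  # stand-in for the selected matrix

    def snip(self, matrix, region1, region2, window):
        lo, hi = window
        # Uses the masks from the *latest* select(), which may not be the
        # select() that produced `matrix`.
        return [
            "nan" if (i in self._isnan1 or i in self._isnan2) else matrix
            for i in range(lo, hi)
        ]


s = StatefulToySnipper({"foo": set(), "bar": {2}})

# "Right" order: select foo, then snip foo.
good = s.snip(s.select("foo", "foo"), "foo", "foo", (0, 4))

# Interleaved order: select foo and bar first, then snip foo.
foo_mat = s.select("foo", "foo")
bar_mat = s.select("bar", "bar")
stale = s.snip(foo_mat, "foo", "foo", (0, 4))

print(good)   # ['foo', 'foo', 'foo', 'foo']
print(stale)  # ['foo', 'foo', 'nan', 'foo']  <- bar's mask leaked into foo
```

The stale `"nan"` entry plays the role of the white bar: the data belong to "foo", but the mask belongs to "bar".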

Anyhow, this is not a big deal for the way snippers are currently used (they are called in the right order, even in the multiprocessing scenario, I hope), but it is potentially confusing, and I wanted to document it as a fact of life. Eventually it would be nice to decouple selecting and snipping, if others agree; this could also be added to the refactoring wishlist in #227.
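A minimal sketch of what decoupling could look like, assuming a redesign where `select()` returns all per-region state and `snip()` takes it as an argument instead of reading instance attributes. `Selection` and `DecoupledToySnipper` are hypothetical names, and a made-up 1D toy stands in for real matrices; none of this is the current cooltools API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Selection:
    """Hypothetical: everything snip() needs for one region pair."""
    region1: str
    region2: str
    matrix: object
    isnan1: frozenset = field(default_factory=frozenset)
    isnan2: frozenset = field(default_factory=frozenset)


class DecoupledToySnipper:
    def __init__(self, bad_bins):
        # bad_bins: region name -> set of bin indices to mask (made-up data)
        self.bad_bins = bad_bins

    def select(self, region1, region2):
        # Returns the per-region state instead of storing it on self.
        return Selection(
            region1,
            region2,
            matrix=region1,  # stand-in for the selected matrix
            isnan1=frozenset(self.bad_bins[region1]),
            isnan2=frozenset(self.bad_bins[region2]),
        )

    def snip(self, selection, window):
        lo, hi = window
        # Only the masks carried by `selection` are used.
        return [
            "nan" if (i in selection.isnan1 or i in selection.isnan2) else selection.matrix
            for i in range(lo, hi)
        ]


s = DecoupledToySnipper({"foo": set(), "bar": {2}})
foo_sel = s.select("foo", "foo")
bar_sel = s.select("bar", "bar")  # no longer affects foo_sel
print(s.snip(foo_sel, (0, 4)))  # ['foo', 'foo', 'foo', 'foo']
print(s.snip(bar_sel, (0, 4)))  # ['bar', 'bar', 'nan', 'bar']
```

Because each `Selection` is immutable and self-contained, selections can be created in any order, or handed to worker processes, without affecting one another.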