open2c / bioframe

Genomic interval operations on Pandas DataFrames
MIT License

2D interval operations #25

Open Phlya opened 4 years ago

Phlya commented 4 years ago

In light of Anton's PR, I just want to start a discussion about possible future functions for 2D interval operations, for a later release.

  1. I think we decided on Slack that they should just use the 1D functions internally along each dimension and then combine the results. That makes them more "sugar" than core functionality, but given our focus on Hi-C analysis this seems important enough to implement: comparing dot calls is a frequent task, e.g. merging annotations made at different resolutions (which might be used in the dotcaller?) or, obviously, finding differential dot calls.
  2. I think we basically need to implement all the same functions that we have (or will have) for 1D overlaps, but in 2D. The exception is the complement: I am not sure there is any reason to have a 2D complement, and it seems ill-defined anyway.
  3. I think it would also be useful to have 2D vs 1D overlaps. This is even easier to achieve by using the 1D functions directly, but again it is something that is needed quite often, e.g. to annotate dot calls with CTCF peaks (and their orientation) or with other ChIP-seq/whatever-seq peaks.
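
To make points 1 and 3 concrete, here is a minimal sketch in plain Python of how a 2D overlap could be assembled from a 1D overlap applied to each dimension. It is not bioframe's API: the names `Interval`, `Interval2D`, `overlap_1d`, `overlap_2d` and `overlap_2d_1d` are hypothetical, intervals are assumed to be half-open, and the 1D step is a naive quadratic scan standing in for a real 1D overlap function.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical 1D interval, half-open: [start, end)."""
    chrom: str
    start: int
    end: int


class Interval2D(NamedTuple):
    """Hypothetical 2D feature (e.g. a dot call) with two anchors."""
    anchor1: Interval
    anchor2: Interval


def overlap_1d(left, right):
    """Return the set of index pairs (i, j) where left[i] overlaps right[j].

    Naive O(n * m) stand-in for a proper 1D overlap function.
    """
    return {
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if a.chrom == b.chrom and a.start < b.end and b.start < a.end
    }


def overlap_2d(left, right):
    """2D vs 2D: run the 1D overlap per dimension, keep pairs found in both."""
    dim1 = overlap_1d([f.anchor1 for f in left], [f.anchor1 for f in right])
    dim2 = overlap_1d([f.anchor2 for f in left], [f.anchor2 for f in right])
    return sorted(dim1 & dim2)


def overlap_2d_1d(features, peaks):
    """2D vs 1D: for each feature, list the peak indices hitting each anchor."""
    hits1 = overlap_1d([f.anchor1 for f in features], peaks)
    hits2 = overlap_1d([f.anchor2 for f in features], peaks)
    return [
        (
            sorted(j for i, j in hits1 if i == k),
            sorted(j for i, j in hits2 if i == k),
        )
        for k in range(len(features))
    ]


# Made-up coordinates, for illustration only.
dots_a = [
    Interval2D(Interval("chr1", 100, 200), Interval("chr1", 900, 1000)),
    Interval2D(Interval("chr1", 300, 400), Interval("chr1", 700, 800)),
]
dots_b = [
    # Overlaps dots_a[0] on both anchors.
    Interval2D(Interval("chr1", 150, 250), Interval("chr1", 950, 1050)),
    # Overlaps dots_a[1] on the first anchor only.
    Interval2D(Interval("chr1", 350, 450), Interval("chr1", 2000, 2100)),
]
peaks = [Interval("chr1", 180, 190), Interval("chr1", 720, 730)]

shared = overlap_2d(dots_a, dots_b)        # [(0, 0)]
annotated = overlap_2d_1d(dots_a, peaks)   # [([0], []), ([], [1])]
```

The sketch assumes both sets of features list their anchors in the same order (e.g. upstream anchor first); a real implementation would have to decide how to treat swapped anchors. On DataFrames the same idea would presumably be two 1D overlap calls, one per pair of coordinate columns, followed by a join on the index pairs.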

Other thoughts?

sebgra commented 2 years ago

Are there plans to extend the interval arithmetic functions to 2D? This would be useful for calculating sequence overlays in 2D. Thanks in advance.

gfudenberg commented 2 years ago

Hi @sebgra,

One function that could eventually be migrated to bioframe, to support 2D interval ops, is this one currently in cooltools, assign_view_paired: https://github.com/open2c/cooltools/blob/a5341aa03f1bbcc1087983f2919602d4f25c333a/cooltools/lib/common.py#L12

If you provide a more explicit example of what you're hoping to achieve, we might be able to give a more detailed answer.

Thanks! Geoff