More robust co-occurrence calculation

DPLemonade commented 5 months ago


Description

I modified _occur_count in gr/_ppatterns.py to enable more robust calculation of the co-occurrence probability ratio. My changes include:

1) Changed the distance intervals from discrete annular bins to cumulative radii of increasing size.

idx_x, idx_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > 0))

instead of

idx_x, idx_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > thres_min))

This retains more cells within a given distance threshold, which matters most when certain cell types are very scarce. Visually, the co-occurrence plots are smoother and less jagged.
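To illustrate the difference between the two masks, here is a minimal NumPy sketch on random coordinates. It is not the code from `_occur_count`; the coordinates, the threshold values, and the names `ring_*` and `disk_*` are assumptions made for the example only.

```python
import numpy as np

# Hypothetical toy data: 50 cells at random 2D positions.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))

# Pairwise Euclidean distances (the diagonal is exactly 0).
diff = coords[:, None, :] - coords[None, :, :]
pw_dist = np.sqrt((diff**2).sum(axis=-1))

# Illustrative thresholds, not values taken from the PR.
thres_min, thres_max = 20.0, 30.0

# Old behaviour: only pairs inside the annulus (thres_min, thres_max].
ring_x, ring_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > thres_min))

# New behaviour: every pair within thres_max, excluding zero distance
# (which drops each cell paired with itself).
disk_x, disk_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > 0))

n_ring = ring_x.size
n_disk = disk_x.size
print(f"annulus pairs: {n_ring}, cumulative pairs: {n_disk}")
```

Because every pair in the annulus also lies within `thres_max`, the cumulative mask always keeps at least as many pairs as the annular one. Note that `pw_dist > 0` would also exclude distinct cells that share identical coordinates.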

2) Modified the way co-occurrence pairs are counted (lines 297-301).

np.triu_indices_from, called in the co_occurrence function, returns the indices of unique pairs of splits, so each pair (split i, split j) occurs only once. If cell A from split i and cell B from split j are proximal (i != j), then both directions need to be counted: conditioned on A, B is proximal, and conditioned on B, A is also proximal.
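The idea can be sketched as follows. This is an illustrative reconstruction, not the code at the lines referenced above: the helper `count_pairs`, its arguments, and the toy labels are all hypothetical.

```python
import numpy as np


def count_pairs(labels_a, labels_b, close, n_types, symmetric):
    """Count proximal pairs by (conditioning cell type, neighbour cell type).

    Hypothetical helper: `close[a, b]` is True when cell a of the first
    split and cell b of the second split are within the distance threshold.
    """
    counts = np.zeros((n_types, n_types), dtype=np.int64)
    idx_a, idx_b = np.nonzero(close)
    # Conditioned on the cell from the first split, its partner is proximal.
    np.add.at(counts, (labels_a[idx_a], labels_b[idx_b]), 1)
    if symmetric:
        # Conditioned on the cell from the second split, the first is proximal too.
        np.add.at(counts, (labels_b[idx_b], labels_a[idx_a]), 1)
    return counts


# Toy example: three cells in split i, two cells in split j, three cell types.
labels_i = np.array([0, 0, 1])
labels_j = np.array([1, 2])
close = np.array([[True, False], [False, True], [True, True]])

one_way = count_pairs(labels_i, labels_j, close, n_types=3, symmetric=False)
two_way = count_pairs(labels_i, labels_j, close, n_types=3, symmetric=True)
```

With one-directional counting, a cross-split pair contributes to only one cell of the count matrix; with symmetric counting, `two_way` equals `one_way + one_way.T`, so neither cell type is under-counted as the conditioning type.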

3) Modified the way conditional probability is computed (lines 303-316).

When certain cell types are scarce or absent within a given distance threshold, division-by-zero errors may occur. These cases need special handling so that the co-occurrence values are not NaN and can be plotted.
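One way to guard the division is sketched below. This is an assumption-laden illustration rather than the code in the PR: the helper `cooccurrence_ratio` is hypothetical, the ratio is written as p(cond, exp) / (p(cond) · p(exp)), and filling undefined entries with 0 is a choice made for the example, not necessarily the value used in the actual change.

```python
import numpy as np


def cooccurrence_ratio(counts, fill_value=0.0):
    """Return p(exp | cond) / p(exp) per type pair, without producing NaN.

    Hypothetical helper: entries whose denominator is zero are set to
    `fill_value` (an assumed fallback) instead of being divided.
    """
    counts = np.asarray(counts, dtype=np.float64)
    total = counts.sum()
    if total == 0:
        # No proximal pairs at all for this threshold.
        return np.full(counts.shape, fill_value)

    joint = counts / total                      # p(cond, exp)
    p_cond = joint.sum(axis=1, keepdims=True)   # p(cond), shape (n, 1)
    p_exp = joint.sum(axis=0, keepdims=True)    # p(exp), shape (1, n)
    denom = p_cond * p_exp                      # broadcasts to (n, n)

    ratio = np.full_like(joint, fill_value)
    # Divide only where the denominator is positive; other entries keep fill_value.
    np.divide(joint, denom, out=ratio, where=denom > 0)
    return ratio


# Cell type 1 has no proximal pairs, so its row and column sums are zero.
counts = np.array([[2, 0, 0], [0, 0, 0], [0, 0, 2]])
ratio = cooccurrence_ratio(counts)
empty = cooccurrence_ratio(np.zeros((3, 3)))
```

A plain `joint / denom` would yield NaN for every entry involving the absent cell type; restricting the division with `where=` leaves those entries at the fallback value so the curve can still be plotted.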

How has this been tested?

This has been tested on the Xenium breast cancer tumor microenvironment dataset from the 'Analyze Xenium data' tutorial, as well as on a toy dataset. The code passes all local tests described in the Contributing guide.

Tutorial dataset

Before modifications:

After modifications:

Notice that the plot is smoother and less jagged.

Toy dataset

Spatial plot:

Before modifications:

After modifications:

Co-occurrence was computed with the parameters interval=np.arange(0, 2000, 100) and n_splits=2. Without handling of division by zero, the values beyond 400 for cells of type 1 and type 3, as well as all values for cells of type 2, become NaN and cannot be plotted. With robust handling of division by zero, the calculation better represents the actual co-occurrence relationship.

Closes

giovp commented 5 months ago

Hi @DPLemonade, thank you so much for this contribution, it looks good! Could you please add a test for this implementation? For example, it could be a regression test for the toy example that you proposed. Meanwhile, I will fix the lint in a separate PR.

DPLemonade commented 5 months ago

@giovp , glad to contribute! Could you elaborate on what you mean by a regression test? I'm not sure what this means and how it is applicable in the context of computing co-occurrence.

codecov-commenter commented 4 months ago

Codecov Report

Attention: Patch coverage is 13.33333%, with 13 lines in your changes missing coverage. Please review.

Project coverage is 69.90%. Comparing base (df8e042) to head (7985f34). Report is 1 commits behind head on main.

:exclamation: Current head 7985f34 differs from pull request most recent head f89f237

Please upload reports for the commit f89f237 to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #816      +/-   ##
==========================================
- Coverage   69.99%   69.90%   -0.09%
==========================================
  Files          39       39
  Lines        5525     5532       +7
  Branches     1029     1031       +2
==========================================
  Hits         3867     3867
- Misses       1363     1369       +6
- Partials      295      296       +1
```

| [Files](https://app.codecov.io/gh/scverse/squidpy/pull/816?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse) | Coverage Δ | |
|---|---|---|
| [src/squidpy/gr/\_ppatterns.py](https://app.codecov.io/gh/scverse/squidpy/pull/816?src=pr&el=tree&filepath=src%2Fsquidpy%2Fgr%2F_ppatterns.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse#diff-c3JjL3NxdWlkcHkvZ3IvX3BwYXR0ZXJucy5weQ==) | `78.96% <13.33%> (-1.85%)` | :arrow_down: |

... and [1 file with indirect coverage changes](https://app.codecov.io/gh/scverse/squidpy/pull/816/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse)
