Closed DPLemonade closed 4 months ago
hi @DPLemonade, thank you so much for this contribution, it looks good! Could you please add a test for this implementation? For example, it could be a regression test for the toy example that you proposed. Meanwhile, I will fix the lint in a separate PR.
@giovp , glad to contribute! Could you elaborate on what you mean by a regression test? I'm not sure what this means and how it is applicable in the context of computing co-occurrence.
Attention: Patch coverage is 13.33333% with 13 lines in your changes are missing coverage. Please review.
Project coverage is 69.90%. Comparing base (df8e042) to head (7985f34). Report is 1 commits behind head on main.
:exclamation: Current head 7985f34 differs from pull request most recent head f89f237
Please upload reports for the commit f89f237 to get more accurate results.
Description
I modified _occur_count in gr/_ppatterns.py to enable more robust calculation of the co-occurrence probability ratio. My changes include:
1) Modified intervals from discrete interval bins to increasing radii sizes.
idx_x, idx_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > 0))
instead of
idx_x, idx_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > thres_min))
This retains more cells within a given distance threshold, which is especially important when certain cell types are very scarce. Visually, the co-occurrence plots are smoother and less jagged.
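To make the difference between the two masks concrete, here is a minimal sketch on a hypothetical four-cell layout. The coordinates and threshold values are made up for illustration and are not taken from _occur_count; only the two mask expressions come from the change described above.

```python
import numpy as np

# Hypothetical toy data: four cells on a line at x = 0, 1, 2, 5.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 0.0]])

# Pairwise Euclidean distances, shape (4, 4), with zeros on the diagonal.
pw_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Illustrative thresholds (assumed values).
thres_min, thres_max = 1.5, 3.5

# Discrete bin: keep only pairs with thres_min < distance <= thres_max.
ring_x, ring_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > thres_min))

# Increasing radius: keep every pair with 0 < distance <= thres_max.
disk_x, disk_y = np.nonzero((pw_dist <= thres_max) & (pw_dist > 0))

# Each proximal pair appears twice, as (p, q) and (q, p),
# because pw_dist is symmetric.
n_ring = ring_x.size
n_disk = disk_x.size
print(n_ring, n_disk)
```

In this layout the radius-based mask keeps the close pairs (distance 1) that the bin-based mask drops, so every pair counted by the bin is also counted by the radius, but not the other way round.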
2) Modified the way co-occurrence pairs are counted (lines 297-301).
np.triu_indices_from, called in the co_occurrence function, returns the indices of non-repetitive pairs of splits, meaning the pair (split i, split j) only occurs once. If cell A from split i and cell B from split j are proximal (i != j), then both directions need to be counted, i.e. conditioned on A, B is proximal, and conditioned on B, A is also proximal.
3) Modified the way conditional probability is computed (lines 303-316).
When certain cell types are scarce or not within a given distance threshold, division by zero errors may occur. These cases need special handling so that the co-occurrence values are not NaN and can still be plotted.
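Changes 2) and 3) can be illustrated with a short sketch. It is not the code from this PR: the function names count_pairs and cooccurrence_ratio, the same_split flag, the row/column convention (row = conditioning cell type), and the choice of 0 as the fallback value for undefined ratios are all assumptions made for the example.

```python
import numpy as np


def count_pairs(labels_i, labels_j, pw_dist, thres_max, n_types, same_split):
    """Count proximal (type a, type b) pairs between two splits (hypothetical helper)."""
    counts = np.zeros((n_types, n_types), dtype=np.int64)
    idx_i, idx_j = np.nonzero((pw_dist <= thres_max) & (pw_dist > 0))
    a = labels_i[idx_i]
    b = labels_j[idx_j]
    # np.add.at accumulates correctly when the same (a, b) index repeats.
    np.add.at(counts, (a, b), 1)
    if not same_split:
        # A cross-split pair is seen only once, so add the mirrored direction too.
        np.add.at(counts, (b, a), 1)
    return counts


def cooccurrence_ratio(counts):
    """Return P(b | a) / P(b), using 0 where the ratio is undefined (assumed fallback)."""
    ratio = np.zeros(counts.shape, dtype=np.float64)
    total = counts.sum()
    if total == 0:
        return ratio
    p_exp = counts.sum(axis=0) / total            # P(b)
    row_tot = counts.sum(axis=1, keepdims=True)   # number of pairs conditioned on a
    p_cond = np.divide(
        counts, row_tot, out=np.zeros(counts.shape, dtype=np.float64), where=row_tot > 0
    )
    # Entries where P(b) == 0 keep the initial value 0 instead of becoming NaN.
    np.divide(p_cond, p_exp, out=ratio, where=p_exp > 0)
    return ratio


# One cell of type 0 in split i, one cell of type 1 in split j, distance 1.
counts = count_pairs(
    np.array([0]), np.array([1]), np.array([[1.0]]),
    thres_max=2.0, n_types=3, same_split=False,
)
ratio = cooccurrence_ratio(counts)
```

Type 2 never appears in this example, so its row and column would involve 0/0; with the masked division they stay at the fallback value and the result contains no NaN entries.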
How has this been tested?
This has been tested on the Xenium breast cancer tumor microenvironment Dataset in the 'Analyze Xenium data' tutorial, as well as on a toy dataset. The code has passed all local tests as indicated by the Contributing guide.
Tutorial dataset
Before modifications:
After modifications:
Notice that the plot is smoother and less jagged.
Toy dataset
Spatial plot:
Before modifications:
After modifications:
The parameters interval=np.arange(0, 2000, 100), n_splits=2 are used to compute co-occurrence. Without handling of division by zero errors, the values beyond 400 for cells of type 1 and type 3, as well as all values for cells of type 2, become NaN and cannot be plotted accurately, whereas robust handling of division by zero errors yields a calculation that better represents the actual co-occurrence relationship.
Closes