simonwm / tacco

TACCO: Transfer of Annotations to Cells and their COmbinations
BSD 3-Clause "New" or "Revised" License

Clarification on the Implementation of _count_soft_co_occurrences_dense Function #17

Closed by kuang-da 4 months ago

kuang-da commented 5 months ago

Hi,

Thank you for the great software and informative documentation. Your approach to the cell-type co-occurrence problem is particularly intriguing to me. After reading the source code, I have a question about the implementation of the _count_soft_co_occurences_dense function, specifically concerning [this line of code](https://github.com/simonwm/tacco/blob/ce8478c55eb734f8eeddc3cb2db37b4d4b0cd519/tacco/tools/_co_occurrence.py#L165). The code seems to aggregate the compositions of reference cells into bins by summing up the contributions that fall within the same bin:

temp_i[k, rc] += reference_contributions_j[rc]

I am uncertain whether summing compositional data within the same bin effectively estimates the overall composition. Would it be more accurate to use the mean of these compositional contributions instead? After all, this is a 2D surface, so a bin at a larger distance covers a larger area and therefore collects more cells. Any insights or comments you could provide would be greatly appreciated.
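To make the concern concrete, here is a small sketch on toy data. It is not the TACCO implementation: the coordinates, compositions, bin edges, and all variable names are made up for illustration. It compares the plain per-bin sum with the per-bin mean around a single center point, assuming every cell's composition sums to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 reference cells placed uniformly in a square.
coords = rng.uniform(-1.0, 1.0, size=(500, 2))
# Each cell carries a composition over 3 annotation types; every row sums to 1.
compositions = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=500)

# Distances of all cells to one center point at the origin.
center = np.zeros(2)
dist = np.linalg.norm(coords - center, axis=1)

# Four concentric rings of equal width; outer rings cover more area.
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
n_bins = len(edges) - 1
bin_idx = np.digitize(dist, edges) - 1
inside = bin_idx < n_bins  # drop cells beyond the last ring

# Plain sums per ring, analogous in spirit to `temp_i[k, rc] += ...`.
sums = np.zeros((n_bins, compositions.shape[1]))
np.add.at(sums, bin_idx[inside], compositions[inside])
counts = np.bincount(bin_idx[inside], minlength=n_bins)

# Per-ring mean composition versus the per-ring normalized sum.
means = sums / counts[:, None]
normalized_sums = sums / sums.sum(axis=1, keepdims=True)

print("cells per ring:", counts)
print("max |mean - normalized sum|:", np.abs(means - normalized_sums).max())
```

In this toy setting the raw sums grow with the ring area, but dividing each ring's sum by its own total gives exactly the mean composition, because the total of a ring's sum equals the number of cells in that ring when each composition sums to 1.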

Thank you for your time and assistance.

JWatter commented 4 months ago

Hi,

Thank you for your appreciation!

As this piece of code is at the computationally optimized core of the co-occurrence calculation, it is a bit more convoluted than vanilla code and therefore a little harder to read. But you are right: at this point the counts are just the plain sums and are not yet normalized. With the plain sums available, the normalization can be done later, which happens in the lines following https://github.com/simonwm/tacco/blob/ce8478c55eb734f8eeddc3cb2db37b4d4b0cd519/tacco/tools/_co_occurrence.py#L522 to arrive at data normalized per bin (i.e. in your case per 2D ring). With appropriately chosen weights for the summation, one can flexibly represent different normalizations.
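As a rough illustration of "sum first, normalize later", here is a minimal sketch. It is not the actual TACCO code: the `accumulate` helper, the toy arrays, and the choice of weights are all hypothetical. It shows a weighted plain sum per bin, a normalization applied afterwards, and one example of weights that yield the per-bin mean directly.

```python
import numpy as np

def accumulate(bin_idx, contributions, weights, n_bins):
    """Weighted plain sums per bin, written as an explicit loop (hypothetical helper)."""
    n_types = contributions.shape[1]
    temp = np.zeros((n_bins, n_types))
    for j, k in enumerate(bin_idx):
        for rc in range(n_types):
            temp[k, rc] += weights[j] * contributions[j, rc]
    return temp

# Hypothetical toy input: 4 cells, 2 annotation types, 2 distance bins.
bin_idx = np.array([0, 0, 1, 1])
contributions = np.array([
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
    [0.2, 0.8],
])

# Unit weights give the plain, unnormalized sums.
unit_weights = np.ones(len(bin_idx))
plain = accumulate(bin_idx, contributions, unit_weights, n_bins=2)

# Normalization done afterwards: each bin is divided by its own total.
per_bin = plain / plain.sum(axis=1, keepdims=True)

# Alternative: weights of 1 / (cells in the bin) make the sum itself the mean.
counts = np.bincount(bin_idx, minlength=2)
mean_weights = 1.0 / counts[bin_idx]
means = accumulate(bin_idx, contributions, mean_weights, n_bins=2)

print(plain)    # unnormalized sums per bin
print(per_bin)  # compositions per bin after normalization
print(means)    # same compositions, obtained through the weights
```

Keeping the unnormalized sums around means the same accumulation pass can serve several normalizations, which is one plausible reason to defer that step.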

Hope this helps!