saezlab / liana-py

LIANA+: an all-in-one framework for cell-cell communication
http://liana-py.readthedocs.io/
GNU General Public License v3.0

Batch aware filtering of interactions in lrs_to_views #128

Closed robinfallegger closed 1 month ago

robinfallegger commented 3 months ago

Is your feature request related to a problem? Please describe.

When using lrs_to_views, one can filter interactions based on a minimum required variance across samples. However, if batches are present in the data, this filter might select mostly batch effects rather than the biological signal we are interested in.

Describe the solution you'd like

I would suggest something like the batch_key approach of the highly variable gene selection used in scanpy. Empty views, or views with only one batch could be dropped simultaneously.

I am still wondering whether it would be preferable to separate object creation from object filtering. Considering how difficult it currently is to work with the MuData object, I would suggest doing the filtering at creation time, but maybe others have suggestions one way or the other.

dbdimitrov commented 2 months ago

Hi @demian1,

I agree that doing the filtering independently might make sense. Besides the obvious option of setting every filter to a value that disables it (e.g. 0 or infinity, depending on the filter), a simpler solution could be that passing None to all filter parameters means no filtering is carried out.
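As a rough sketch of the "None means no filtering" idea: the function and parameter names below (`passes_filters`, `min_variance`, `min_nonzero_prop`) are hypothetical and are not taken from the lrs_to_views signature. Each filter is applied only when its parameter is set.

```python
from statistics import pvariance
from typing import Optional, Sequence


def passes_filters(
    scores: Sequence[float],
    min_variance: Optional[float] = None,      # hypothetical parameter name
    min_nonzero_prop: Optional[float] = None,  # hypothetical parameter name
) -> bool:
    """Return True if one interaction's per-sample scores pass every enabled filter.

    A filter whose parameter is None is skipped entirely.
    """
    if min_variance is not None and pvariance(scores) < min_variance:
        return False
    if min_nonzero_prop is not None:
        nonzero_prop = sum(s != 0 for s in scores) / len(scores)
        if nonzero_prop < min_nonzero_prop:
            return False
    return True


# A constant interaction has zero variance across samples.
constant = [0.5, 0.5, 0.5, 0.5]
kept_without_filters = passes_filters(constant)
kept_with_variance_filter = passes_filters(constant, min_variance=1e-6)
```

With all parameters left at None the constant interaction is kept, while enabling the variance filter drops it.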

Regarding the variance filter, my main motivation when I implemented it was that variables with zero variance were causing MOFA+ to crash, but you are right that if the variance across LRs is driven by batch, then this parameter might be suboptimal. I agree that the scanpy approach is a good way to address this, and it is also quite simple to implement.

However, since we don't rank, it could be just the union or intersection of highly variable interactions across the batches. Or alternatively the average variance across batches? I would need to think about this a bit, but I will likely go for the latter approach (i.e. the mean).
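To make the three options concrete, here is a minimal, dependency-free sketch. It assumes one score per sample for a single interaction plus a batch label per sample. The helper names and the `how` argument are illustrative only and do not describe how LIANA+ implements this.

```python
from collections import defaultdict
from statistics import mean, pvariance


def variance_per_batch(scores, batches):
    """Population variance of one interaction's scores within each batch."""
    grouped = defaultdict(list)
    for score, batch in zip(scores, batches):
        grouped[batch].append(score)
    return {batch: pvariance(vals) for batch, vals in grouped.items()}


def is_variable(scores, batches, min_variance, how="mean"):
    """Decide whether an interaction is variable, taking batches into account.

    how="mean":         mean of the within-batch variances must reach the threshold
    how="union":        at least one batch must reach the threshold
    how="intersection": every batch must reach the threshold
    """
    per_batch = variance_per_batch(scores, batches)
    if how == "mean":
        return mean(per_batch.values()) >= min_variance
    passed = [v >= min_variance for v in per_batch.values()]
    if how == "union":
        return any(passed)
    if how == "intersection":
        return all(passed)
    raise ValueError(f"unknown aggregation: {how!r}")


batches = ["a", "a", "a", "b", "b", "b"]
batch_driven = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]  # differs only between batches
biological = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]    # varies within each batch

# Pooled variance ignores batches, so the batch-driven interaction looks variable.
pooled_keeps_batch_driven = pvariance(batch_driven) >= 0.5
aware_keeps_batch_driven = is_variable(batch_driven, batches, 0.5)
aware_keeps_biological = is_variable(biological, batches, 0.5)
```

In this toy data the pooled variance keeps the batch-driven interaction, whereas the batch-aware mean keeps only the one that varies within batches.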

> Empty views, or views with only one batch could be dropped simultaneously.

This is also a really good point.

Please let me know your thoughts, I will aim to have the proposed solution implemented in the next update :)

robinfallegger commented 2 months ago

Sounds good!

dbdimitrov commented 2 months ago

PS. To do: implement batch_key based on the mean variance across batches, add a new parameter for var_min_nbatches, and add the resulting info to .var.
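One possible reading of this plan, sketched without any dependencies: var_min_nbatches is interpreted here as the minimum number of batches in which an interaction must reach the variance threshold. That reading is an assumption, loosely modelled on how scanpy counts batches for highly variable genes. The record keys (`mean_variance`, `variable_nbatches`, `keep`) are hypothetical stand-ins for columns that could be written to .var, and the interaction names are made up.

```python
from collections import defaultdict
from statistics import mean, pvariance


def annotate_interactions(scores_by_interaction, batches, min_variance,
                          var_min_nbatches=1):
    """Build one record per interaction, shaped like rows of a `.var` table.

    Assumption: an interaction is kept when its mean within-batch variance
    reaches `min_variance` AND it reaches `min_variance` in at least
    `var_min_nbatches` individual batches.
    """
    records = {}
    for name, scores in scores_by_interaction.items():
        grouped = defaultdict(list)
        for score, batch in zip(scores, batches):
            grouped[batch].append(score)
        variances = [pvariance(vals) for vals in grouped.values()]
        nbatches = sum(v >= min_variance for v in variances)
        mean_var = mean(variances)
        records[name] = {
            "mean_variance": mean_var,       # hypothetical column name
            "variable_nbatches": nbatches,   # hypothetical column name
            "keep": mean_var >= min_variance and nbatches >= var_min_nbatches,
        }
    return records


batches = ["a", "a", "a", "b", "b", "b"]
scores = {
    "L1^R1": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],  # variable in both batches
    "L2^R2": [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],  # only differs between batches
    "L3^R3": [1.0, 3.0, 5.0, 2.0, 2.0, 2.0],  # variable in one batch only
}
records = annotate_interactions(scores, batches, min_variance=0.5,
                                var_min_nbatches=2)
kept = [name for name, rec in records.items() if rec["keep"]]
```

Keeping the per-interaction statistics alongside the keep flag makes it possible to inspect why an interaction was dropped, rather than only seeing the filtered result.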

robinfallegger commented 1 month ago

Implemented in 07c0f05.