saeyslab / multinichenetr

MultiNicheNet: a flexible framework for differential cell-cell communication analysis from multi-sample multi-condition single-cell transcriptomics data
GNU General Public License v3.0
117 stars 14 forks source link

Lack of Robustness to Outliers #85

Closed DarioS closed 1 month ago

DarioS commented 2 months ago

Some results appear to be driven by a single sample or perhaps two. Could the prioritisation be made robust to such cases? image

browaeysrobin commented 2 months ago

Hi @DarioS

1) Can you give more information about which vignette and parameters for this analysis? 2) If this shows the data for all samples, am I correct if I interpret the following: for no sample in the first condition, sender and receiver are sufficiently present? (given the small dots)

DarioS commented 2 months ago

Based on the Basic Analysis vignette, I used the code (and can share the CCI variable via e-mail, if useful to you)

contrasts_oi = c("'Yes-No','No-Yes'")
contrast_tbl = tibble(contrast = c("Yes-No","No-Yes"), group = c("Yes","No"))
priorityTable <- get_top_n_lr_pairs(CCI$prioritization_tables, top_n = 50, rank_per_group = FALSE)
CCI <- multi_nichenet_analysis(allHuman, "cellType", "samples", characteristic,
    lr_network = lr_network, ligand_target_matrix = ligand_target_matrix,
    batches = NA, covariates = NA, contrasts_oi = contrasts_oi,
    contrast_tbl  = contrast_tbl)
group_oi <- "Yes"
subsetGroup <- priorityTable %>% filter(group == group_oi)
make_sample_lr_prod_activity_plots_Omnipath(CCI$prioritization_tables, subsetGroup %>% inner_join(lr_network_all))

Yes, all of the samples are shown. It seems that having all samples in one group near-zero leads to bogus significance if only one sample in the other group has high sender and receiver presence. So, I am wondering about robust statistics.

browaeysrobin commented 2 months ago

Hi @DarioS

In the regular MultiNicheNet workflow, interactions will not be kept between cell types if one of the members is not sufficiently present in one of the studied conditions. Therefore, I don't understand why your plot shows an interaction between two cell types insufficiently present in the first condition.

Therefore I suspect something went wrong. Based on the code you provided, is it possible that the following happened: Because you call priorityTable before CCI <- multi_nichenet_analysis, you actually obtain a priorityTable from a previous CCI object. Then running make_sample_lr_prod_activity_plots_Omnipath(CCI$prioritization_tables, subsetGroup %>% inner_join(lr_network_all)) makes a plot using the information from the latest CCI object, but shows the interactions of the priorityTable, which could come from an other object. As a result: your interaction between mCAF2 and vCAF may have been present in the old object, and is thus visualized with the data from the new object.

Do you get the same weird result if you do this:

contrasts_oi = c("'Yes-No','No-Yes'")
contrast_tbl = tibble(contrast = c("Yes-No","No-Yes"), group = c("Yes","No"))
CCI <- multi_nichenet_analysis(allHuman, "cellType", "samples", characteristic,
    lr_network = lr_network, ligand_target_matrix = ligand_target_matrix,
    batches = NA, covariates = NA, contrasts_oi = contrasts_oi,
    contrast_tbl  = contrast_tbl)
priorityTable <- get_top_n_lr_pairs(CCI$prioritization_tables, top_n = 50, rank_per_group = FALSE)
group_oi <- "Yes"
subsetGroup <- priorityTable %>% filter(group == group_oi)
make_sample_lr_prod_activity_plots_Omnipath(CCI$prioritization_tables, subsetGroup %>% inner_join(lr_network_all))

My guess would be that the pattern you see in the provided plot, will not appear now.

Note about the robustness of the prioritization: one prioritization criterion considers the fraction of samples in a group where both ligand and receptor are sufficiently expressed. This criterion makes the output more robust to outlier samples. It is still possible though that some outlier interactions are still in the top_n if they score much higher on other criteria.

DarioS commented 2 months ago

Sorry, I made a mistake when copying lines of code from a loop into GitHub Issues. Can you reproduce it now?

library(nichenetr)
library(multinichenetr)
library(tidyverse)

 lr_network_all = 
    readRDS(url(
      "https://zenodo.org/record/10229222/files/lr_network_human_allInfo_30112033.rds"
      )) %>% 
    mutate(
      ligand = convert_alias_to_symbols(ligand, organism = "human"), 
      receptor = convert_alias_to_symbols(receptor, organism = "human"))

load(url("https://www.maths.usyd.edu.au/u/dario/CCI.RData")) # 135 MB.

priorityTable <- get_top_n_lr_pairs(CCI$prioritization_tables, top_n = 50, rank_per_group = FALSE)
group_oi <- "Yes"
subsetGroup <- priorityTable %>% filter(group == group_oi)
make_sample_lr_prod_activity_plots_Omnipath(CCI$prioritization_tables, 
                                            subsetGroup %>% inner_join(lr_network_all)) # See bottom row of plot.
browaeysrobin commented 2 months ago

Thank you @DarioS, that was helpful and gave us some insights in your finding.

1) CCI$prioritization_tables$group_prioritization_tbl shows the values for the used prioritization criteria, including fraction_expressing_ligand_receptor: this criterion has a low score for the interaction you pointed out, as should be, but other criteria have high scores, making this interaction still quite highly prioritized.

2) DE analysis is performed per cell type, therefore the interaction is kept in the analysis: both the sender and receiver are present in 2 samples in condition 1, however not in the same samples. This is not ideal. We have never encountered this before, but will think about a solution. Therefore we will keep this issue open and flag it as an enhancement.

DarioS commented 2 months ago

Thank you for considering this in a future update.

browaeysrobin commented 1 month ago

Hi @DarioS

We updated the basic vignette to demonstrate how to filter the final interactions to only show those with the most robust expression patterns (https://github.com/saeyslab/multinichenetr/blob/main/vignettes/basic_analysis_steps_MISC.knit.md#filter-interactions-based-on-specific-prioritization-criteria).

The main idea is that you can use the information in the prioritization table CCI$prioritization_tables$group_prioritization_tbl to only keep interactions with a value of fraction_expressing_ligand_receptor higher than a certain threshold (1 to only keep interactions that are highly expressed in all samples of the condition of interest, 0.8 to keep interactions expressed in 80% of interactions, etc...)