saeyslab / multinichenetr

MultiNicheNet: a flexible framework for differential cell-cell communication analysis from multi-sample multi-condition single-cell transcriptomics data
GNU General Public License v3.0

NAs in the "target" and "ligand_target_weighted" columns of the `ligand_activities_targets_DEgenes$ligand_activities` dataframe #8

Closed: browaeysrobin closed this issue 1 year ago

browaeysrobin commented 1 year ago
Thank you!

I wanted to make sure this was OK, since at Step 3 of the introductory vignette (https://github.com/saeyslab/multinichenetr/blob/main/vignettes/basic_analysis_steps_MISC.md) I am getting a `ligand_activities_targets_DEgenes$ligand_activities` dataframe that contains rows with NA values in the "target" and "ligand_target_weighted" columns. This happens especially when I use the adjusted p-value instead of the normal p-value, in which case nearly half of the rows are affected. Is this because gene names are missing from the ligand_target_matrix? I am using the mouse NicheNet v2 networks.

Thank you again.

Originally posted by @SergioRodLla in https://github.com/saeyslab/multinichenetr/issues/7#issuecomment-1621651423

browaeysrobin commented 1 year ago

Hi @SergioRodLla

These NA values indicate that no target gene was found for a given ligand: none of the genes in the geneset of interest (= DE genes) is among the top_n (default 250) potential target genes of that ligand. Using adjusted p-values as the cutoff to define the geneset of interest / DE genes will sometimes yield a very small number of genes. As a consequence, the chance of not finding any target gene increases, leading to more rows with NA values.
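To make the mechanism concrete, here is a minimal conceptual sketch in Python. It is not the multinichenetr implementation (which is R); the function names, the toy scores, and the gene and ligand names are all hypothetical. It only illustrates the idea described above: a ligand whose top_n predicted targets do not overlap the geneset of interest gets a single row with missing values.

```python
def top_targets_in_geneset(scores, geneset, top_n=250):
    """Genes of interest found among the top_n highest-scoring targets of one ligand."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return [gene for gene in ranked if gene in geneset]


def ligand_target_rows(ligand_target_scores, geneset, top_n=250):
    """Build (ligand, target, weight) rows; ligands without any hit get one NA-like row."""
    rows = []
    for ligand, scores in ligand_target_scores.items():
        hits = top_targets_in_geneset(scores, geneset, top_n)
        if hits:
            rows.extend((ligand, gene, scores[gene]) for gene in hits)
        else:
            # No gene of interest among the top targets: row with missing values.
            rows.append((ligand, None, None))
    return rows


# Hypothetical toy ligand-target scores (illustration only, not NicheNet data).
toy_scores = {
    "LigA": {"Gene1": 0.9, "Gene2": 0.5, "Gene3": 0.1},
    "LigB": {"Gene3": 0.8, "Gene4": 0.7, "Gene1": 0.05},
}

# A larger geneset: every ligand finds at least one target among its top 2.
rows_large = ligand_target_rows(toy_scores, {"Gene1", "Gene2", "Gene4"}, top_n=2)

# A very small geneset: LigB has no gene of interest among its top 2 targets.
rows_small = ligand_target_rows(toy_scores, {"Gene1"}, top_n=2)

print(rows_large)
print(rows_small)
```

With the small geneset, the row for the hypothetical ligand "LigB" carries missing values for both the target and the weight, which mirrors the NA rows discussed in this thread.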

For inferring target genes and predicting ligand activities, it is always recommended to have a sufficient number of genes in the genesets of interest (as a guideline: more than 20 and fewer than 2000).
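A quick way to see whether a cutoff leaves enough genes is to count the geneset size before running the ligand activity step. The following is a small Python sketch under stated assumptions: the DE table, its column names (`p_val`, `p_adj`), the 0.05 threshold, and the helper `define_geneset` are hypothetical, and the "adjusted" values are simply inflated p-values standing in for a real multiple-testing correction. It is not part of multinichenetr.

```python
def define_geneset(de_table, p_column, threshold=0.05, min_genes=20, max_genes=2000):
    """Select genes passing the cutoff and check the size guideline (> 20 and < 2000)."""
    genes = [row["gene"] for row in de_table if row[p_column] <= threshold]
    size_ok = min_genes < len(genes) < max_genes
    return genes, size_ok


# Hypothetical DE table with 100 genes (toy values for illustration only).
toy_de_table = [
    {
        "gene": f"Gene{i + 1}",
        "p_val": (i + 0.5) / 1000,
        # Stand-in for an adjusted p-value: the nominal p-value inflated 10-fold.
        "p_adj": min(1.0, (i + 0.5) / 100),
    }
    for i in range(100)
]

genes_nominal, nominal_ok = define_geneset(toy_de_table, "p_val")
genes_adjusted, adjusted_ok = define_geneset(toy_de_table, "p_adj")

print(len(genes_nominal), nominal_ok)
print(len(genes_adjusted), adjusted_ok)
```

In this toy example, the nominal p-value cutoff keeps a geneset within the guideline, while the stricter adjusted cutoff leaves too few genes, which is the situation in which many NA rows can be expected.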