saeyslab / nichenetr

NicheNet: predict active ligand-target links between interacting cells

Query: AUPR #251

Closed nfancy closed 6 months ago

nfancy commented 10 months ago

Hi,

Thanks for this nice package. It is really helpful for detecting cell-cell communication. Do you recommend a range of AUPR_corrected values that indicates moderate to strong ligand activity? In other words, what value can be considered notable ligand activity?

Thanks. Nurun

Eisuan commented 7 months ago

Thank you for your interest in NicheNet. We have not defined a specific cutoff for AUPR/AUPR corrected because these values can vary considerably between datasets.

I would stress that it is better to focus on the biological interpretation of the results than on the absolute AUPR scores. In practice, we advise checking whether the ligands ranked highest by AUPR/AUPR corrected make sense given the biological context of interest (e.g. whether previously described ligand-receptor interactions are prioritized).

Best, Daniele

tkapello commented 2 months ago

Hi @Eisuan,

I wanted to reopen this issue as I have a similar question to the one above. The top 20 ligands of my analysis have AUROC values of 0.768-0.899 but AUPR values of 0.001454-0.0154. What are your thoughts on the relevance of my predicted ligands? It is confusing to me how one should look for prior knowledge when the scope is to discover novel biology. If AUPR is not a metric one should focus on, what is the point of ordering ligands by it?

Eisuan commented 2 months ago

Dear Theodore, My previous answer referred to the interpretation of absolute AUPR values. The range of AUPR values may vary from dataset to dataset, making it difficult to draw conclusions based on their absolute values.
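One reason absolute AUPR values are hard to compare across datasets is that the AUPR of a random predictor equals the fraction of positives, whereas the AUROC of a random predictor is always 0.5. The sketch below is a generic simulation, not NicheNet code: it scores synthetic "genes" with the same separation between positives and negatives at two different positive fractions. All names, sizes, and distributions are assumptions chosen for illustration, and the baseline subtraction is only meant to mirror the idea behind a corrected AUPR, not nichenetr's exact computation.

```python
import random


def average_precision(labels, scores):
    """Area under the precision-recall curve, computed as average precision."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits = 0
    total = 0.0
    for i, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / i  # precision at each recovered positive
    return total / sum(labels)


def auroc(labels, scores):
    """AUROC via the Mann-Whitney rank sum (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    rank_sum = sum(r for r, idx in enumerate(order, start=1) if labels[idx])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n_genes, prevalence, seed):
    """Synthetic labels and scores; positives score higher on average."""
    rng = random.Random(seed)
    n_pos = round(n_genes * prevalence)
    labels = [1] * n_pos + [0] * (n_genes - n_pos)
    scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]
    return labels, scores


results = {}
for name, prevalence in [("common", 0.30), ("rare", 0.01)]:
    labels, scores = simulate(5000, prevalence, seed=0)
    baseline = sum(labels) / len(labels)  # AUPR expected from random scores
    aupr = average_precision(labels, scores)
    results[name] = {
        "baseline": baseline,
        "auroc": auroc(labels, scores),
        "aupr": aupr,
        "aupr_minus_baseline": aupr - baseline,  # illustrative correction only
    }
    print(name, results[name])
```

With the same score separation, AUROC should come out similar in both settings while AUPR drops sharply when positives are rare, so a low absolute AUPR alongside a high AUROC is what one would expect whenever the positive class is a small fraction of the background.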

For example, in our vignette, ligands show high AUPR values, such as 0.433. Still, I have run ligand prioritization experiments on datasets where the ground truth is known (the cells were treated with a known ligand), and I obtained good ligand prioritizations even with low absolute AUPR values (the known ligand was ranked 7th, with an AUPR of 0.03).

In this case, suppose you did not know the ground truth and had dismissed these results as not meaningful because of the low AUPR scores: you would have missed the ligand that drives your expression changes, which was right there among the top 10 ranked candidates.
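The difference between reading ranks and applying a hard cutoff can be sketched in a few lines. The scores below are made up for illustration. Only the rank of 7 and the AUPR of 0.03 for the known ligand echo the example above. The ligand names and the cutoff of 0.1 are hypothetical.

```python
def rank_ligands(activity):
    """Return ligand names ordered from highest to lowest score."""
    return sorted(activity, key=activity.get, reverse=True)


# Made-up AUPR values; "known_ligand" stands in for the ligand used for treatment.
toy_aupr = {
    "ligand_A": 0.061,
    "ligand_B": 0.055,
    "ligand_C": 0.048,
    "ligand_D": 0.041,
    "ligand_E": 0.036,
    "ligand_F": 0.033,
    "known_ligand": 0.030,
    "ligand_G": 0.027,
    "ligand_H": 0.022,
    "ligand_I": 0.018,
}

ranking = rank_ligands(toy_aupr)
known_rank = ranking.index("known_ligand") + 1  # 1-based rank

hard_cutoff = 0.1  # arbitrary threshold, chosen only to show the failure mode
passing = [name for name in ranking if toy_aupr[name] >= hard_cutoff]

print("rank of known ligand:", known_rank)
print("ligands passing the cutoff:", passing)
```

In this toy case the ranking keeps the known ligand among the top 10, while the absolute cutoff discards every candidate, including the true one.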

More generally, the scores used by cell communication inference tools tend to suffer from the same limitations, which is why hard cutoffs are rarely placed on their absolute values. You can find an analogous issue for CellChat: https://github.com/jinworks/CellChat/issues/152

"It is confusing to me how one should look for prior knowledge when the scope is to discover novel biology."

You can think of it as a form of "positive control" for this analysis.