saezlab / liana

LIANA: a LIgand-receptor ANalysis frAmework
https://saezlab.github.io/liana/
GNU General Public License v3.0

Specificity and magnitude columns across methods. #47

Closed enblacar closed 2 years ago

enblacar commented 2 years ago

Hi Liana developers,

First of all, thanks for providing such an amazing package! The fact that one can run different ligand-receptor methods on the go and retrieve everything nicely packed in a long-format tibble is priceless!

I would like to make use of the package for custom visualization purposes, and I am wondering which columns from the different experiments are the ones that point to the specificity (size of the dots, how statistically significant the interaction is) and the magnitude (color of the dots, how strong the interaction is). After going through the outputs, this is what I could gather:

 # Each method has a different column name.
  specificity <- list("cellphonedb" = "pvalue",
                      "natmi" = "edge_specificity",
                      "logfc" = "none",
                      "sca" = "global_mean",
                      "connectome" = "none",
                      "aggregated" = "aggregate_rank")

  magnitude <- list("cellphonedb" = "lr.mean",
                    "natmi" = "prod_weight",
                    "logfc" = "logfc_comb",
                    "sca" = "LRscore",
                    "connectome" = "weight_sc",
                    "aggregated" = "mean_rank")

Where "none" means that I could not find a suitable column for it and "aggregated" means from the output of liana::liana_aggregate(). Could I get some feedback whether these are the right columns and what to include (if any) in the missing ones?

Also, I noticed that the output of liana::liana_aggregate() removes the ligand and receptor columns and maintains the ligand.complex and receptor.complex ones. Is this intentional and, if one wants to generalize plot generation from liana's output, would it be more appropriate to use the complex columns even for single-method experiments?

Many thanks in advance for the feedback!

dbdimitrov commented 2 years ago

Hi Enrique,

Thanks for using LIANA :)

This is mostly correct, but I made some changes:

 # Each method has a different column name.
  specificity <- list("cellphonedb" = "pvalue",
                      "natmi" = "edge_specificity",
                      "logfc" = "logfc_comb", # mean of 1vsRest LR logFC
                      "sca" = "none", # but by default I aggregate it with the specific ones from the remainder of the tools
                      "connectome" = "weight_sc", # mean of z scores
                      "aggregated" = "aggregate_rank")

  magnitude <- list("cellphonedb" = "lr.mean",
                    "natmi" = "prod_weight", # technically the same for connectome but I don't include it due to it being redundant
                    "logfc" = "none",
                    "sca" = "LRscore",
                    "connectome" = "none",
                    "aggregated" = "mean_rank")
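
As a rough sketch of how these lookup lists could drive a generic plotting helper, the snippet below picks the specificity and magnitude columns for one method out of a toy table. The `pick_scores` helper and the toy values are hypothetical and only for illustration; the column names are the ones from the lists above.

```r
# Column lookup taken from the lists above; "none" marks a missing score.
specificity <- list(cellphonedb = "pvalue", natmi = "edge_specificity",
                    logfc = "logfc_comb", sca = "none",
                    connectome = "weight_sc")
magnitude <- list(cellphonedb = "lr.mean", natmi = "prod_weight",
                  logfc = "none", sca = "LRscore", connectome = "none")

# Hypothetical helper: return the two plotting columns for one method,
# with NA where the method has no suitable column.
pick_scores <- function(res, method) {
  get_col <- function(col) {
    if (identical(col, "none")) rep(NA_real_, nrow(res)) else res[[col]]
  }
  data.frame(specificity = get_col(specificity[[method]]),
             magnitude = get_col(magnitude[[method]]))
}

# Toy stand-in for a single-method result (values are made up).
toy_sca <- data.frame(ligand = c("L1", "L2"), receptor = c("R1", "R2"),
                      LRscore = c(0.9, 0.4))
scores <- pick_scores(toy_sca, "sca")
```

With this layout, a method that lacks one of the two scores still returns both columns, so downstream plotting code would not need method-specific branches.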

For the ranks, it's a bit different. rank_aggregate is simply a probability distribution saying how highly-ranked a given interaction is when aggregating all score vectors - i.e. it's only indicative of what you have it aggregate. If you wish to obtain a magnitude/housekeeping aggregate, see code below:

library(tidyverse)
library(liana)

# Input
liana_path <- system.file(package = "liana")
seurat_object <- readRDS(file.path(liana_path,
                                   "testdata",
                                   "input",
                                   "testdata.rds"))
# run liana
liana_res <- liana_wrap(seurat_object)

# Run default aggregation
spec_agg <- liana_res %>%
    liana_aggregate()

# Run housekeeping aggregation /w housekeeping funs
magnitude_agg <- liana_res %>%
    # magnitude scoring fun currently not exported - let me know if you want me to export in next update
    liana_aggregate(.score_mode = liana:::.score_housekeep)

For the aggregation, it's intentional that I use the complex columns alone, as using the ligand/receptor (subunit) columns would sometimes result in redundancies. In other words, for the same complexes, different methods would sometimes return different subunits (depending on which one has the minimum expression/z-score/etc. that the method uses to calculate its score).
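
To make the redundancy concrete, here is a minimal sketch on made-up data (not real LIANA output): two hypothetical methods score the same complex pair but report different subunits, so only the complex columns give a stable key.

```r
# Toy results: two methods score the same complex pair but report
# different subunits (all names and values are made up for illustration).
toy <- data.frame(
  method = c("methodA", "methodB"),
  ligand.complex = c("L1_L2", "L1_L2"),
  receptor.complex = c("R1_R2", "R1_R2"),
  ligand = c("L1", "L2"),
  receptor = c("R1", "R2")
)

# Keying on subunits yields two distinct interactions...
n_by_subunit <- nrow(unique(toy[, c("ligand", "receptor")]))
# ...while keying on complexes yields one, so the scores line up.
n_by_complex <- nrow(unique(toy[, c("ligand.complex", "receptor.complex")]))
```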

For the "none" columns, I guess you can always replace the missing columns with the proportion of expression. From the next update (currently available on the liana_1.0 branch), I will have each of the methods return this column by default. I've also been postponing implementing the option to have custom columns from liana_pipe returned (e.g. p-values, etc.) - let me know if this is important for you - if yes, I can try to also include that.

Hope this helps.

Daniel

enblacar commented 2 years ago

Hi Daniel,

Thanks a lot for the illustrative feedback!

The overall idea I have is to add liana and use it wrapped in one of the functions of my in-development R package, which is meant to streamline common scRNAseq visualizations and produce high-quality figures on the go. For this, it will either require the output of liana::liana_wrap() or the user could also run liana directly and get the end result plot.

Initially, I thought that limiting it to CellPhoneDB (which provides significance with p-values and magnitudes as a means of depicting how strong the interaction is) would be good enough, but given how easily other methods can be run in LIANA, I also tried to incorporate other methods + the aggregated ranks.

For this, the ideal case would be to end up having a p-value column stating whether the interaction is significant or not alongside another column that depicts how strong this interaction is (in whichever metric the method is using).

However, it is not at all a priority and I will be happy to include it once it is made available in the package!

dbdimitrov commented 2 years ago

Hi Enrique,

The package looks very cool. Overall, if I could suggest anything, it would be to maybe just stick to the liana_aggregate output. Really then the user can pick how to best approach the results. 1) You can filter by CellPhoneDB p-value or the aggregate_rank probabilities, 2) Rank interactions by specificity /w NATMI's edge weights, 3) Use LRscores for magnitude as they are easy to interpret and comparable across datasets.

By p-values, I meant the DE p-values for the ligand and receptors separately, otherwise for the interaction it would need to be the CellPhoneDB p-value. :)
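
The three suggestions above could be sketched roughly as follows. This runs on a toy data frame standing in for the liana_aggregate output; the method-prefixed column names (e.g. `cellphonedb.pvalue`) and the 0.05 cutoff are assumptions, so check them against your own results.

```r
# Toy stand-in for a liana_aggregate() result. The method-prefixed column
# names are an assumption; all values are made up.
agg <- data.frame(
  source = c("A", "A", "B"),
  target = c("B", "C", "C"),
  ligand.complex = c("L1", "L2", "L3"),
  receptor.complex = c("R1", "R2", "R3"),
  aggregate_rank = c(0.001, 0.20, 0.03),
  cellphonedb.pvalue = c(0.00, 0.40, 0.01),
  natmi.edge_specificity = c(0.30, 0.90, 0.70),
  sca.LRscore = c(0.85, 0.60, 0.75)
)

# 1) Keep interactions passing an (assumed) CellPhoneDB p-value cutoff.
kept <- agg[agg$cellphonedb.pvalue <= 0.05, ]
# 2) Order by NATMI edge specificity, most specific first.
kept <- kept[order(kept$natmi.edge_specificity, decreasing = TRUE), ]
# 3) kept$sca.LRscore can then be mapped to colour as the magnitude.
```

Filtering on `aggregate_rank` instead of the CellPhoneDB p-value would only change the column used in step 1.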

enblacar commented 2 years ago

Hi Daniel,

Thanks for the feedback! I will work on it based on your suggestions.

Best, Enrique

dbdimitrov commented 2 years ago

Hey @enblacar,

Check out the rank_method function :)

enblacar commented 2 years ago

Hi @dbdimitrov

Thanks a lot for implementing the function!

In my use case, I ended up expecting a consensus liana object from the user, and then analyzed the data as you suggested in the comments above. rank_method looks really cool! From what I have seen, it generates a ranking based on "specificity" or "magnitude" for the desired individual methods. That will definitely help with the processing of the results for plotting.

From my point of view and without truly knowing how hard it would be to implement, I could envision some additions to this function:

Thanks again for your quick feedback and enhancements on this great package!

Enrique

dbdimitrov commented 2 years ago

Hi @enblacar,

These are all points that I've also thought about in the past. I see nothing stopping anyone doing the 1st point, but I'd prefer to keep them separate in LIANA (I think it's already a bit confusing /w all the methods).

Your second point makes a lot of sense - my only concern is the same as /w the point above. I will indeed try to implement liana_aggregate to do specificity or magnitude in an easier way in the following updates, so I could also just include a "both" option that returns the aggregate of either :)

Your third point makes a lot of sense, I should make a function that at least prints them out in a clear manner :)

Best,

Daniel