saezlab / liana-py

LIANA+: an all-in-one framework for cell-cell communication
http://liana-py.readthedocs.io/
GNU General Public License v3.0

Large number of pvals are `0` without specificity #144

Open and-rewsmith opened 5 days ago

and-rewsmith commented 5 days ago

Describe the bug: A large number of p-values are reported as exactly `0`, with no further precision, even though other entries carry 3-4 significant figures.

I also encountered this behavior in your own documentation: https://liana-py.readthedocs.io/en/latest/notebooks/basic_usage.html

Below this quote in the docs there is a graph with the same issue:

By default, liana will be run inplace and results will be assigned to adata.uns['liana_res']. Note that the high proportion of missing entities here is expected, as we are working on the reduced dimensions data.

To Reproduce: See the documentation link above; the basic usage notebook reproduces the behavior.

Screenshots: The plot in your documentation shows the issue.

and-rewsmith commented 5 days ago

@dbdimitrov If I have misunderstood something, please let me know.

and-rewsmith commented 5 days ago

Oh, if they are the negative log of the p-values, that might explain it.

and-rewsmith commented 5 days ago

No, the above doesn't make sense, because this line in the docs (at the location I pointed to earlier) filters on non-log p-values: `filter_fun=lambda x: x['cellphone_pvals'] <= 0.05,`

My understanding is that the p-values in the dataframe are on the raw (non-log) scale and are only converted to log scale in the visualization. So the fact that most p-values are exactly 0, with no further precision, is still a problem, no?
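To make the distinction concrete, here is a minimal sketch of what I mean. The p-values, the `neg_log10` helper, and its `floor` argument are all made up for illustration; I have not checked how liana handles zeros when plotting.

```python
import math

# Hypothetical raw (non-log) p-values, as they might appear in a results column.
raw_pvals = [0.0, 0.0, 0.012, 0.2, 0.0451]

# Filtering happens on the raw scale, mirroring `x['cellphone_pvals'] <= 0.05`.
significant = [p for p in raw_pvals if p <= 0.05]


def neg_log10(p, floor=1e-4):
    """Return -log10(p), clipping p at `floor` so that p == 0 stays finite.

    The `floor` value is an illustrative assumption, not a liana default.
    """
    return -math.log10(max(p, floor))


# The log transform is only applied afterwards, for display.
display_values = [neg_log10(p) for p in significant]
```

A raw value of exactly 0 passes the `<= 0.05` filter, and its negative log is undefined unless it is clipped, which is why the zeros stand out to me.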

dbdimitrov commented 3 days ago

Hi @and-rewsmith,

I'm not sure I understand the issue, so let me know if I've missed the point, but you get a lot of p-values of 0 because of the statistical test being performed. Essentially, if you test for differences between very distinct cell types (or pairs of cell types, in this case), many of them will show significantly different interactions, since many of the ligand-receptor pairs (LRs) are composed of marker genes. This is a known issue with the permutation approaches of both CellChat and CellPhoneDB.
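As a rough illustration of why exact zeros appear, here is a generic one-sided permutation test on a difference in means. This is a simplified sketch, not the actual CellPhoneDB, CellChat, or LIANA implementation; the function name, the toy expression values, and `n_perms=1000` are assumptions chosen for the example.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_perms=1000, seed=0):
    """Empirical p-value: fraction of label shuffles whose mean difference
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_stat = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if perm_stat >= observed:
            hits += 1
    return hits / n_perms


# Toy data: a marker-like gene that is far higher in one group than the other.
marker_group = [10.0 + 0.1 * i for i in range(20)]
other_group = [0.1 * i for i in range(20)]

# No shuffle reproduces such a clean split, so the estimate is exactly 0.0.
p_distinct = permutation_pvalue(marker_group, other_group)

# Two identical groups give an intermediate, non-zero p-value.
p_similar = permutation_pvalue(other_group, other_group)
```

With this kind of estimate, the smallest non-zero value is `1 / n_perms`, so the p-values are either exactly 0 or a multiple of that step. That would be consistent with seeing many exact zeros next to values with only a few significant figures.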

See in-depth discussion here: https://github.com/saezlab/liana/issues/39