saezlab / liana-py

LIANA+: an all-in-one framework for cell-cell communication
http://liana-py.readthedocs.io/
GNU General Public License v3.0

Guidelines to choose the best parameters for bivariate analysis using Visium HD data #133

Open Rafael-Silva-Oliveira opened 1 month ago

Rafael-Silva-Oliveira commented 1 month ago

Hello again!

I've been trying out the bivariate approach with the Visium HD data, and I've been testing some of the parameters, mainly bandwidth and max_neighbours:

li.ut.spatial_neighbors(
    adata,
    bandwidth=1000,
    cutoff=0.1,
    kernel="gaussian",
    set_diag=False,
    max_neighbours=500,
)
li.mt.bivariate(
    adata,
    layer="lognorm_counts",
    resource_name="consensus",  # NOTE: uses HUMAN gene symbols!
    local_name="cosine",  # Name of the function
    global_name="morans",  # Name global function
    n_perms=75,  # Number of permutations to calculate a p-value
    mask_negatives=False,  # Whether to mask LowLow/NegativeNegative interactions
    add_categories=True,  # Whether to add local categories to the results
    nz_prop=0.01,  # Minimum expr. proportion for ligands/receptors and their subunits
    use_raw=False,
    verbose=True,
)

And the results from the spatial plot look like this:

With max_neighbours=500 and bandwidth=1000: [image: output3]

With max_neighbours=100 and bandwidth=250: [image]

Both naturally show circular regions because of these settings, but I'd like to ask for some guidelines:

Given that these methods can take a while to run, are there any other factors I should adjust so that the results look a bit more "smooth", like the tutorials in the LIANA+ documentation?

Would it be a good idea to test the Jaccard index, considering the actual labelled categories? Should I choose a max_neighbours of 1-2 instead and a bandwidth of 50-100?

Thanks once again for the support :)

For reference:

[image: bandwidth]

dbdimitrov commented 1 month ago

Hi @Rafael-Silva-Oliveira,

I assume the bandwidth is being calculated in pixels, i.e. the x,y units stored in adata.obsm['spatial']? Pixels in Visium HD images should correspond to something between 0.5 and 2 microns, I guess. If you can check this, you can calculate how many microns there are per pixel; a commonly used assumption for the distance of diffusion is ~100 microns.

Or alternatively, you could set it to 10 or 20 cells.

Both cases are obviously oversimplifications since diffusion depends on the ligand, and you also have membrane-bound interactions. Also, this is expression and not proteins so, to me, the decision is a bit arbitrary and case-to-case dependent.
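
As a rough sketch of the two options above, assuming the coordinates in adata.obsm['spatial'] are in pixels: the helper names, the 0.5 microns-per-pixel scale and the cell spacing are hypothetical placeholders, not part of LIANA+, and need to be checked against your own data.

```python
def bandwidth_from_microns(target_microns: float, microns_per_pixel: float) -> float:
    """Convert a physical distance in microns into coordinate (pixel) units."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return target_microns / microns_per_pixel


def bandwidth_from_cells(n_cells: float, cell_spacing: float) -> float:
    """Bandwidth spanning roughly n_cells, given a typical centre-to-centre
    spacing expressed in the same units as the coordinates."""
    return n_cells * cell_spacing


# Hypothetical scale: replace with the value verified for your dataset.
microns_per_pixel = 0.5

# Option 1: ~100 microns of assumed diffusion distance, in pixel units.
bandwidth = bandwidth_from_microns(100, microns_per_pixel)
print(bandwidth)  # 200.0

# Option 2: ~10 cells, with a hypothetical spacing of 20 coordinate units.
print(bandwidth_from_cells(10, 20.0))  # 200.0
```

Either value could then be passed as bandwidth to li.ut.spatial_neighbors.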

Hope this helps :)

Rafael-Silva-Oliveira commented 1 month ago

Thank you for the swift reply once again!

Indeed, in the original Visium HD dataset these would be the coordinates of each bin, but since I've processed the data with Bin2Cell, the coordinates got "aggregated" in some way, so I'll have to confirm that :)

Just by following your suggestions, I got these plots, which look closer to what we'd like to see with this type of data!

I've also switched to the Jaccard index instead of cosine, as it might be better for categorical data, but I'll try cosine too.

[image]

Thanks again!

Rafael-Silva-Oliveira commented 1 month ago

Hello again! I don't think this would require opening a new issue, but whenever I run this part of the tutorial (the decoupleR component of the bivariate analysis using LIANA+):


# Estimate cosine similarity
li.mt.bivariate(
    mdata,
    x_mod="comps",
    y_mod="tf",
    local_name="cosine",
    interactions=interactions,
    mask_negatives=True,
    add_categories=True,
    x_use_raw=False,
    y_use_raw=False,
    nz_prop=0.01,
    xy_sep="<->",
    x_name="celltype",
    y_name="tf",
)

My terminal gets killed (I'm assuming because it runs out of memory; there are no other warnings).

I have 1100 interactions, 22 cell types and 520k cells. The cell types were converted from string labels to a one-hot encoding: instead of "proportions", here 1 spot = 1 cell, so each cell has 1 for its predicted cell type and 0 for all the others.

I tried reducing to just the top 5 highly variable TFs, but it still crashed.

Thanks again :)
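
For a rough sense of scale, here is a back-of-the-envelope estimate of what one dense cells x interactions matrix would need at these sizes. This is only a sketch: it assumes dense float32 storage (4 bytes per value), which is an assumption and not something confirmed about how LIANA+ stores its outputs, and the helper name is hypothetical.

```python
def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_value: int = 4) -> float:
    """Size in GiB of a dense n_rows x n_cols array (default: float32)."""
    return n_rows * n_cols * bytes_per_value / 2**30


n_cells = 520_000
n_interactions = 1_100

# One dense matrix of local scores under the float32 assumption.
per_matrix = dense_matrix_gib(n_cells, n_interactions)
print(f"{per_matrix:.2f} GiB per float32 matrix")  # 2.13 GiB per float32 matrix

# The same matrix stored as float64 (8 bytes per value) is twice as large.
print(f"{dense_matrix_gib(n_cells, n_interactions, 8):.2f} GiB as float64")
```

Any additional per-cell output of the same shape (for example categories or p-values, if they are stored densely) would add a similar amount on top.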

dbdimitrov commented 4 weeks ago

Hi @Rafael-Silva-Oliveira,

You could try setting add_categories and mask_negatives to False; perhaps this is causing the issue. If it is, I could have another look, as there might be a way to make it work on a laptop too.
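
A minimal sketch of that change, reusing the arguments from the call above. The base_kwargs and low_memory_kwargs names are just for illustration, and the final call is left commented out because it needs the mdata and interactions objects from the tutorial.

```python
# Arguments copied from the failing call above.
base_kwargs = dict(
    x_mod="comps",
    y_mod="tf",
    local_name="cosine",
    mask_negatives=True,
    add_categories=True,
    x_use_raw=False,
    y_use_raw=False,
    nz_prop=0.01,
    xy_sep="<->",
    x_name="celltype",
    y_name="tf",
)

# Same call, with the two optional outputs switched off.
low_memory_kwargs = {**base_kwargs, "mask_negatives": False, "add_categories": False}

# li.mt.bivariate(mdata, interactions=interactions, **low_memory_kwargs)
```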

Daniel

Rafael-Silva-Oliveira commented 4 weeks ago

Hey Daniel, I tried that approach and it still crashed. I haven't checked the underlying code yet, but I can have a look and see where it might be crashing.