msraredon / NICHES

Niche Interactions and Cellular Heterogeneity in Extracellular Signaling
https://msraredon.github.io/NICHES/

Normalization, rad.set, scripts #2

Closed (mhoang22 closed 2 years ago)

mhoang22 commented 2 years ago

Hi, thank you for the cool new tool! While using NICHES, I came up with several questions (as follows) that I hope to get clarification on:

  1. Normalization: I noticed that in vignettes/01 NICHES Spatial.Rmd, the UMAP was created with SCTransformed data, yet RunNICHES uses log-normalized + imputed data (Seurat Normalize + ALRA function) for the analysis. I also noticed that tests/test_2.R uses SCT for RunNICHES. So I'm curious: in what scenarios would you recommend using SCT versus log normalization for RunNICHES?

  2. rad.set: I also hope to get clarification on the RunNICHES rad.set parameter. To be specific, let's say rad.set = 2. Then what exact distance am I setting? My guess is that radius r = Euclidean distance to the nearest (1st) neighbor, and rad.set = 2 means the search distance is 2r = 2 times the distance to the 1st neighbor (as in the picture attached). Is that the NICHES definition of rad.set, or is it something else?

  3. scripts: I'm also curious whether you plan to post the scripts used to generate the figures in the manuscript. (I looked at the 3 scripts in vignettes and the 2 scripts in tests, but I don't think I encountered such scripts.)

Thank you in advance for your time and help!

(Attached image: Screen Shot 2022-02-15 at 12 57 28 PM)

jcyang34 commented 2 years ago

Hi @mhoang22,

Thanks for your interest. RunNICHES uses log-normalized data before ALRA because ALRA needs to operate on a normalized matrix. Log normalization is the default because it is widely used, as discussed in the ALRA paper, but other types of normalization can likely be used as well. For that vignette, the original Seurat analysis used SCTransform for visualizing the single-cell data, so we kept that convention. The test script was only used internally to test different functions.

The Euclidean distance between 2 cells is computed based on their corresponding spatial X, Y coordinates, and rad.set is the threshold of this distance. rad.set == 2 means only cells that have distance <= 2 can interact. Besides this absolute threshold, we also provide a mutual k nearest neighbor alternative, which is enabled by the parameter k (default is 4). For instance, when k == 4, two cells are connected if they are both one of each other’s 4 nearest neighbors. When k is enabled, the rad.set parameter will be ignored.
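To make the two neighbor definitions above concrete, here is a minimal sketch in plain Python. It is not the NICHES implementation: the function names `radius_edges` and `mutual_knn_edges`, the toy coordinates, and the choice to exclude self-pairs are all assumptions made for illustration. It only shows that rad.set acts as an absolute cutoff on the Euclidean distance between X, Y coordinates (not as a multiple of the nearest-neighbor distance), and that the k option keeps a pair only when each cell is among the other's k nearest neighbors.

```python
import math

def radius_edges(coords, rad_set):
    """Directed pairs (i, j), i != j, whose Euclidean distance is <= rad_set."""
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            if i != j and math.hypot(xi - xj, yi - yj) <= rad_set:
                edges.add((i, j))
    return edges

def mutual_knn_edges(coords, k):
    """Directed pairs (i, j) where i and j are each among the other's k nearest neighbors."""
    neighbors = []
    for i, (xi, yi) in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        # Sort the other cells by distance to cell i and keep the k closest.
        others.sort(key=lambda j: math.hypot(xi - coords[j][0], yi - coords[j][1]))
        neighbors.append(set(others[:k]))
    return {(i, j) for i in range(len(coords)) for j in neighbors[i] if i in neighbors[j]}

# Toy X, Y coordinates (hypothetical); distances are in the same units as the coordinates.
coords = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]

print(sorted(radius_edges(coords, rad_set=2)))   # [(0, 1), (1, 0), (1, 2), (2, 1)]
print(sorted(mutual_knn_edges(coords, k=1)))     # [(0, 1), (1, 0)]
```

In this toy layout, cells 0 and 2 are 2.5 units apart, so they are not connected when rad.set is 2. With k = 1, cell 2's nearest neighbor is cell 1, but cell 1's nearest neighbor is cell 0, so the pair (1, 2) is not mutual and is dropped.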

mhoang22 commented 2 years ago

Hi, I was wondering

1) Is the 'k nearest neighbor' option (in RunNICHES) only applied to spatial transcriptomics (ST) data, or can it also be applied to scRNA-seq data?

2) If the answer to 1 is that the knn option can indeed be used for scRNA data, then are

msraredon commented 2 years ago

knn, in this context, can only be applied to ST data. Here it defines physical neighbors in Euclidean space, creating a local neighborhood of communicating spots to be considered around each central spot. It does not consider transcriptional similarity or difference, only Euclidean nearest-neighbor distance.

Conversely, a uniform random sampling technique is used to choose barcode-barcode pairings in non-spatial scRNA data. If only histologically proximal celltypes are to be considered, cell systems containing only celltypes known to be in close proximity histologically can be assembled in advance of running NICHES.
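As a rough illustration of the non-spatial case, the sketch below pairs barcodes by uniform random sampling after first restricting the data to a chosen set of celltypes. It is not the NICHES code: the function name `sample_barcode_pairs`, the barcode and celltype labels, sampling with replacement, and the `n_pairs` argument are all assumptions made for the example.

```python
import random

def sample_barcode_pairs(barcodes_by_type, sending_type, receiving_type, n_pairs, seed=0):
    """Draw n_pairs (sender, receiver) barcode pairs uniformly at random, with replacement."""
    rng = random.Random(seed)
    senders = barcodes_by_type[sending_type]
    receivers = barcodes_by_type[receiving_type]
    return [(rng.choice(senders), rng.choice(receivers)) for _ in range(n_pairs)]

# Hypothetical dataset: barcodes grouped by celltype label.
all_barcodes = {
    "epithelium": ["AAAC-1", "AAAG-1", "AACT-1"],
    "endothelium": ["CCGA-1", "CCTT-1"],
    "neuron": ["GGTA-1"],
}

# Assemble the "cell system" in advance: keep only celltypes assumed to be proximal.
proximal_types = {"epithelium", "endothelium"}
cell_system = {t: b for t, b in all_barcodes.items() if t in proximal_types}

pairs = sample_barcode_pairs(cell_system, "epithelium", "endothelium", n_pairs=5)
print(pairs)
```

Because the "neuron" barcodes are removed before sampling, they can never appear in a pairing, which is the effect of assembling the cell system in advance.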

Does this clarify?

mhoang22 commented 2 years ago

Thank you so much for the detailed reply. Yes, that was perfectly clear!

So just to confirm, since 'knn in this context can only be applied to ST data', is it safe to assume that when running RunNICHES on scRNA data, one usually disables/doesn't mention the 'k' option? (I ask because the vignette for scRNA data doesn't use k for RunNICHES, yet one of the commands I found elsewhere actually specifies k when running CellToCell (pic attached).)

On a separate note, the CellToCellSpatial function doesn't compute interaction score between 1 cell to all other cells in the dataset/the slide, but ONLY to the cells within its neighborhood, does it? (I thought so since in the manuscript, the Cell-Cell matrix S seems to be confined within matrix E)

(Attached images: Screen Shot 2022-03-22 at 7 41 20 PM, Screen Shot 2022-03-22 at 7 58 20 PM)

jcyang34 commented 2 years ago

"is it safe to assume that when running RunNICHES on scRNA data, one usually disables/doesn't mention the 'k' option?"

Yes. To be more specific, the k parameter is only used in 'CellToCellSpatial', 'CellToNeighborhood', and 'NeighborhoodToCell' (i.e. when your data have spatial information). If only 'CellToCell' is flagged, any k you supply is ignored.

"On a separate note, the CellToCellSpatial function doesn't compute interaction score between 1 cell to all other cells in the dataset/the slide, but ONLY to the cells within its neighborhood, does it?"

Correct, it doesn't. Scores are computed only with the cells within its neighborhood.
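The sketch below illustrates what "only the cells within its neighborhood" means for the output. It is not the NICHES implementation: the function name `score_neighbor_edges`, the spot and gene names, and the scoring rule (ligand expression on the sending spot multiplied by receptor expression on the receiving spot) are assumptions made for the example. The point is only that a score exists for each neighbor pair and for no other pair on the slide.

```python
def score_neighbor_edges(expr, edges, ligand, receptor):
    """Score each (sender, receiver) neighbor pair; pairs not in `edges` get no score."""
    scores = {}
    for sender, receiver in edges:
        # Assumed scoring rule: sender ligand expression times receiver receptor expression.
        scores[(sender, receiver)] = expr[sender][ligand] * expr[receiver][receptor]
    return scores

# Hypothetical expression values for one ligand-receptor pair on three spots.
expr = {
    "spot_a": {"LigX": 2.0, "RecX": 0.0},
    "spot_b": {"LigX": 0.5, "RecX": 3.0},
    "spot_c": {"LigX": 1.0, "RecX": 4.0},
}

# Neighbor pairs, e.g. as produced by a radius or mutual-kNN rule; spot_c has no neighbors.
edges = [("spot_a", "spot_b"), ("spot_b", "spot_a")]

scores = score_neighbor_edges(expr, edges, "LigX", "RecX")
print(scores)  # {('spot_a', 'spot_b'): 6.0, ('spot_b', 'spot_a'): 0.0}
```

Although spot_c expresses the receptor strongly, no score involving it is produced, because it is not in any neighbor pair.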

mhoang22 commented 2 years ago

That's what I had hoped to confirm. Thank you!