theislab / ncem

Learning cell communication from spatial graphs of cells
https://ncem.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Use Tangram mapped visium for ncem #122

Closed jkbenotmane closed 2 years ago

jkbenotmane commented 2 years ago

Question: Hello, thank you for creating ncem. I wanted to ask whether the custom data loader also accepts Tangram-mapped adata. Specifically: what should the .obsm entries node_types and proportions consist of?

AnnaChristina commented 2 years ago

Hi @jkbenotmane, yes we support data that was mapped with Tangram.

To give you a detailed answer: are you aiming to apply NCEM to deconvoluted Visium data, or to imputed whole-transcriptome, single-cell-resolved data?

jkbenotmane commented 2 years ago

Hi @AnnaChristina, thank you for getting back to me! I plan to use the imputed Visium data obtained after mapping, as it might produce better results and is less sensitive to image segmentation. I would be thankful for any advice, though.

AnnaChristina commented 2 years ago

We currently provide a tutorial on how to use NCEM for deconvoluted 10x Visium data here: https://github.com/theislab/ncem_tutorials/blob/main/tutorials/type_coupling_visium.ipynb

This tutorial uses cell2location as the deconvolution method on a public dataset, as described here: https://github.com/theislab/ncem_benchmarks/blob/main/notebooks/data_preparation/deconvolution/cell2location_human_lymphnode.ipynb

You can replace the deconvolution method with a method of your choice. The crucial step is that it predicts spot abundances or proportions and cell-type-specific expression values. Based on what I know of the Tangram API, you could use tg.project_cell_annotations() and tg.project_genes(), but please confirm this against the latest documentation.

jkbenotmane commented 2 years ago

Thank you very much! I had already checked the tutorial, but my question arose because the tutorial specifies the output needed from cell2location: node_types and proportions.

The anndata object must contain the following objects in .obsm: node_types, proportions and spatial.

After tg.project_cell_annotations() and tg.project_genes(), Tangram provides the following matrix:

| Index | CD4 TEM | Mono | TAM-MG | TAM-BDM | MES-like | CD8 Naive | CD14 Mono | OPC | AC-like | OPC-like | ... | Plasma B | Astrocyte | CD8 TCM | B cell | Neuron | Treg | CD4 CTL | NK_CD56bright | HSPC | ASDC |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| BC1 | 0.251665 | 0.002215 | 1.094364 | 1.365114 | 1.773969 | 0.000245 | 0.236909 | 0.000071 | 1.276275 | 0.615443 | ... | 0.000009 | 0.000019 | 0.000032 | 0.000017 | 0.000017 | 0.123033 | 0.000008 | 0.000047 | 0.000044 | 0.000055 |
| BC3 | 0.000611 | 1.369145 | 0.044852 | 2.202428 | 0.043156 | 0.000264 | 1.060199 | 0.000032 | 0.179358 | 0.001250 | ... | 0.000011 | 0.000019 | 0.000014 | 0.000014 | 0.000023 | 0.000028 | 0.000011 | 0.000038 | 0.000018 | 0.000074 |
| BC5 | 0.308717 | 0.000857 | 1.202610 | 0.754995 | 0.390998 | 0.010988 | 0.180665 | 0.016620 | 2.129005 | 0.519676 | ... | 0.000027 | 0.015983 | 0.047321 | 0.000014 | 0.020850 | 0.000053 | 0.000003 | 0.000101 | 0.000023 | 0.000042 |

Am I correct in assuming that node_types are the columns (cell types) and proportions are the values for each cell?

AnnaChristina commented 2 years ago

proportions and node_types are generated here: https://github.com/theislab/ncem_benchmarks/blob/main/notebooks/data_preparation/deconvolution/cell2location_human_lymphnode.ipynb

Those are entries of a "pseudo-single-cell" object which is created in the notebook linked above. You can just link proportions to the abundances learned by Tangram.
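As a minimal sketch of linking proportions to abundances: the snippet below row-normalises a spots x cell-types abundance matrix so that each spot sums to 1. The function name `abundances_to_proportions` and the toy values are hypothetical. The thread does not say whether ncem expects normalised proportions or raw abundances, so treat the normalisation as an assumption to check against the linked notebook.

```python
import numpy as np


def abundances_to_proportions(abundances):
    """Row-normalise a (n_spots, n_cell_types) abundance matrix.

    Hypothetical helper: each row is divided by its sum so that the
    cell-type values of a spot add up to 1. Spots with a total of 0
    are left as all zeros instead of producing NaN.
    """
    abundances = np.asarray(abundances, dtype=float)
    totals = abundances.sum(axis=1, keepdims=True)
    # Replace zero totals by 1 so empty spots divide cleanly to 0.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return abundances / safe_totals


# Toy abundances for 3 spots and 2 cell types (illustrative values only).
toy_abundances = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
props = abundances_to_proportions(toy_abundances)
print(props)
```

The column order of the abundance matrix would then define the order of the cell types used everywhere else.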

The cell-type-specific expression must have the following dimensions: number of spots x number of genes x number of cell types in the reference. So you need a matrix similar to the one obtained here: https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Estimate-cell-type-specific-expression-of-every-gene-in-the-spatial-data-(needed-for-NCEM)
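To make the shapes concrete, here is a rough sketch of how a "pseudo-single-cell" layout could be assembled from such a spots x genes x cell-types array. It assumes one pseudo-cell per (spot, cell type) pair, a one-hot node_types matrix, and proportions and spatial repeated for every pseudo-cell of a spot. The function `build_pseudo_single_cell` is hypothetical, and the exact layout ncem expects should be checked against the benchmark notebook linked above.

```python
import numpy as np


def build_pseudo_single_cell(expression, proportions, spatial):
    """Flatten spot-level arrays into one row per (spot, cell type).

    expression:  (n_spots, n_genes, n_types) cell-type-specific expression
    proportions: (n_spots, n_types) abundances or proportions per spot
    spatial:     (n_spots, 2) spot coordinates
    Returns X of shape (n_spots * n_types, n_genes) and a dict with the
    three .obsm-style arrays. The layout is an assumption, not ncem's API.
    """
    expression = np.asarray(expression, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    spatial = np.asarray(spatial, dtype=float)
    n_spots, n_genes, n_types = expression.shape
    if proportions.shape != (n_spots, n_types):
        raise ValueError("proportions must have shape (n_spots, n_types)")
    if spatial.shape[0] != n_spots:
        raise ValueError("spatial must have one row per spot")

    # Row index of a pseudo-cell is spot * n_types + cell_type.
    X = expression.transpose(0, 2, 1).reshape(n_spots * n_types, n_genes)
    obsm = {
        # One-hot cell type of each pseudo-cell.
        "node_types": np.tile(np.eye(n_types), (n_spots, 1)),
        # Every pseudo-cell carries the proportions of its spot.
        "proportions": np.repeat(proportions, n_types, axis=0),
        # Every pseudo-cell inherits the coordinates of its spot.
        "spatial": np.repeat(spatial, n_types, axis=0),
    }
    return X, obsm


# Toy input: 2 spots, 3 genes, 2 cell types (illustrative values only).
toy_expression = np.arange(12).reshape(2, 3, 2)
toy_proportions = np.array([[0.25, 0.75], [0.5, 0.5]])
toy_spatial = np.array([[10.0, 20.0], [30.0, 40.0]])
X, obsm = build_pseudo_single_cell(toy_expression, toy_proportions, toy_spatial)
print(X.shape, {key: value.shape for key, value in obsm.items()})
```

If anndata is installed, the arrays could then be wrapped with `anndata.AnnData(X=X, obsm=obsm)`.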

jkbenotmane commented 2 years ago

Okay, thank you very much, that clarified a lot!

Last question: the spatial proximity of the inferred single cells is then calculated from .obsm['spatial']. Is there a naming convention to follow for the "new" single-cell spatial barcodes?

AnnaChristina commented 2 years ago

I would still call it adata.obsm['spatial'], as the ncem code will search for spatial coordinates in this object. I save the "pseudo-single-cell" object separately, with a name indicating the deconvolution parameters applied to the original data.

Additionally, we are currently working on an update of the ncem API that will make it easier to run these analysis steps for deconvoluted Visium data. I can link you to the release as soon as we have released the updated tutorials.

jkbenotmane commented 2 years ago

Thank you @AnnaChristina! I am sorry for being imprecise, I meant the barcode names. For example, after deconvolution Tangram changes the spatial barcode on the left to the one on the right.

AAATGGCATGTCTTGT-1 -> AAATGGCATGTCTTGT-1_4

I just wondered if this will interfere with ncem calling spatial coordinates.

But thank you very much for your help, and I sure will check on future releases and mods of ncem !

AnnaChristina commented 2 years ago

ncem does not require barcode information. As far as I know, Tangram adjusts the barcode to reflect the inferred deconvoluted cells. Each inferred cell, e.g. AAATGGCATGTCTTGT-1_4, should still have the same coordinates as the spot itself, i.e. AAATGGCATGTCTTGT-1. With current deconvolution methods one cannot (yet) infer the exact position of single cells within spots, so the ncem model for deconvoluted spot transcriptomics assumes that each spot describes a niche and that cells interact within this niche/spot.
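A small sketch of that idea: each inferred cell is mapped back to its spot by stripping the numeric suffix, and it inherits the spot's coordinates. The `_<number>` suffix convention is inferred only from the single example in this thread, and the helper names and coordinate values are hypothetical.

```python
def spot_barcode(cell_barcode):
    """Return the spot barcode for an inferred-cell barcode.

    Assumes inferred cells are named '<spot barcode>_<integer>', as in the
    one example from this thread. Barcodes without such a suffix are
    returned unchanged.
    """
    base, sep, suffix = cell_barcode.rpartition("_")
    if sep and suffix.isdigit():
        return base
    return cell_barcode


def cell_coordinates(cell_barcodes, spot_coords):
    """Look up the coordinates of each inferred cell via its spot barcode."""
    return [spot_coords[spot_barcode(barcode)] for barcode in cell_barcodes]


# Illustrative coordinates; real values would come from the Visium spots.
spot_coords = {"AAATGGCATGTCTTGT-1": (10.0, 20.0)}
cells = ["AAATGGCATGTCTTGT-1_4", "AAATGGCATGTCTTGT-1_5"]
coords = cell_coordinates(cells, spot_coords)
print(coords)
```

Both inferred cells end up with the same coordinates as their spot, which matches the niche assumption described above.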

jkbenotmane commented 2 years ago

Alright, understood thank you very much!