theislab / cellrank

CellRank: dynamics from multi-view single-cell data
https://cellrank.org
BSD 3-Clause "New" or "Revised" License

heatmap plot: what are the columns? #1195

Closed (mxposed closed this issue 7 months ago)

mxposed commented 7 months ago

I apologize if I missed this in the tutorials, but I don't think it's in the API docs.

What are the columns here?

I pass an object with ~27k cells, but the plotted heatmap has 200 columns. Are those 200 sampled cells? If so, how are they sampled? Or are all cells aggregated into 200 pseudo-cells with smoothed expression? In that case, how are the metadata colors computed? The code is quite complex around there, and I couldn't find the part responsible for these things.

My calls are here:

import cellrank

model = cellrank.models.GAM(myeloid, n_knots=6)
myeloid_raw = myeloid.raw.to_adata()
myeloid_raw.shape
# (26772, 25538)
qqq = cellrank.pl.heatmap(
    myeloid_raw,
    model=model,
    lineages='Cluster 4',
    cluster_key=['cell_type', 'cell_type_state'],
    show_fate_probabilities=True,
    genes=cluster_4_drivers.head(80).index,
    time_key='dpt_pseudotime',
    show_all_genes=True,
    n_convolve=1,
    return_figure=True,
)
qqq[0][0].data.shape
# (30, 200)

WeilerP commented 7 months ago

@mxposed, the plot shows the smoothed change in gene expression over pseudotime, i.e., each entry (j, k) is the value of gene j at pseudotime point k.
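
To illustrate how a matrix with one row per gene and one column per pseudotime point can arise, here is a minimal sketch on synthetic data. It is not CellRank's code: a cubic polynomial fit (`np.polyfit`) stands in for the GAM, and all names and sizes (`n_cells`, `n_genes`, `n_points`, `grid`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_points = 1000, 5, 200  # hypothetical sizes

# Synthetic pseudotime and noisy expression, one row per gene.
pseudotime = rng.uniform(0.0, 1.0, size=n_cells)
expression = np.stack(
    [np.sin((g + 1) * pseudotime) + rng.normal(0.0, 0.1, n_cells)
     for g in range(n_genes)]
)  # shape (n_genes, n_cells)

# Evenly spaced pseudotime points at which each fitted trend is evaluated.
grid = np.linspace(pseudotime.min(), pseudotime.max(), n_points)

heatmap = np.empty((n_genes, n_points))
for j in range(n_genes):
    # Stand-in smoother (assumption): cubic polynomial instead of a GAM.
    coeffs = np.polyfit(pseudotime, expression[j], deg=3)
    heatmap[j] = np.polyval(coeffs, grid)  # entry (j, k): gene j at grid[k]

print(heatmap.shape)  # (5, 200)
```

The number of columns equals the number of grid points, independent of how many cells were used to fit each trend.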

mxposed commented 2 months ago

Finally got to look into the code properly: the heatmap's column annotation colors are pulled from the cells closest to the 200 sampled pseudotime points. The pseudotime points are sampled with linspace, and the gene expression values are the expression predicted by the GAM models.

https://github.com/theislab/cellrank/blob/721c59fe3bbbad41450a9b0fd5f34eebe683c08b/src/cellrank/pl/_heatmap.py#L380

https://github.com/theislab/cellrank/blob/721c59fe3bbbad41450a9b0fd5f34eebe683c08b/src/cellrank/pl/_heatmap.py#L176
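
The lookup described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code behind the links: the helper `nearest_cell_indices` and the toy `pseudotime` and `labels` arrays are hypothetical, and the real implementation may differ in details such as tie-breaking.

```python
import numpy as np

def nearest_cell_indices(pseudotime, n_points=200):
    """Return an even pseudotime grid and, per grid point, the index of the closest cell."""
    grid = np.linspace(pseudotime.min(), pseudotime.max(), n_points)
    # Pairwise |grid_k - pseudotime_i|, shape (n_points, n_cells).
    dist = np.abs(grid[:, None] - pseudotime[None, :])
    return grid, dist.argmin(axis=1)

# Toy data (assumption): five cells with a categorical annotation.
pseudotime = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
labels = np.array(["A", "A", "B", "C", "C"])

grid, idx = nearest_cell_indices(pseudotime, n_points=3)
column_labels = labels[idx]  # annotation shown above each heatmap column

print(grid)           # [0.  0.5 1. ]
print(column_labels)  # ['A' 'B' 'C']
```

Each column therefore borrows its annotation from a single real cell, while the expression values in that column come from the fitted model and not from that cell.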