kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Scaling pseudotime values #181

Closed by pavsol 2 years ago

pavsol commented 2 years ago

Hi,

I have a question related to #169

I have four trajectories in my data with different lengths (300, 150, 180 and 190). When I plot the smoothers for some genes, I get a plot that readers may not find easy to interpret (Fig. 1). So I was wondering whether it makes sense to scale the pseudotime values to the range 0 to 1 for each lineage separately and plot the data like this (Fig. 2). I am not sure whether this leads to a misinterpretation of the pseudotime values because, as you can see, the expression peaks for lineages 1 and 3 move apart, for example, which implies some expression delay in lineage 3.

And a minor question related to the attached figures: I used ggplot geom_smooth to plot the smoother with the "gam" method (n=100). The resulting smoother differs a bit from the one obtained by plotSmoothers(). Is there a better way to smooth it?

Thank you.

Fig. 1 [image: fig1]

Fig. 2 [image: fig2]

kstreet13 commented 2 years ago

Hi @pavsol,

So, as I said in the issue you linked to, I think the interpretation of pseudotime as "transcriptional distance" is a good one. In that sense, the differing lengths of the lineages are meaningful, so I would generally not advise rescaling them. More importantly, though, rescaling can create some troubling inconsistencies early on, in the shared portion of the lineages. If a short lineage gets elongated and a long lineage gets compressed, then the shared portion (before they split off) ends up misaligned. As a result, cells of the same type end up artificially spread out (you can kind of see this in your plots: the pink and blue peak in the first plot is a little wider and less clear in the second).
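To make the misalignment concrete, here is a minimal sketch in plain Python (it does not use slingshot). It takes the four lineage lengths from the question and assumes, purely for illustration, that a cell in the shared portion sits at raw pseudotime 50 in every lineage and that each lineage starts at 0. The value 50 and the helper name `rescale_to_unit` are hypothetical.

```python
# Lineage lengths quoted in the question (raw pseudotime units).
lineage_lengths = {
    "lineage1": 300.0,
    "lineage2": 150.0,
    "lineage3": 180.0,
    "lineage4": 190.0,
}

# Assumption for illustration: one cell in the shared portion has the
# same raw pseudotime in every lineage it belongs to.
shared_cell_pseudotime = 50.0


def rescale_to_unit(pseudotime, lineage_length):
    """Map raw pseudotime onto [0, 1], assuming the lineage starts at 0."""
    return pseudotime / lineage_length


# Rescale the same cell separately within each lineage.
rescaled = {
    name: rescale_to_unit(shared_cell_pseudotime, length)
    for name, length in lineage_lengths.items()
}

for name, value in rescaled.items():
    print(f"{name}: raw = {shared_cell_pseudotime:.0f}, rescaled = {value:.3f}")
```

Under these assumptions the same cell lands at a different rescaled position in each lineage (later in the short lineages, earlier in the long one), which is the kind of spreading described above.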

And I'm not much of a ggplot expert, but it's my understanding that geom_smooth fits its own normal-based smoother (a loess by default for small datasets, or a Gaussian GAM when you set method = "gam"). This is not appropriate for count data, and even with a log transformation it's not a great model. That's why fitGAM uses a negative binomial model (NB-GAM), which we generally believe is more appropriate (and that's what gets shown by plotSmoothers).
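As a rough illustration of why a normal-based fit is a poor match for counts: under the usual mean and dispersion parameterization of the negative binomial, the variance is mu + mu^2 / theta, so it grows with the mean, whereas a Gaussian model assumes a single constant variance. The sketch below is plain Python with hypothetical values for `mu` and `theta`; it is not taken from fitGAM and only evaluates that variance formula.

```python
def nb_variance(mu, theta):
    """Variance of a negative binomial with mean mu and dispersion theta."""
    return mu + mu ** 2 / theta


# Hypothetical dispersion parameter, chosen only for illustration.
theta = 2.0

# Hypothetical mean expression levels (counts) along a lineage.
for mu in (1.0, 10.0, 100.0):
    print(f"mean = {mu:6.1f}, NB variance = {nb_variance(mu, theta):8.1f}")
```

A smoother that assumes equal variance everywhere would give lowly and highly expressed regions of the curve the same weight, which is one reason the two plots can differ.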

Hope this helps! Kelly

pavsol commented 2 years ago

Hi Kelly, thank you for the explanation, it seems clear now.

Best, Pavel