kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Trajectory over conditions not ending in center #196

Closed DennisFeige closed 1 year ago

DennisFeige commented 2 years ago

Hi @kstreet13,

I already posted here and am now back to the data analysis of the project. “I have a sample where I have a control (gray) and treatments with three different concentrations. Low (light blue), middle (blue) and high concentration (dark blue). Similar to your tutorial ("Trajectory inference across conditions: differential expression and differential progression") I want to fit a curve to see the distribution of pseudotimes over conditions.”

I have now subset the cluster further, down to the smallest resolvable cell type. I ran into the problem that the trajectory doesn’t end at the center of the clusters (I set "low" as the start and "high" as the end, which is also what is expected biologically) and it also bends into a curve. This leads to a weird binary distribution of pseudotime values for the low clusters. This is not really biological, as I can see from the UMAP. The problem was already mentioned here. I also already tried stretch = 0 and extend = "n", but the result did not improve.

Do you have a suggestion for how to deal with this? In the old thread you recommended integration to remove batch variance. I tried CCA from Seurat, but it also removed the biological effect caused by the treatment. Do you think fastMNN will do a better job?

And on the side, can you do statistical testing with the pseudotime values of the different conditions?

Thanks for any input! Cheers, Dennis

kstreet13 commented 2 years ago

Hi @DennisFeige,

I think my primary recommendation here would be to use PCA (or similar) for fitting the trajectory and only use UMAP for visualization. Slingshot can handle an embedding of any dimensionality (not just 2-D), so you can put the top K principal components in and you may get a cleaner result. Then for visualization, as I suggested in the second issue you linked to, you can use embedCurves to approximate the curve in the UMAP space or just color the cells by pseudotime (in that thread, I recommended UMAP as an alternative to tSNE, but they actually share a lot of the same drawbacks).

As for the integration methods, that's a little outside my area of expertise, but I have found good results with fastMNN in multiple datasets. So anecdotally, I would say I think it performs better than CCA, and I would definitely recommend trying it out.

And yes, testing for differences in pseudotime between conditions is something we do in the condiments package. If you construct a single trajectory on all the cells (across all conditions), then it should be fair to compare them in downstream statistical analyses. However, one could argue that these tests show too little variance, because they treat the pseudotime values as fixed, rather than treating them as an estimated quantity, which they are.
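To make the idea of comparing pseudotime distributions between conditions concrete, here is a minimal sketch in plain Python (standard library only). It is not the condiments implementation, which is an R package. It only illustrates one possible approach: a two-sample Kolmogorov-Smirnov statistic with a permutation p-value. The function names `ks_statistic` and `permutation_test` and the toy pseudotime values are made up for illustration, and the sketch treats pseudotime as fixed, so the caveat about understated variance applies to it as well.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def permutation_test(a, b, n_perm=1000, seed=0):
    """Permutation p-value for the KS statistic under shuffled condition labels."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled_stat = ks_statistic(pooled[:len(a)], pooled[len(a):])
        if shuffled_stat >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical pseudotime values (arbitrary units) from ONE shared trajectory.
low = list(range(0, 20))    # cells from the "low" condition
high = list(range(10, 30))  # cells from the "high" condition

stat, p_value = permutation_test(low, high, n_perm=500, seed=1)
print(f"KS statistic = {stat:.2f}, permutation p-value = {p_value:.4f}")
```

With more than two conditions, one option would be to run this pairwise and correct for multiple testing. Either way, the comparison is only meaningful if all cells were placed on a single shared trajectory first.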

Hope this helps!