kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Fit one principal curve or use principal_curve() output as input to slingshot #171

Closed by DennisFeige 2 years ago

DennisFeige commented 2 years ago

Hi all,

I would like to fit one principal curve to a subset of my cells so that all cells of that subset belong to that principal curve. So far I have done this with the principal_curve() function of the princurve package. Can I somehow feed the output of principal_curve() into slingshot() to generate pseudotime values for my cells (and define the beginning and end of the curve)? Alternatively, is there a way to fit only one principal curve with the slingshot package directly?

Thanks for your work!

DennisFeige commented 2 years ago

Hi all,

I just looked into it a bit myself. Am I correct in assuming that a principal curve is fit when I pass all my cells to slingshot() as one cluster? If so, do the pseudotime values depend on the resolution of the clustering?

Thanks to everyone thinking about my questions!

kstreet13 commented 2 years ago

Hi @DennisFeige ,

Thanks for the questions! First, yes, you are correct that running slingshot with all cells in one cluster (or just without providing any cluster labels) should be the same as fitting a standard principal curve. The drawback of this approach is that it doesn't allow you to set the directionality of the curve. Just like principal components, the directionality of a principal curve is arbitrary, but this also means that you are free to flip the directionality as needed.
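As a minimal, language-agnostic sketch of what "flipping the directionality" can mean in practice: subtracting every pseudotime value from the maximum reverses the ordering while keeping the spacing between cells. The function name `flip_pseudotime` and the example values are hypothetical and are not part of slingshot or princurve.

```python
def flip_pseudotime(pseudotime):
    """Reverse the direction of a pseudotime ordering.

    Hypothetical helper for illustration only (not a slingshot function).
    The cell that was last becomes first, and distances between cells
    along the curve are preserved.
    """
    latest = max(pseudotime)
    return [latest - t for t in pseudotime]


# Made-up pseudotime values for three cells.
original = [0.0, 1.5, 4.0]
flipped = flip_pseudotime(original)
```

In R, the same idea would be a one-liner on a numeric vector `pt`, namely `max(pt) - pt`.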

The cluster resolution determines the complexity of the trajectory being fit. With more clusters, you are more likely to get complex structures with several branching events. Conversely, with either one or two clusters, you are guaranteed to only get a single (non-branching) lineage.

And if you've already run principal_curve and just want to format it to look like slingshot output, you can manually construct a PseudotimeOrdering object (with n rows and, in this case, one column). The PseudotimeOrdering help page has more information on this.

Hope this helps! Kelly

DennisFeige commented 2 years ago

Hi @kstreet13 ,

Thanks, perfect! I have a follow-up question: let's say I get the same trajectory for a set of cells, but I try different clusterings of those cells. Would the pseudotime values generated by slingshot be different if the curve is the same but the clustering information differs? In other words, is the clustering information relevant for the generated pseudotime values when the trajectory is the same?

Thank you for answering all these questions. It is a great help!

Best, Dennis

kstreet13 commented 2 years ago

Hi @DennisFeige ,

Generally speaking, the final trajectories are the part that matters the most, so if those look the same, then the initial clustering didn't have much of an impact. The final pseudotime values are entirely based on the curve, not the clusters. The clusters determine the initial guess for the lineage, but then it gets updated iteratively until it converges to something stable. If multiple clusterings lead to the same trajectory, then I'd say that's good evidence that it has found a stable minimum and there shouldn't be any huge differences between the runs.
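To illustrate the point that pseudotime depends on the curve and not on the clusters, here is a simplified sketch. It assumes a 2D curve approximated by a polyline and assigns each cell the arc length at its closest point on that polyline. This is not slingshot's implementation, and the names `arc_length_pseudotime`, `curve`, and `cells` are hypothetical. Note that cluster labels are not an input at all: any two clusterings that lead to the same curve would give the same values here.

```python
import math


def arc_length_pseudotime(points, curve):
    """Arc length along the polyline `curve` at each point's closest projection.

    Illustrative only: a simplified stand-in for projecting cells onto a
    fitted principal curve. No cluster labels are used.
    """
    # Cumulative arc length at each vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))

    result = []
    for px, py in points:
        best_dist, best_s = math.inf, 0.0
        for i, ((x0, y0), (x1, y1)) in enumerate(zip(curve, curve[1:])):
            dx, dy = x1 - x0, y1 - y0
            seg_len2 = dx * dx + dy * dy
            if seg_len2 == 0:
                t = 0.0  # degenerate segment: project onto its single point
            else:
                # Fractional position along the segment, clamped to [0, 1].
                t = ((px - x0) * dx + (py - y0) * dy) / seg_len2
                t = max(0.0, min(1.0, t))
            qx, qy = x0 + t * dx, y0 + t * dy
            dist = math.hypot(px - qx, py - qy)
            if dist < best_dist:
                best_dist = dist
                best_s = cum[i] + t * math.sqrt(seg_len2)
        result.append(best_s)
    return result


# Made-up L-shaped curve and three cells near it.
curve = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
cells = [(0.5, 0.2), (1.2, 0.5), (0.0, -0.1)]
pseudotime = arc_length_pseudotime(cells, curve)
```

In slingshot itself, the clusters only shape the initial lineage; the sketch above corresponds to the last step, where values are read off the converged curve.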

Best, Kelly

DennisFeige commented 2 years ago

Hi Kelly,

thanks! That is the answer I needed.

Best, Dennis