kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.
259 stars 42 forks source link

Slingshot pseudotime with CellRank #193

Closed sneddonucsf closed 2 years ago

sneddonucsf commented 2 years ago

Hello,

Thank you for the fantastic pseudotime package!

I would like to use CellRank's Pseudotime Kernel (https://cellrank.readthedocs.io/en/stable/beyond_rna_velocity.html) with my Slingshot pseudotime results, as I've found that Slingshot gives me the best results over others like DPT, Palantir, etc.

I believe that the input for CellRank should be a pseudotime value associated with each cell. I know that we can pull out these values with slingPseudotime(), but this gives values for each curve. Would you recommend taking the highest value per cell to use as input, or is there another part of the Slingshot results I should use?

Thank you!

kstreet13 commented 2 years ago

Hi @sneddonucsf,

This is a very good question and something I've been meaning to add to the package! My recommendation is to use the weighted average of the different pseudotime values, weighted by their assignment weights (slingCurveWeights).

Here's a simple function that should calculate that vector for any time of Slingshot output (SCE, PseudotimeOrdering, or SlingshotDataSet):

slingAvgPseudotime <- function(x){
    rowSums(slingPseudotime(x, na=FALSE) *
                slingCurveWeights(x, as.probs = TRUE))
}

Best, Kelly

sneddonucsf commented 2 years ago

Thank you @kstreet13! I tried the function and got the error: Error in .local(x, ...) : unused argument (as.probs = TRUE)

If I run: Endocrine.sce.slingshot I get:

class: SingleCellExperiment 
dim: 2000 14231 
metadata(0):
assays(1): logcounts
rownames(2000): SST GCG ... CLIC5 FOXP3
rowData names(0):
colnames(14231): lane1_AAACGAAGTGTGACCC_1 lane1_AAAGGATTCGTACACA_1 ... TTTGTTGGTCAGATTC_9
  TTTGTTGGTTATGTCG_9
colData names(15): orig.ident nCount_RNA ... slingPseudotime_3 slingPseudotime_4
reducedDimNames(2): PCA UMAP
spikeNames(0):
altExpNames(0):

and slingCurveWeights(Endocrine.sce_slingshot) gives me:

                           curve1    curve2    curve3    curve4
lane1_AAACGAAGTGTGACCC_1 0.0000000 1.0000000 0.0000000 0.0000000
lane1_AAAGGATTCGTACACA_1 1.0000000 0.0000000 0.0000000 0.0000000
lane1_AAAGGGCCAGAACATA_1 0.0000000 1.0000000 0.0000000 0.0000000
lane1_AAAGGGCGTTGTTGTG_1 1.0000000 0.0000000 0.0000000 0.0000000
lane1_AAAGGGCTCAGGTAAA_1 1.0000000 0.0000000 0.0000000 0.0000000
lane1_AAAGGTAAGCGAGTAC_1 0.0000000 1.0000000 0.0000000 0.0000000

but slingCurveWeights(Endocrine.sce_slingshot, as.probs = TRUE) gives me the above error.

kstreet13 commented 2 years ago

Hm, that's strange. Just for my own curiosity, which version of the slingshot package do you have?

Anyway, you should be able to get around it with this:

slingAvgPseudotime <- function(x){
    wts <- slingCurveWeights(x)
    wts <- wts / rowSums(wts)
    return(rowSums(slingPseudotime(x, na=FALSE) * wts))
}
sneddonucsf commented 2 years ago

Hi @kstreet13 I'm currently using version 1.4.0... I haven't worked up the courage to update R to version 4 yet for Slingshot 2.4.0!

I tried your second code as test = data.frame(slingAvgPseudotime(Endocrine.sce_slingshot)) and got:

                        slingAvgPseudotime.Endocrine.sce_slingshot.
lane1_AAACGAAGTGTGACCC_1                                          NA
lane1_AAAGGATTCGTACACA_1                                          NA
lane1_AAAGGGCCAGAACATA_1                                          NA

This occurs without the data.frame part. Not sure why I would be getting NA's... I went through each step of the function: wts <- slingCurveWeights(Endocrine.sce_slingshot):

                         curve1    curve2 curve3    curve4
lane1_AAACGAAGTGTGACCC_1      0 1.0000000      0 0.0000000
lane1_AAAGGATTCGTACACA_1      1 0.0000000      0 0.0000000
lane1_AAAGGGCCAGAACATA_1      0 1.0000000      0 0.0000000
lane1_AAAGGGCGTTGTTGTG_1      1 0.0000000      0 0.0000000
lane1_AAAGGGCTCAGGTAAA_1      1 0.0000000      0 0.0000000
lane1_AAAGGTAAGCGAGTAC_1      0 1.0000000      0 0.0000000
lane1_AAAGTGAGTCACCCTT_1      1 0.0000000      0 0.0000000
lane1_AAAGTGAGTTGTCATG_1      1 0.0000000      0 0.0000000
lane1_AACAAGACAGCTATAC_1      1 0.0000000      0 0.0000000
lane1_AACAAGATCATTGCCC_1      1 0.0000000      0 0.0000000
lane1_AACAAGATCTGGACCG_1      1 0.5985513      1 0.6016412
lane1_AACACACGTCATCCCT_1      0 1.0000000      0 0.0000000
lane1_AACAGGGAGGTGCAGT_1      0 0.0000000      1 0.0000000
lane1_AACAGGGAGTCCTGTA_1      1 0.0000000      0 0.0000000
lane1_AACAGGGGTAGGTACG_1      1 0.0000000      0 0.0000000

wts <- wts / rowSums(wts):

                            curve1   curve2    curve3    curve4
lane1_AAACGAAGTGTGACCC_1 0.0000000 1.000000 0.0000000 0.0000000
lane1_AAAGGATTCGTACACA_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AAAGGGCCAGAACATA_1 0.0000000 1.000000 0.0000000 0.0000000
lane1_AAAGGGCGTTGTTGTG_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AAAGGGCTCAGGTAAA_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AAAGGTAAGCGAGTAC_1 0.0000000 1.000000 0.0000000 0.0000000
lane1_AAAGTGAGTCACCCTT_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AAAGTGAGTTGTCATG_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AACAAGACAGCTATAC_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AACAAGATCATTGCCC_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AACAAGATCTGGACCG_1 0.3124812 0.187036 0.3124812 0.1880016
lane1_AACACACGTCATCCCT_1 0.0000000 1.000000 0.0000000 0.0000000
lane1_AACAGGGAGGTGCAGT_1 0.0000000 0.000000 1.0000000 0.0000000
lane1_AACAGGGAGTCCTGTA_1 1.0000000 0.000000 0.0000000 0.0000000
lane1_AACAGGGGTAGGTACG_1 1.0000000 0.000000 0.0000000 0.0000000

test2 = data.frame(rowSums(slingPseudotime(Endocrine.sce_slingshot, na=FALSE) * wts))

                         rowSums.slingPseudotime.Endocrine.sce_slingshot..na...FALSE....
lane1_AAACGAAGTGTGACCC_1                                                              NA
lane1_AAAGGATTCGTACACA_1                                                              NA
lane1_AAAGGGCCAGAACATA_1                                                              NA

The values aren't all NA's, but most of them are. Any ideas?

sneddonucsf commented 2 years ago

Ahh, I'm guessing that even though na=FALSE it is still giving me NA values for some reason.

kstreet13 commented 2 years ago

Hm, that must have been a bug with that version. The NA values correspond to weights of 0, though, so they shouldn't actually matter for your purposes because they would have no effect on the weighted average.

Let's try this again:

slingAvgPseudotime <- function(x){
    wts <- slingCurveWeights(x)
    wts <- wts / rowSums(wts)
    pst <- slingPseudotime(x)
    pst[is.na(pst)] <- 0
    return(rowSums(pst * wts))
}
sneddonucsf commented 2 years ago

That works @kstreet13, thank you very much!

                         slingAvgPseudotime.Endocrine.sce_slingshot.
lane1_AAACGAAGTGTGACCC_1                                    12.30376
lane1_AAAGGATTCGTACACA_1                                    14.58182
lane1_AAAGGGCCAGAACATA_1                                    13.80268
kstreet13 commented 2 years ago

Sure thing! I just added slingAvgPseudotime as a helper function to the package, so versions >=2.5.2 will have this available. Thanks for the motivation!