kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.
259 stars 42 forks source link

Calculating cell weights #205

Closed jblich870 closed 1 year ago

jblich870 commented 1 year ago

Hi,

The package ArchR contains a trajectory function that gives 1) each cell's position along a given trajectory, and 2) a distance value for each cell to the trajectory.

I have 2 trajectories of interest in my dataset (actually one trajectory that branches into 2). Is it possible to use these values to compute cell weights, similar to as calculated in slingshot? I would like to input this into tradeseq. I would be extremely grateful if you have any pointers how to calculate cell weights, or do I need any additional information (the equation in the slingshot paper looks extremely complicated!).

Many thanks, J

kstreet13 commented 1 year ago

Hi @jpblich,

I'm not super familiar with how ArchR does trajectory inference, so I'm not really sure what these values mean, unfortunately. Does every cell have a distance and a pseudotime for both lineages? Or do cells after the branching point only have values for one lineage?

If you just need weights to use with tradeSeq, I think there are a couple potentially reasonable approaches. The first would be to set a (somewhat arbitrary) threshold for where the lineages diverge and assign weights based on that. Everything before the split would get weights of (.5, .5) and everything after would get either (1, 0) or (0, 1). The other approach would be to base the weights on the distances, somehow. I'm just guessing here, but making the weights proportional to the inverse of the squared distances might make sense?

I should point out that you can use Slingshot on this sort of data (there's nothing in it that is specific to RNAseq). So long as there is a reduced dimensional representation and a set of cluster labels, it should work, and then you've got the weights that tradeSeq is expecting.

I am interested in this sort of application, so please do let me know if I've missed something important or if you have any follow-up questions/comments.

Best, Kelly

jblich870 commented 1 year ago

Hi Kelly,

Many thanks for getting back to me and sorry for the late reply.

Apologies, I should say ArchR cannot compute branched trajectories so I actually have 2 trajectories - but they are each generated with a supervised trajectory backbone made up of 3 clusters, and the first 2 clusters are identical. So the trajectories might not completely overlap at the beginning but should be very similar. The distances are "Euclidean distance of each cell to the nearest point along the manifold" (from: https://www.archrproject.com/bookdown/trajectory-analysis-with-archr.html).

But yes, each cell has a distance and a pseudotime for both lineages. I will try your suggestions for assigning cellweights. I thought another option might be that after the branch point, in the case that a cell is closer to one trajectory than the other, to assign a weight between 0 and 1 based on the relative distance to each trajectory (e.g. if distance of cell to traj A is 1 and to traj B is 4, to assign 0.25 for traj A and 0.75 for traj B). How does that sound?

Re applying Slingshot to the data, ArchR does actually have wrapper function that implements Slingshot but it doesn't work brilliantly on my data. The trajectories output by the ArchR trajectory function look quite good though so I'd be happy to keep them. Is there a way I can input the ArchR trajectory co-ordinates into Slingshot, then use Slingshot to generate cellweights?

Many thanks, J

kstreet13 commented 1 year ago

Hi @jpblich,

I think your solution probably makes sense, with a slight caveat: I realized we both made the mistake of saying something like "after the branch point", which is not something that can be objectively defined. So my first suggestion above is not feasible and yours would work, but for all cells rather than just those coming after the bifurcation.

I spent some time adapting the code that handles weights in Slingshot, so if you want to try getting something like the weights that Slingshot would assign, here's a function that might do that:

slingWeights() ``` # D is the matrix of distances (rows = cells, columns = lineages) # tol is the tolerance for determining convergence (average change in weights) slingWeights <- function(D, tol = 1e-6){ W <- D W[,] <- 1 W.old <- W-1 while(mean(abs(W-W.old)) > tol){ W.old <- W ordD <- order(D) W.prob <- W/rowSums(W) WrnkD <- cumsum(W.prob[ordD]) / sum(W.prob) Z <- D Z[ordD] <- WrnkD Z.prime <- 1-Z^2 Z.prime[W==0] <- NA W0 <- W W <- Z.prime / rowMaxs(Z.prime,na.rm = TRUE) #rowMins(D) / D W[is.nan(W)] <- 1 # handle 0/0 W[is.na(W)] <- 0 W[W > 1] <- 1 W[W < 0] <- 0 W[W0==0] <- 0 # add if z < .5 idx <- Z < .5 W[idx] <- 1 #(rowMins(D) / D)[idx] # drop if z > .9 and w < .1 ridx <- rowMaxs(Z, na.rm = TRUE) > .9 & rowMins(W, na.rm = TRUE) < .1 W0 <- W[ridx, ] Z0 <- Z[ridx, ] W0[!is.na(Z0) & Z0 > .9 & W0 < .1] <- 0 W[ridx, ] <- W0 } return(W) } ```

Though I should reiterate that I have no idea how appropriate this is (or isn't), because I'm not sure what ArchR is doing under the hood to calculate the "backbone" of each lineage. If you do end up trying this out, please let me know if it seems to work or not.

Best, Kelly

jblich870 commented 1 year ago

Hi Kelly,

That's really helpful and exactly what I'm after, thanks a lot! Will let you know how it goes.

Best wishes, J