statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data

Tradeseq with custom Gene x Pseudotime matrices #214

Status: Closed (jblich870 closed this issue 9 months ago)

jblich870 commented 1 year ago

Hi,

I have a single-cell multiomics (RNA + ATAC-seq) dataset, and have used another programme (ArchR) to generate pseudotime trajectories. I have 2 trajectories of interest; for each, I have a matrix with genes in rows and pseudotime (1 to 100) in columns. The values in the matrix reflect expression of the gene at each point along pseudotime (scaled to -1.5 to 1.5). I also have motif x pseudotime matrices for each trajectory.

Is there a way to plug these matrices into tradeSeq? Is there any other information I would need?

I also have pseudotime values for cells along each trajectory (i.e. distance along pseudotime), but not cell weights (is this critical?).

I also have a count matrix (gene expression x cell, or motif deviations x cell), UMAP embedding co-ordinates for each cell, and cluster information and any other metadata for each cell if needed.

I searched but had difficulty finding out how to input trajectory data that wasn't generated in either Slingshot or Monocle. Would be grateful for any help! Best wishes, J

jblich870 commented 1 year ago

Just to add: the trajectory function implemented within ArchR has been adjusted so that it now provides 1) each cell's position along a given trajectory, and 2) a distance value for each cell to the trajectory. It should therefore be possible to calculate cell weights using these distance values.

koenvandenberge commented 1 year ago

Hi @jpblich

It's certainly possible to use custom pseudotime and cell-level weights matrices as input. Please see the pseudotime and cellWeights arguments in fitGAM. This is also done in our tradeSeq vignette, which you can find on the Bioconductor website of our package.

You may want to think about the parametric assumption that you will be making on your response variable. For example, the motif data are not counts and you may consider a Gaussian or other distribution to model these. This is possible in fitGAM using the family argument.

Hope this answers your question.

jblich870 commented 1 year ago

Hi,

Many thanks for the reply, that's really helpful.

Please could you advise on the best way to calculate cell weights with the distance values output by ArchR? The distances are "Euclidean distance of each cell to the nearest point along the manifold" (from: https://www.archrproject.com/bookdown/trajectory-analysis-with-archr.html).

I thought perhaps to assign a weight between 0 and 1 based on the relative distance to each trajectory (e.g. if distance of cell to traj A is 1 and to traj B is 4, to assign 0.25 for traj A and 0.75 for traj B). Would that work for Tradeseq?

Many thanks, J

jblich870 commented 1 year ago

Answered here: https://github.com/kstreet13/slingshot/issues/205
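
Editor's note: the linked slingshot issue holds the actual answer. As a rough illustration of the arithmetic only, here is a minimal sketch of one possible scheme, inverse-distance weighting, written in Python because it is language-agnostic and ports directly to R. The function name and the `eps` guard are hypothetical, and this is not necessarily what was recommended in the linked issue. Since a cell weight expresses how strongly a cell belongs to a lineage, the closer trajectory should receive the larger weight, so a cell at distance 1 from trajectory A and 4 from trajectory B ends up with most of its weight on A.

```python
def inverse_distance_weights(distances, eps=1e-12):
    """Turn one cell's distances to each trajectory into weights that sum to 1.

    Closer trajectories get larger weights. `eps` guards against division
    by zero when a cell lies exactly on a trajectory.
    """
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


# The example from the question: distance 1 to trajectory A, 4 to trajectory B.
weights = inverse_distance_weights([1.0, 4.0])
print([round(w, 2) for w in weights])  # [0.8, 0.2]
```

Applied per cell, this yields one row of a cells x trajectories weight matrix, which is the shape the cellWeights argument of fitGAM expects.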

saum-kmr commented 9 months ago

Hello,

I would like to follow up on this discussion. My data is multiome (similar to the one in this post). I used Signac for my snATAC-seq analysis and chromVAR to get motif values for each cell. Since these are motif deviations, the values are both positive and negative. I have already computed the pseudotime and cell weights using slingshot. But when I use fitGAM, I get this error:

Error in .checks(pseudotime, cellWeights, U, counts, conditions) : All values of the count matrix should be non-negative

As per the suggestion above, I changed the family to 'gaussian'/'Gaussian', but I still get the same error. I looked for the correct family name, but couldn't find any option other than 'nb'. Could you please suggest what might be going wrong?

Best Regards, Saumya

koenvandenberge commented 9 months ago

Hi @saum-kmr,

Thank you for reporting. This was an oversight on our end: the check that all values are non-negative is only meant to apply when modeling counts with a count distribution. When setting family="gaussian", negative values should now be allowed once you install the latest version from GitHub.
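
Editor's note: the behaviour described above, where the non-negativity check applies only to count distributions, can be sketched as follows. This is a hypothetical illustration in Python, not tradeSeq's actual implementation; the function name, the `COUNT_FAMILIES` set, and the toy data are all assumptions made for the example.

```python
COUNT_FAMILIES = {"nb", "poisson"}  # assumed set of count distributions


def check_values(matrix, family="nb"):
    """Raise if a count family is requested but the matrix has negative values."""
    if family.lower() in COUNT_FAMILIES:
        for row in matrix:
            if any(value < 0 for value in row):
                raise ValueError(
                    "All values of the count matrix should be non-negative"
                )
    return True


deviations = [[-1.2, 0.4], [0.9, -0.3]]  # toy motif deviation scores
print(check_values(deviations, family="gaussian"))  # True
```

With family="nb" the same toy matrix would raise the error, which mirrors what was reported before the fix.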

saum-kmr commented 3 months ago

Hello @koenvandenberge,

Thank you for the update! It took me a while to try this again. I now get a new error when using my chromVAR matrix:

Error in `[[<-`(`*tmp*`, name, value = list(X = c(1, 1, 1, 1, 1, 1, 1, : 6253 elements in value to replace 16301 elements

It works completely fine with the transcriptome. The cells are the same, and so are the cell weights and pseudotime, all calculated using slingshot. The only difference is giving the chromVAR "data" matrix as the input. Could you please suggest what might be going wrong here?

Best Regards, Saumya
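
Editor's note: the thread ends here without a diagnosis. Because the message reports a length mismatch, one inexpensive thing to rule out is that the input matrix, the pseudotime matrix, and the cell-weight matrix describe different numbers of cells. The sketch below shows such a check in Python, for illustration only; the function and argument names are hypothetical, the numbers are toy values, and a dimension mismatch is not a confirmed cause of this error.

```python
def check_dimensions(n_matrix_columns, n_pseudotime_rows, n_weight_rows):
    """Return a list of human-readable mismatches (empty if all counts agree)."""
    problems = []
    if n_matrix_columns != n_pseudotime_rows:
        problems.append(
            f"matrix has {n_matrix_columns} cells, "
            f"pseudotime has {n_pseudotime_rows}"
        )
    if n_matrix_columns != n_weight_rows:
        problems.append(
            f"matrix has {n_matrix_columns} cells, "
            f"cell weights have {n_weight_rows}"
        )
    return problems


# Toy example: the pseudotime matrix covers fewer cells than the input matrix.
for problem in check_dimensions(100, 90, 100):
    print(problem)
```

If the list comes back empty, the cell counts agree and the cause lies elsewhere.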