Closed LTLA closed 4 years ago
Dear Aaron, Thanks a lot for your interest in TSCAN! I would very much like to see it appearing in your book. It's really a good suggestion to allow arbitrary clustering. Actually I have implemented it some time ago for Github version (https://github.com/zji90/TSCAN/blob/master/R/exprmclust.R). There is now a 'cluster' option that allows users to supply arbitrary clustering. However, I haven't updated the Bioconductor version for quite a while. I will consider doing that in the near future.
Thanks @zji90; I look forward to seeing that functionality in the Bioconductor version. (I should add that, to me, the Bioconductor version is main version of your package). If I may, I will also mention the other requests/suggestions that I alluded to in my previous post.
Is there a function to draw lines between cluster centroids in an arbitrary embedding? It would be nice to be able to see the TSCAN-determined connections between clusters on, say, a t-SNE plot. I already wrote some code for this to use in the book, but it seems it would be generally useful. For reference, this is what I did for the book (mst
is the graph object containing the minimum spanning tree, coords
is the matrix of cluster centroids, and sce.nest
is a SingleCellExperiment
):
# TODO: stuff this into a function somewhere.
pairs <- Matrix::which(mst[] > 0, arr.ind=TRUE)
coords <- reducedDim(by.cluster, "TSNE")
group <- rep(seq_len(nrow(pairs)), 2)
stuff <- data.frame(rbind(coords[pairs[,1],], coords[pairs[,2],]), group)
plotTSNE(sce.nest, colour_by="cluster") +
geom_line(data=stuff, mapping=aes(x=X1, y=X2, group=group))
Have you considered a more refined pseudo-time calculation beyond consecutive integers from 1 to the number of cells? It seems like you are already mapping cells onto the closest edge of the MST; why not report the coordinate of each cell along the edge? This would be more helpful for downstream analyses.
On a related note, multiple paths through the MST (from some root node) could be handled with multiple pseudo-times; this seems like it would be more useful than a single vector of pseudo-times, which doesn't really give much insight into how branches work. This is, for example, how slingshot handles branching events.
Perhaps consider easing up on the set.seed()
calls you have scattered throughout your code. (See here for a discussion.)
To continue this discussion: this section of the book attempts to discuss how to use TSCAN for trajectory analysis. For the time being, I have effectively reimplemented the algorithm to work with the SingleCellExperiment
data structures, but it would be much nicer to call TSCAN functions directly.
I'm more than willing to make a PR to add my desired functionality to TSCAN, if that would be of interest. In particular, you may find the projected pseudotimes in Figure 17.4 to be particularly useful.
I will re-iterate that these improvements would be very welcome. Slingshot is excellent, but is extremely slow for "large" (approaching 1e6) datasets. As the centroid/cluster <-> MST approach is far more scalable TSCAN could be the default choice for big scRNA trajectory analysis.
@zji90 I am willing to assume responsibility for maintenance and further development of this package, if you like.
@davemcg In the meantime, I wrote my own implementation of the TSCAN approach and dumped it in scran
as ?createClusterMST
. Only in BioC-devel right now, I'm afraid; I was hoping to move it here at some point.
Thanks @LTLA I've installed BioC-devel scran
and I'll give it a go. Happy with slingshot but it is hard to tweak when it literally takes a week to run!
Hi Aaron,
Thanks a lot for your reply and really sorry for not responding to your last comment for a long time. These are all excellent suggestions and worth being incorporated in the software package.
I really appreciate your offer of maintaining and developing the package in the future and I am happy to work with you on this package. I believe your expertise will definitely help the improvement of the package a lot.
Since you mentioned you have already written your own implementation of TSCAN, I wonder what you think would be the best approach to move forward?
Thank you again and looking forward to your reply.
Best, Jason
-- Zhicheng Ji, PhD. Assistant Professor Department of Biostatistics and Bioinformatics Duke University School of Medicine 2424 Erwin Road, Suite 1102 Hock Plaza Box 2721 Durham, NC 27710, USA
On Wed, Sep 2, 2020 at 11:44 AM Aaron Lun notifications@github.com wrote:
@zji90 https://github.com/zji90 I am willing to assume responsibility for maintenance and further development of this package, if you like.
@davemcg https://github.com/davemcg In the meantime, I wrote my own implementation of the TSCAN approach and dumped it in scran as ?createClusterMST. Only in BioC-devel right now, I'm afraid; I was hoping to move it here at some point.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zji90/TSCAN/issues/9#issuecomment-685823260, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6RAUSWCV23AUCRFTZUEBTSDZR6XANCNFSM4LAEULUQ .
There are several ways to move forward, depending on how much you are willing to trust me :)
The simplest approach would be for me to put in PRs for each change. This will require you to review each PR, merge them and manage the upload to Bioconductor; more work on your end, but you will have a clear idea of what is changing.
If there's a bit more trust, you might add me as a collaborator to this GitHub repository. Then I could just go ahead and add the functions, do various bits of maintenance and improvements, etc. without requiring much work from you. Of course, this also means that if I were evil, I could just change things that you didn't want to change.
If there's even more trust, you could give me write permissions to TSCAN's Bioconductor git repository, in which case I could also manage the upload of any changes to Bioconductor. So even less work required from you, but also more opportunity for me to be evil, because now any changes would appear in the official Bioconductor version of the package.
So, it's really up to you. I'd like to say that I'm not evil, but that's exactly what an evil person would say... I'll just mention that I use all three models for different packages, depending on who I'm working with and how confident I am that they'll do a good job. I've only had one case where I gave someone too much permission and they did something I wasn't happy with.
Hi Aaron,
Thank you very much for your reply. Of course, I have much trust in you and I have just granted you access to the TSCAN Github repo as a collaborator. As for the Bioconductor repo, I would like to grant you access as well but I don't know how to do that yet. I will google online or you can let me know if you are already familiar with it.
Please let me know if there is anything else I can be helpful with.
Best, Jason
-- Zhicheng Ji, PhD. Assistant Professor Department of Biostatistics and Bioinformatics Duke University School of Medicine 2424 Erwin Road, Suite 1102 Hock Plaza Box 2721 Durham, NC 27710, USA
On Thu, Sep 3, 2020 at 2:16 PM Aaron Lun notifications@github.com wrote:
There are several ways to move forward, depending on how much you are willing to trust me :)
The simplest approach would be for me to put in PRs for each change. This will require you to review each PR, merge them and manage the upload to Bioconductor; more work on your end, but you will have a clear idea of what is changing.
If there's a bit more trust, you might add me as a collaborator to this GitHub repository. Then I could just go ahead and add the functions, do various bits of maintenance and improvements, etc. without requiring much work from you. Of course, this also means that if I were evil, I could just change things that you didn't want to change.
If there's even more trust, you could give me write permissions to TSCAN's Bioconductor git repository, in which case I could also manage the upload of any changes to Bioconductor. So even less work required from you, but also more opportunity for me to be evil, because now any changes would appear in the official Bioconductor version of the package.
So, it's really up to you. I'd like to say that I'm not evil, but that's exactly what an evil person would say... I'll just mention that I use all three models for different packages, depending on who I'm working with and how confident I am that they'll do a good job. I've only had one case where I gave someone too much permission and they did something I wasn't happy with.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zji90/TSCAN/issues/9#issuecomment-686664563, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6RAUVITSWOZSOISWSDW2TSD7MPPANCNFSM4LAEULUQ .
Great, thanks @zji90. I've cloned the repo from Bioconductor and implemented all my changes at https://github.com/LTLA/TSCAN; I've added you as a collaborator as well.
(As a side note: I didn't realize how old this repository actually was. It precedes the SVN-to-Git transition on the BioC servers that happened in several years ago, which means that the Git history in this repo is not compatible with the Git history in Bioconductor's "official" repositories. So, I just decided to work off the Bioconductor version rather than attempting to reconcile the two histories.)
If you check out my latest commits in that repo, several interesting points are:
createClusterMST()
function creates a cluster-based MST, from several possible inputs. I've added the idea of an OMEGA group (called outgroup
here) from slingshot to avoid joining different trajectories.reportEdges()
function reports the edge coordinates for easy plotting.mapCellsToEdges()
function maps cells to the closest edge in the MST. This is a cool function because it works on cells that weren't used to construct the MST, so you can "project" other datasets onto an existing MST.orderCells()
function computes a pseudotime, considering all branched paths through the MST from a starting node. The output format was again inspired by what slingshot also produces.master
is allowed.DESCRIPTION
.As for getting write access to the BioC Git servers: can you write an email to bioc-devel@r-project.org, and just say that you want to give me write access to TSCAN? It would be most straightforward if you did sent this email using the JHU address from your DESCRIPTION
, but if you don't have access to that anymore, that's okay; they just need a way to make sure that it's actually you, so I'm sure we can figure something out.
Any thoughts on giving me access to the BioC repos? I can start the conversation on the Bioc-devel mailing list if it helps.
Hi Aaron,
Sorry for the late reply. I have briefly checked your updated version and it looks great! I really appreciate the time and effort you have put into this.
I have also sent an email to bioc-devel asking them to grant you write access to TSCAN package. I will let you know once I hear back from them.
Thanks!
Best, Jason
-- Zhicheng Ji, PhD. Assistant Professor Department of Biostatistics and Bioinformatics Duke University School of Medicine 2424 Erwin Road, Suite 1102 Hock Plaza Box 2721 Durham, NC 27710, USA
On Mon, Sep 7, 2020 at 2:32 PM Aaron Lun notifications@github.com wrote:
Great, thanks @zji90 https://github.com/zji90. I've cloned the repo from Bioconductor and implemented all my changes at https://github.com/LTLA/TSCAN; I've added you as a collaborator as well.
(As a side note: I didn't realize how old this repository actually was. It precedes the SVN-to-Git transition on the BioC servers that happened in several years ago, which means that the Git history in this repo is not compatible with the Git history in Bioconductor's "official" repositories. So, I just decided to work off the Bioconductor version rather than attempting to reconcile the two histories.)
If you check out my latest commits in that repo, several interesting points are:
- The createClusterMST() function creates a cluster-based MST, from several possible inputs. I've added the idea of an OMEGA group (called outgroup here) from slingshot to avoid joining different trajectories.
- The reportEdges() function reports the edge coordinates for easy plotting.
- The mapCellsToEdges() function maps cells to the closest edge in the MST. This is a cool function because it works on cells that weren't used to construct the MST, so you can "project" other datasets onto an existing MST.
- The orderCells() function computes a pseudotime, considering all branched paths through the MST from a starting node. The output format was again inspired by what slingshot also produces.
- All functions have additional methods to smoothly operate with Bioconductor's standard SE and SCE classes, while also remaining operational for people who just have a numeric matrix (e.g., of PC coordinates).
- I have added a GitHub action to check that everything is working before any merge to master is allowed.
- I took the liberty of cleaning up the DESCRIPTION.
As for getting write access to the BioC Git servers: can you write an email to bioc-devel@r-project.org, and just say that you want to give me write access to TSCAN? It would be most straightforward if you did sent this email using the JHU address from your DESCRIPTION, but if you don't have access to that anymore, that's okay; they just need a way to make sure that it's actually you, so I'm sure we can figure something out.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zji90/TSCAN/issues/9#issuecomment-688467372, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6RAUTXNGYEXTV6CN6U5LDSEURMHANCNFSM4LAEULUQ .
Excellent, thanks. I'll do some prodding on my end as well.
Hi Aaron,
I just got response from Biocondcutor that they have already granted you access to TSCAN bioconductor package. Please give it a try.
Thanks!
Best, Jason
-- Zhicheng Ji, PhD. Assistant Professor Department of Biostatistics and Bioinformatics Duke University School of Medicine 2424 Erwin Road, Suite 1102 Hock Plaza Box 2721 Durham, NC 27710, USA
On Thu, Sep 17, 2020 at 4:32 AM Aaron Lun notifications@github.com wrote:
Excellent, thanks. I'll do some prodding on my end as well.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/zji90/TSCAN/issues/9#issuecomment-694083452, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6RAUVA5UPLPP7HIAQVDP3SGHCQBANCNFSM4LAEULUQ .
Yes, I also poked them as well. I'm running through some final checks now and will push it up shortly.
And it's done! I'll keep an eye on the builds... looking forward to working with you on this!
I'm finally getting around to writing the trajectory chapter in the OSCA book (see here for its current state). Given that TSCAN is one of the Bioconductor packages for trajectory analysis, I was planning to demonstrate its use on some example data. To this end, it would help if
exprmclust()
was able to take an arbitraryclusterid
. For example, one could setclusterid=NULL
by default to achieve the existing behavior, but I could also run the function with my precomputed clusters to ensure that TSCAN is consistent with the rest of my analysis.I have some other requests/suggestions, but this seems like a low-hanging fruit to improve useability. I am also willing to put in a PR if that is of interest.